You pay for what you use. That's the deal. Except it's not.
When you use an AI model (GPT-4, Claude, Gemini), you do not pay per word. You pay per token. And that tiny technical detail can quietly cost you up to 60% more for the exact same request, depending on which provider you choose and which language you write in.
What Is a Token, Really?
Before we get to the money, a crash course. Tokens are not words. They are subword units produced by a compression algorithm called Byte Pair Encoding (BPE), originally a data-compression technique that was repurposed for NLP in the 2010s. The algorithm learns frequent character sequences in a corpus and merges them into single vocabulary entries.
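The merge-learning step is simple enough to sketch in a few lines. This is a toy illustration only (the function name `bpe_merges` is ours): real tokenizers operate on bytes, use enormous corpora, and learn vocabularies of 32k-256k entries.

```python
from collections import Counter

def bpe_merges(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Toy BPE training: repeatedly fuse the most frequent adjacent pair."""
    # Start with each word split into single characters.
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus.
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere before the next round.
        for i, w in enumerate(words):
            j, out = 0, []
            while j < len(w):
                if j + 1 < len(w) and (w[j], w[j + 1]) == best:
                    out.append(w[j] + w[j + 1])
                    j += 2
                else:
                    out.append(w[j])
                    j += 1
            words[i] = out
    return merges

# On a tiny corpus, the most frequent pairs fuse first:
# bpe_merges(["low", "lower", "lowest"], 2) learns ("l", "o"), then ("lo", "w")
```

Whatever sequences are frequent in the training corpus become single tokens; everything else gets chopped into pieces. That asymmetry is where the rest of this article comes from.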
The catch: every AI company trains its own tokenizer on its own corpus with its own vocabulary size. The result is that the same word gets sliced differently depending on who's counting.
The Dirty Secret: Tokens Are Not Standardized
There is no ISO standard for AI tokens. No regulatory body. No published audit. Each major provider uses a different system:
| Provider | Tokenizer | Vocab size |
|---|---|---|
| OpenAI | tiktoken (cl100k_base / o200k_base) | ~100k |
| Google | SentencePiece (older models) + custom (Gemini) | ~256k |
| Anthropic | Proprietary, barely documented | unknown |
| Meta LLaMA | BPE | ~32k |
| Mistral | Custom BPE | ~32k |
Anthropic's tokenizer is particularly opaque. There is no public specification, no open-source release, and the documentation amounts to a single paragraph in their pricing FAQ. You are billed by a black box.
The Language Tax
The most damaging consequence of non-standardized tokenization is what we call the Language Tax. English, specifically American English, was the dominant language in most training corpora. As a result, English tokenizes efficiently. Every other language pays a premium.
| Language | Overhead vs English | Relative Cost |
|---|---|---|
| English | baseline | 1.0× |
| Spanish | +62% | 1.6× |
| French | +54% | 1.5× |
| German | +62% | 1.6× |
| Russian | +154% | 2.5× |
| Arabic | +208% | 3.1× |
| Hindi | +392% | 4.9× |
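The mapping from overhead to relative cost is direct: a +62% token overhead means every request costs 1.62 times its English equivalent. A minimal sketch, using the approximate averages from the table above (the dictionary and function names are ours):

```python
# Approximate average token overhead vs English, per the table above.
LANGUAGE_OVERHEAD = {
    "English": 0.00, "Spanish": 0.62, "French": 0.54,
    "German": 0.62, "Russian": 1.54, "Arabic": 2.08, "Hindi": 3.92,
}

def relative_cost(language: str) -> float:
    """Cost multiplier vs English for the same content at the same prices."""
    return 1.0 + LANGUAGE_OVERHEAD[language]
```

Because billing is linear in tokens, the overhead passes straight through to the invoice: no volume discount absorbs it.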
The Pricing War
On top of tokenization differences, the pricing gap between providers has exploded. As of March 2026:
| Provider / Model | Input $/M | Output $/M | Note |
|---|---|---|---|
| Google Gemini Flash-Lite | $0.10 | $0.40 | Cheapest viable |
| Google Gemini 2.5 Pro | $1.25 | $10 | Strong value |
| OpenAI GPT-4o | $3 | $10 | Mainstream |
| Anthropic Claude Opus 4.6 | $5 | $25 | Standard |
| Anthropic Claude Opus 4.6 (Fast) | $30 | $150 | Speed premium |
| OpenAI GPT-5.2 Pro (projected) | $21 | $168 | Most expensive |
Between GPT-5.2 Pro output ($168/M) and Gemini Flash-Lite output ($0.40/M) there is a 420× price difference, for models both marketed as "AI assistants." The gap is real, and growing.
Same Prompt, Different Bill
Let's make this concrete. Take a real-world agent task: a 100-word user message, a 500-word system prompt, and a 200-word response. Same content, English vs Spanish:
| | English | Spanish |
|---|---|---|
| User message (100w) | ~130 tok | ~210 tok |
| System prompt (500w) | ~650 tok | ~1,050 tok |
| Response (200w) | ~260 tok | ~404 tok |
| **Total** | ~1,040 tok | ~1,664 tok (+60%) |
At Claude Opus 4.6 rates ($5/M input, $25/M output), billing the full context as input:

- English: ~$0.0052 input + ~$0.0065 output ≈ $0.0117 per call
- Spanish: ~$0.0083 input + ~$0.0101 output ≈ $0.0184 per call
Extra cost for a Spanish-language app: roughly $6,700 per million calls.
This is not a rounding error. At scale โ millions of agent calls per month โ the language tax becomes a serious cost factor, and most teams discover it only after they've already committed to a provider and a language.
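The arithmetic above is easy to reproduce and to re-run against your own volumes. The helper below is illustrative (`call_cost` is our name, not any provider's API), and it mirrors the example's assumption that the full context total is billed as input:

```python
def call_cost(total_input_tok: int, output_tok: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one call in dollars; prices are per million tokens."""
    return (total_input_tok * in_price_per_m
            + output_tok * out_price_per_m) / 1_000_000

# Claude Opus 4.6 list prices from the pricing table: $5/M in, $25/M out.
english = call_cost(1_040, 260, 5.0, 25.0)   # ~$0.0117 per call
spanish = call_cost(1_664, 404, 5.0, 25.0)   # ~$0.0184 per call

# At one million agent calls per month, the language tax is the difference:
monthly_tax = (spanish - english) * 1_000_000   # ~$6,700 per month
```

Swap in your own token counts and the prices from the table to see what your stack pays before and after switching language or provider.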
When the Token Became Fake Currency
This pattern has happened before. When cloud computing emerged in the 2000s, every major provider invented its own unit of compute: AWS had EC2 instance hours, Azure had credits, Google had Compute Units. Each was defined differently. Each was deliberately opaque. Comparison required a spreadsheet, and that friction always benefited the seller.
AI has recreated the same opacity with tokens. A "token" from OpenAI is not the same as a "token" from Anthropic, which is not the same as a "token" from Google. They share a name and nothing else.
The Solution: TokensTree
We built TokensTree precisely because this problem is structural: it won't be fixed by any single provider, because it is in their interest to maintain the fog. The answer has to be infrastructural.
Two mechanisms address this directly:
SafePaths with Remote Cache: Verified command paths are stored once and reused across agents. The first agent that solves a problem pays the full token cost. Every subsequent agent retrieves the cached result for a fraction of the tokens. Like Bazel build caching, but for AI knowledge: repeated computations are cached, shared, and reused. Token consumption drops. Latency drops. The language of the requesting agent becomes irrelevant to the token cost of the stored answer.
Cross-provider token accounting: TokensTree normalizes token counts across providers, so you can see what a task actually costs, not what each provider's tokenizer claims it costs. One dashboard. Real comparisons. No fog.
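One way to picture normalization: convert each provider's token count back into a provider-independent unit before comparing bills. The sketch below is purely illustrative; `normalized_cost` is not the actual TokensTree API, and the tokens-per-word ratios are hypothetical placeholders, not measured values.

```python
# Hypothetical calibration: average tokens per English word, per provider.
# These numbers are placeholders for illustration, not measurements.
TOKENS_PER_WORD = {
    "openai": 1.30,
    "anthropic": 1.35,
    "google": 1.25,
}

def normalized_cost(provider: str, token_count: int,
                    price_per_m: float) -> tuple[float, float]:
    """Return (dollar cost, provider-independent word count) so the same
    task can be compared across tokenizers on one axis."""
    words = token_count / TOKENS_PER_WORD[provider]
    cost = token_count * price_per_m / 1_000_000
    return cost, words
```

Once every bill is expressed as dollars per (normalized) word of work, "cheap tokens" that tokenize your language inefficiently stop looking cheap.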
If the language tax is the toll you pay at every call, tokenstree.eu is the route optimizer that finds the cheapest crossing before your prompt even reaches the tokenizer. It intercepts requests automatically: it translates them into the most BPE-efficient encoding, sends them to the model, then returns the response in your language. Your French stays French. Your Spanish stays Spanish. The token count drops in the middle. That is what fighting the fog looks like in practice.
TokensTree is building the infrastructure for a more efficient AI economy. Token pricing data reflects publicly available rates as of March 2026 and is subject to change. Language tax ratios are approximate averages across common use cases, not guarantees for specific inputs. tokenstree.com