You pay for what you use. That's the deal. Except it's not.
When you use an AI model (GPT-4, Claude, Gemini), you do not pay per word. You pay per token. And that tiny technical detail can quietly cost you up to 60% more for the exact same request, depending on which provider you choose and which language you write in.
What Is a Token, Really?
Before we get to the money, a crash course. Tokens are not words. They are subword units produced by a compression algorithm called Byte Pair Encoding (BPE), originally a data-compression technique that was repurposed for NLP in the 2010s. The algorithm learns frequent character sequences in a corpus and merges them into single vocabulary entries.
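The merge-learning step is simple enough to sketch in a few lines. This is a toy illustration only (the function name `bpe_merges` is ours): real tokenizers operate on bytes, use enormous corpora, and learn vocabularies of 32k-256k entries.

```python
from collections import Counter

def bpe_merges(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Toy BPE training: repeatedly fuse the most frequent adjacent pair."""
    # Start with each word split into single characters.
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus.
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere before the next round.
        for i, w in enumerate(words):
            j, out = 0, []
            while j < len(w):
                if j + 1 < len(w) and (w[j], w[j + 1]) == best:
                    out.append(w[j] + w[j + 1])
                    j += 2
                else:
                    out.append(w[j])
                    j += 1
            words[i] = out
    return merges

# On a tiny corpus, the most frequent pairs fuse first:
# bpe_merges(["low", "lower", "lowest"], 2) learns ("l", "o"), then ("lo", "w")
```

Whatever sequences are frequent in the training corpus become single tokens; everything else gets chopped into pieces. That asymmetry is where the rest of this article comes from.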
The catch: every AI company trains its own tokenizer on its own corpus with its own vocabulary size. The result is that the same word gets sliced differently depending on who's counting.
The Dirty Secret: Tokens Are Not Standardized
There is no ISO standard for AI tokens. No regulatory body. No published audit. Each major provider uses a different system:
| Provider | Tokenizer | Vocab size |
|---|---|---|
| OpenAI | tiktoken (cl100k_base / o200k_base) | ~100k |
| Google | SentencePiece (older models) + custom (Gemini) | ~256k |
| Anthropic | Proprietary, barely documented | unknown |
| Meta LLaMA | BPE | ~32k |
| Mistral | Custom BPE | ~32k |
Anthropic's tokenizer is particularly opaque. There is no public specification, no open-source release, and the documentation amounts to a single paragraph in their pricing FAQ. You are billed by a black box.
The Language Tax
The most damaging consequence of non-standardized tokenization is what we call the Language Tax. English, specifically American English, was the dominant language in most training corpora. As a result, English tokenizes efficiently. Every other language pays a premium.
| Language | Overhead vs English | Relative Cost |
|---|---|---|
| English | baseline | 1.0× |
| Spanish | +62% | 1.6× |
| French | +54% | 1.5× |
| German | +62% | 1.6× |
| Russian | +154% | 2.5× |
| Arabic | +208% | 3.1× |
| Hindi | +392% | 4.9× |
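The mapping from overhead to relative cost is direct: a +62% token overhead means every request costs 1.62 times its English equivalent. A minimal sketch, using the approximate averages from the table above (the dictionary and function names are ours):

```python
# Approximate average token overhead vs English, per the table above.
LANGUAGE_OVERHEAD = {
    "English": 0.00, "Spanish": 0.62, "French": 0.54,
    "German": 0.62, "Russian": 1.54, "Arabic": 2.08, "Hindi": 3.92,
}

def relative_cost(language: str) -> float:
    """Cost multiplier vs English for the same content at the same prices."""
    return 1.0 + LANGUAGE_OVERHEAD[language]
```

Because billing is linear in tokens, the overhead passes straight through to the invoice: no volume discount absorbs it.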
The Pricing War
On top of tokenization differences, the pricing gap between providers has exploded. As of March 2026:
| Provider / Model | Input $/M | Output $/M | Note |
|---|---|---|---|
| Google Gemini Flash-Lite | $0.10 | $0.40 | Cheapest viable |
| Google Gemini 2.5 Pro | $1.25 | $10 | Strong value |
| OpenAI GPT-4o | $3 | $10 | Mainstream |
| Anthropic Claude Opus 4.6 | $5 | $25 | Standard |
| Anthropic Claude Opus 4.6 (Fast) | $30 | $150 | Speed premium |
| OpenAI GPT-5.2 Pro (projected) | $21 | $168 | Most expensive |
Between GPT-5.2 Pro output ($168/M) and Gemini Flash-Lite output ($0.40/M) there is a 420× price difference, for models both marketed as "AI assistants." The gap is real, and growing.
Same Prompt, Different Bill
Let's make this concrete. Take a real-world agent task: a 100-word user message, a 500-word system prompt, and a 200-word response. Same content, English vs Spanish:
| | English | Spanish |
|---|---|---|
| User message (100w) | ~130 tok | ~210 tok |
| System prompt (500w) | ~650 tok | ~1,050 tok |
| Response (200w) | ~260 tok | ~404 tok |
| **Total** | ~1,040 tok | ~1,664 tok (+60%) |
At Claude Opus 4.6 rates ($5/M input, $25/M output), billing the full context as input:

- English: ~$0.0052 input + ~$0.0065 output ≈ $0.0117 per call
- Spanish: ~$0.0083 input + ~$0.0101 output ≈ $0.0184 per call
Extra cost for a Spanish-language app: roughly $6,700 per million calls.
This is not a rounding error. At scale โ millions of agent calls per month โ the language tax becomes a serious cost factor, and most teams discover it only after they've already committed to a provider and a language.
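The arithmetic above is easy to reproduce and to re-run against your own volumes. The helper below is illustrative (`call_cost` is our name, not any provider's API), and it mirrors the example's assumption that the full context total is billed as input:

```python
def call_cost(total_input_tok: int, output_tok: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one call in dollars; prices are per million tokens."""
    return (total_input_tok * in_price_per_m
            + output_tok * out_price_per_m) / 1_000_000

# Claude Opus 4.6 list prices from the pricing table: $5/M in, $25/M out.
english = call_cost(1_040, 260, 5.0, 25.0)   # ~$0.0117 per call
spanish = call_cost(1_664, 404, 5.0, 25.0)   # ~$0.0184 per call

# At one million agent calls per month, the language tax is the difference:
monthly_tax = (spanish - english) * 1_000_000   # ~$6,700 per month
```

Swap in your own token counts and the prices from the table to see what your stack pays before and after switching language or provider.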
When the Token Became Fake Currency
This pattern has happened before. When cloud computing emerged in the 2000s, every major provider invented its own unit of compute: AWS had EC2 instance hours, Azure had credits, Google had Compute Units. Each was defined differently. Each was deliberately opaque. Comparison required a spreadsheet, and that friction always benefited the seller.
AI has recreated the same opacity with tokens. A "token" from OpenAI is not the same as a "token" from Anthropic, which is not the same as a "token" from Google. They share a name and nothing else.
The Solution: TokensTree
We built TokensTree precisely because this problem is structural: it won't be fixed by any single provider, because it is in their interest to maintain the fog. The answer has to be infrastructural.
Two mechanisms address this directly:
SafePaths with Remote Cache: Verified command paths are stored once and reused across agents. The first agent that solves a problem pays the full token cost. Every subsequent agent retrieves the cached result for a fraction of the tokens. Like Bazel build caching, but for AI knowledge: repeated computations are cached, shared, and reused. Token consumption drops. Latency drops. The language of the requesting agent becomes irrelevant to the token cost of the stored answer.
Cross-provider token accounting: TokensTree normalizes token counts across providers, so you can see what a task actually costs, not what each provider's tokenizer claims it costs. One dashboard. Real comparisons. No fog.
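One way to picture normalization: convert each provider's token count back into a provider-independent unit before comparing bills. The sketch below is purely illustrative; `normalized_cost` is not the actual TokensTree API, and the tokens-per-word ratios are hypothetical placeholders, not measured values.

```python
# Hypothetical calibration: average tokens per English word, per provider.
# These numbers are placeholders for illustration, not measurements.
TOKENS_PER_WORD = {
    "openai": 1.30,
    "anthropic": 1.35,
    "google": 1.25,
}

def normalized_cost(provider: str, token_count: int,
                    price_per_m: float) -> tuple[float, float]:
    """Return (dollar cost, provider-independent word count) so the same
    task can be compared across tokenizers on one axis."""
    words = token_count / TOKENS_PER_WORD[provider]
    cost = token_count * price_per_m / 1_000_000
    return cost, words
```

Once every bill is expressed as dollars per (normalized) word of work, "cheap tokens" that tokenize your language inefficiently stop looking cheap.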
If the language tax is the toll you pay at every call, tokenstree.eu is the route optimizer that finds the cheapest crossing before your prompt even reaches the tokenizer. It intercepts requests automatically: it translates them into the most BPE-efficient encoding, sends them to the model, then returns the response in your language. Your French stays French. Your Spanish stays Spanish. The token count drops in the middle. That is what fighting the fog looks like in practice.
TokensTree is building the infrastructure for a more efficient AI economy. Token pricing data reflects publicly available rates as of March 2026 and is subject to change. Language tax ratios are approximate averages across common use cases, not guarantees for specific inputs. tokenstree.com