Two weeks ago the GitHub repo called caveman went from zero to 64,000 stars by making Claude talk like a caveman. "Brain still big. Mouth small." Sixty-five percent fewer output tokens, on their own benchmark, with no accuracy loss. It is a beautiful piece of work and the numbers hold up.

I forked it. Not to compete, to complete. The fork is at vfalbor/caveman_tokenstransfer.2.0 and it goes live today as v2.0.0.

caveman cuts the mouth. we cut the ear.

Every Claude call has two token counts: the input you sent (system prompt + context + question) and the output the model produced (the answer). Anthropic prices them separately. On Haiku 4.5, input costs $1 per million tokens and output $5 per million. Output is five times more expensive per token, but in long-context apps the input is five to twenty times larger by volume, and more in tool-heavy agent loops. How the bill splits:

| Workload | % of bill that is input | Caveman wins? |
|---|---|---|
| RAG (10k context, 500 answer) | 80% | small win |
| Agent loop (15k system+tools, 300 answer) | 91% | small win |
| Chat (2k Q, 800 A) | 33% | good win |
| Code generation (1k Q, 3k A) | 6% | massive win |

Caveman attacks the side where the per-token price is highest. For codegen that is the right move. For RAG and agents, where the bill is dominated by the input that the user never sees, it is the smaller lever. The big lever on those workloads is input compression.
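The split in the table follows from the two prices and the token counts alone. A minimal sketch of that arithmetic, assuming the Haiku 4.5 prices quoted above; `input_share` and `WORKLOADS` are illustrative names, not part of either repo:

```python
# Prices in USD per million tokens, as quoted above for Haiku 4.5.
INPUT_PRICE_PER_M = 1.0
OUTPUT_PRICE_PER_M = 5.0


def input_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of one call's cost that comes from input tokens."""
    input_cost = input_tokens * INPUT_PRICE_PER_M
    output_cost = output_tokens * OUTPUT_PRICE_PER_M
    return input_cost / (input_cost + output_cost)


# Workloads from the table: (input tokens, output tokens).
WORKLOADS = {
    "RAG": (10_000, 500),
    "Agent loop": (15_000, 300),
    "Chat": (2_000, 800),
    "Code generation": (1_000, 3_000),
}

if __name__ == "__main__":
    for name, (n_in, n_out) in WORKLOADS.items():
        print(f"{name}: {input_share(n_in, n_out):.0%} of the bill is input")
```

Swapping in your own token counts per call is the quickest way to see which side of the bill is worth attacking first.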

what 2.0 adds

Three new skills, all peers of caveman (not subordinate):

  • /tokenstransfer: LLMLingua-2 input compression. Runs 100% local by default (pip install llmlingua torch tiktoken), no API key needed. Same fully-local philosophy as caveman.
  • /tokenstranslation: multilingual fix. The same string costs 27% more tokens in Spanish than in English; in Arabic, 230% more. tokenstranslation detects the source language, translates the prompt to English before sending, and translates the response back. The user sees their native language. The bill sees English.
  • /caveman-fullstack: caveman dialect + tokenstransfer compression in one command. All three at once.
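The round trip that /tokenstranslation describes can be sketched with the translator and the model passed in as plain callables. This illustrates the flow only and is not the fork's code: `translate`, `ask_model`, `count_tokens`, and both function names are hypothetical stand-ins, and language detection is assumed to have happened before the call.

```python
from typing import Callable

# (text, source_lang, target_lang) -> translated text; hypothetical signature.
Translate = Callable[[str, str, str], str]


def token_tax(count_tokens: Callable[[str], int], english: str, other: str) -> float:
    """Extra tokens the non-English string costs, as a fraction of the English count."""
    return count_tokens(other) / count_tokens(english) - 1.0


def ask_via_english(prompt: str, source_lang: str,
                    translate: Translate, ask_model: Callable[[str], str]) -> str:
    """Translate the prompt to English, query the model, translate the answer back."""
    if source_lang == "en":
        return ask_model(prompt)  # already English: skip both translation hops
    english_prompt = translate(prompt, source_lang, "en")
    english_answer = ask_model(english_prompt)
    return translate(english_answer, "en", source_lang)
```

With tiktoken installed, a counter such as `lambda s: len(tiktoken.get_encoding("cl100k_base").encode(s))` can be passed as `count_tokens` to measure the tax on your own strings. Note that the translation hops have their own cost, so the saving only holds when the translator is cheaper than the tokens it removes.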

All upstream caveman skills are untouched: /caveman with its four levels (lite, full, ultra, wenyan), /caveman-compress with the regex rules, /caveman-review, /caveman-commit, /cavecrew, the installer for 30+ agents. The fork is additive.

the numbers

25 prompts across 4 suites: coding, RAG, agent loops, multilingual. Tokenized with tiktoken cl100k_base (within ~2% of the Claude tokenizer on Latin scripts). Cost modeled on Haiku 4.5 pricing. Average per-call cost reduction:

| Suite | caveman | tokenstransfer | fullstack |
|---|---|---|---|
| coding (short-Q, long-A) | -65% | -2% | -65% |
| RAG (long context, short answer) | -62% | -27% | -64% |
| agent (system + tools + history) | -63% | -42% | -69% |
| multilingual | -65% | -3% (en→en) | -65% + tokenstranslation |
| overall avg | -60.2% | -3.7% | -63.9% |

The two compress different things, and they compose. On agent loops with verbose tool definitions and long histories, stacking takes 69% off each call. On codegen, caveman alone is already enough.
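The reason the two compose is that they act on different terms of the bill: input compression shrinks the input term and the caveman dialect shrinks the output term. A rough sketch of that cost model, using the Haiku 4.5 prices quoted earlier. The token counts and cut rates in the example are made up for illustration and are not benchmark results; `call_cost` and `stacked_reduction` are hypothetical names.

```python
INPUT_PRICE_PER_M = 1.0   # USD per million input tokens (Haiku 4.5, as quoted earlier)
OUTPUT_PRICE_PER_M = 5.0  # USD per million output tokens


def call_cost(input_tokens: float, output_tokens: float) -> float:
    """Cost of one call in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def stacked_reduction(input_tokens: int, output_tokens: int,
                      input_cut: float = 0.0, output_cut: float = 0.0) -> float:
    """Fractional cost reduction when input shrinks by input_cut and output by output_cut."""
    before = call_cost(input_tokens, output_tokens)
    after = call_cost(input_tokens * (1 - input_cut),
                      output_tokens * (1 - output_cut))
    return 1 - after / before


if __name__ == "__main__":
    # Illustrative call: 8k tokens in, 1k tokens out, with made-up cut rates.
    scenarios = {
        "output cut only": {"output_cut": 0.65},
        "input cut only": {"input_cut": 0.40},
        "both stacked": {"input_cut": 0.40, "output_cut": 0.65},
    }
    for label, cuts in scenarios.items():
        print(f"{label}: {stacked_reduction(8_000, 1_000, **cuts):.1%}")
```

Under this model the stacked saving is the sum of the two individual savings, since each cut touches only its own side of the bill.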

fully local, like caveman

Caveman's strength is that it does not depend on any service. Just markdown skills and rules, all running locally. The first version of tokenstransfer required an API call to our hosted LLMLingua-2 service, which broke that property. 2.0 fixes it: with pip install llmlingua torch tiktoken, the compressor runs in your Python process, downloads the ~1.5GB model once from HuggingFace, and never phones home again. Same numbers as the hosted version.
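For orientation, local LLMLingua-2 compression through the `llmlingua` package looks roughly like the sketch below. The model name, `rate`, and `force_tokens` values follow the LLMLingua project's published examples and are assumptions, not this fork's defaults; `load_local_compressor` and `compress_context` are hypothetical helper names. The compressor is passed in as a callable so that only the bulky context is compressed and the question goes through verbatim.

```python
from typing import Callable


def load_local_compressor(rate: float = 0.5) -> Callable[[str], str]:
    """Build a text -> compressed-text function backed by a local LLMLingua-2 model.

    The import is deferred so the heavy dependencies (torch, model weights)
    load only when compression is actually requested.
    """
    from llmlingua import PromptCompressor  # pip install llmlingua torch

    compressor = PromptCompressor(
        # Assumed model: the one used in LLMLingua's own examples.
        model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
        use_llmlingua2=True,
        device_map="cpu",  # switch to "cuda" if a GPU is available
    )

    def compress(text: str) -> str:
        # rate is the target fraction of tokens to keep; force_tokens are always kept.
        result = compressor.compress_prompt(text, rate=rate, force_tokens=["\n", "?"])
        return result["compressed_prompt"]

    return compress


def compress_context(context: str, question: str,
                     compress: Callable[[str], str]) -> str:
    """Compress only the context; keep the user's question word for word."""
    return f"{compress(context)}\n\nQuestion: {question}"


# Usage (the model is downloaded on first run, then served from the local cache):
#   compress = load_local_compressor(rate=0.5)
#   prompt = compress_context(retrieved_text, "What changed in v2?", compress)
```

Keeping the question out of the compressor is a design choice rather than a requirement: it trades a few extra tokens for the guarantee that the instruction itself is never altered.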

For users who want it as a microservice (typically teams running it across many agents), there is a server/ folder with a FastAPI app, Dockerfile, and compose file. Self-host in one command. And for those who want neither (no Python deps, no Docker), the hosted service at transfer.tokenstree.com stays available, with a free tier.

install

# macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/vfalbor/caveman_tokenstransfer.2.0/main/install.sh | bash

# enable 100% local mode (optional but recommended)
pip install llmlingua torch tiktoken

Then in Claude Code, Codex, Cursor, Windsurf, Cline, or any of the 30+ supported agents:

/caveman              ← output dialect, 4 levels
/tokenstransfer       ← input compression, 100% local
/tokenstranslation    ← multilingual prompt → English
/caveman-fullstack    ← all three at once

credit

The output side and the entire 30-agent installer plumbing are Julius Brussee's work. The dialect insight, the 4-level intensity grading, the installer that works across Claude Code / Codex / Cursor / Windsurf / Cline / Aider / Continue / Goose and twenty more: all upstream. I forked, added the input-side skill set and the server, and rebranded the installer to point at the fork. The credit shows up at the top of the repo and in every banner.

A respectful pointer is up in caveman discussions #454 as a show-and-tell rather than a pull request, since the new pieces add a Python dependency and a hosted fallback that probably do not belong in upstream's zero-dep surface.

the TokensTree suite

This release also closes the loop on the broader TokensTree story. The full free suite is:

  • caveman (upstream, 64k stars): output dialect.
  • tokenstransfer: input LLMLingua-2 compression. Local or hosted at transfer.tokenstree.com.
  • tokenstranslation: multilingual BPE language-tax fix. Hosted at translation.tokenstree.com.
  • caveman_tokenstransfer.2.0: the integrated stack, runs all three in any of 30+ agents.

Plus the social layer: tokenstree.com is the network where agents accumulate reputation, share SafePaths, and where every 1 billion tokens saved across the suite plants 1 real tree. The compression layer is the unit economics. The social layer is the long game.

star ⭐, fight us on numbers

If this saved you money on your bill, star caveman_tokenstransfer.2.0. Star JuliusBrussee/caveman too: they did the hard half. If your numbers disagree with the table above, open an issue with the prompt and we will rerun.

Why use many token when few token do trick. 🪨🌳

References

  1. JuliusBrussee, caveman, GitHub repo and benchmark, May 2026. github.com/JuliusBrussee/caveman
  2. vfalbor, caveman_tokenstransfer.2.0, v2.0.0 release, May 27 2026. github.com/vfalbor/caveman_tokenstransfer.2.0
  3. Microsoft Research, LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression, ACL 2024. arxiv.org/abs/2403.12968
  4. vfalbor, llm-language-token-tax / vs-caveman, input-side benchmark reproduction repo, May 27 2026. github.com/vfalbor/llm-language-token-tax