If you have updated Claude Code in the last month, you are likely paying for roughly 20,000 extra cache_creation_input_tokens on every request you send. Same prompt, same project, same account — the only thing that changed is the version string in the User-Agent header, and the server bills you ~40% more for it.

GitHub issue #46917 (open since April 12, labeled bug / area:cost / has repro) documents it cleanly. The user Adrian-Mteam proxied Claude Code’s HTTP traffic across four versions in single-call mode and measured the same payload against billed token counts.

The Numbers

VersionBytes sentcache_creation_input_tokensDelta vs v2.1.98
v2.1.98169,51449,726baseline
v2.1.100168,53669,922+20,196 (+40%)
v2.1.101171,903~72,000+22,274

v2.1.100 sends 978 fewer bytes than v2.1.98 and gets billed 20,196 more tokens. The inflation is 100% server-side. It happens after the request is received, and it correlates with the version string in the User-Agent header. Spoofing the User-Agent back to claude-cli/2.1.98 returns the token count to baseline — which is the smoking gun that this is a routing decision on Anthropic’s side, not anything the client sends differently.

The extra tokens are classified as cache_creation_input_tokens. That matters: they are not just a billing-ledger artifact. They enter the model’s context window. Whatever those 20K tokens are, the model is seeing them.

Why This Matters Right Now

Two reasons.

First, the bug has been open since April 12, has a clean reproduction, and is still present in the 2.1.x line as of mid-May. There is no published root cause analysis from Anthropic and no fix shipped. The community guess is that prompt caching is silently broken in newer builds, forcing full re-processing of conversation history every turn — but that is inference from behaviour, not confirmation.

Second, the metering split lands June 15. Right now those phantom tokens are eating your 5-hour subscription cap. After June 15, the same inflation hitting Agent SDK / claude -p / GitHub Actions usage draws from the new capped credit pool at API list rates. A 40% overhead on cache_creation tokens against $3/MTok input billing adds real money fast on any agentic loop that runs more than a handful of times per day.

The Workaround

The cheap fix is to pin an older version:

npx claude-code@2.1.98

The same effect is achievable by spoofing the User-Agent on a current version:

export ANTHROPIC_CUSTOM_HEADERS='User-Agent: claude-cli/2.1.98 (external, sdk-cli)'

Either makes the routing decision pick the pre-bug billing path. The second has the obvious downside of being an undocumented header trick that could break the next time Anthropic touches the User-Agent parsing.

The middle option — and the one I would pick for any production setup — is to pin a known-good version in the install command and add a smoke test that compares token counts against a fixed prompt across versions. If you have ever shipped a service that quietly regressed on cost because a dependency got more chatty, this is the same shape of problem.

What I Would Not Do

Stay on whatever version is current and assume the bug will be fixed before June 15. Five weeks of fix-window with no public timeline is not enough certainty to budget against. If your agentic workload depends on Claude Code on a metered plan, the floor on your usage between now and the metering cutover is 40% higher than it should be, and you have no audit trail to push back with.

The pattern is the same as the llama.cpp MTP flag rename from last week: small change in the runner, large change in the bill, no error message to alert you. The model file is stable. The infrastructure around it is moving fast enough that you need to be watching the meter, not just the output.

Sources: