Claude Code's /simplify Stopped Fixing Code Yesterday
Claude Code 2.1.147 renamed /simplify to /code-review and dropped the auto-fix behavior. The new command reports bugs at chosen effort levels but no longer changes code.
16 posts
Anthropic shipped MCP tunnels on May 19. Claude agents can call internal databases, ticketing systems, and on-prem APIs through one outbound connection — no inbound firewall rules required.
vLLM 0.21.0 shipped Friday with a quiet fix: thinking_token_budget was being silently ignored when MTP speculative decoding was enabled. If you serve reasoning models with spec decode, you have been paying for it.
A server-side bug in Claude Code v2.1.100+ inflates every request by roughly 20K cache_creation tokens — about 40% overhead. Pin v2.1.98 until fixed.
llama.cpp renamed the MTP flag on May 13. The old --spec-type mtp is silently ignored. If your tok/s dropped from 140 to 70, you are likely running without speculative decoding.
Eighteen months after Anthropic released MCP, the ecosystem is wide enough that picking the wrong servers slows your agent down. Here is the practical short list — what to install, what to skip, and the trap most people fall into.
The same Qwen3.6-27B that ran at 70 tokens/sec on a 4090 in January was running at 140 tokens/sec by April. Nothing changed about the model. Speculative decoding moved from research curiosity to default. Here is what it actually does.
Anthropic is splitting Claude billing on June 15 — Agent SDK and ACP usage moves to a capped credit pool ($20/$100/$200) at full API rates.
Three model releases in three weeks moved local AI from 'good enough for hobbies' to 'good enough for production'. Here's what changed and why it matters.
A practical guide to running Qwen3.6-27B on consumer hardware in 2026 — memory requirements per quant level, recommended runners, and the MTP trick that doubles your tokens per second.
Qwen3.6-27B running locally now scores within 10 points of frontier closed models on SWE-bench Verified. The benchmark table, lined up side by side.
Anthropic shipped Opus 4.7 on April 16, 2026, with a seven-point SWE-bench jump, the 1M context window now generally available with no premium, and a new task budget primitive for agent loops.
Google released Gemma 4 on April 2, 2026 — four variants from 2B to 31B, with 256K context, native vision and audio, and Apache 2.0 licensing. Here's what it's for, where it fits, and how to run it.
Anthropic released Opus 4.6 on February 5, 2026, with a 1M token context beta, agent teams, adaptive thinking, and developer effort controls — all at the same price as 4.5.
Anthropic's latest model achieves state-of-the-art results in agentic coding and brings meaningful improvements across reasoning, mathematics, and everyday tasks.
Google's Gemini 3 Pro brings generative interfaces, 1M token context, and state-of-the-art multimodal reasoning to developers and consumers alike.