#ai — Hans Christian Thjømøe

May 22, 2026

Claude Code's /simplify Stopped Fixing Code Yesterday

Claude Code 2.1.147 renamed /simplify to /code-review and dropped the auto-fix behavior. The new command reports bugs at chosen effort levels but no longer changes code.

May 21, 2026

Your Private MCP Server Is Now Claude-Reachable

Anthropic shipped MCP tunnels on May 19. Claude agents can call internal databases, ticketing systems, and on-prem APIs through one outbound connection — no inbound firewall rules required.

May 18, 2026

Your vLLM Thinking Budget Was Doing Nothing With MTP On

vLLM 0.21.0 shipped Friday with a quiet fix: thinking_token_budget was being silently ignored when MTP speculative decoding was enabled. If you serve reasoning models with spec decode, you have been paying for thinking tokens the budget was supposed to cap.

May 17, 2026

Claude Code v2.1.100+ Burns ~20K Phantom Tokens Per Request

A server-side bug in Claude Code v2.1.100+ inflates every request by roughly 20K cache_creation tokens — about 40% overhead. Pin v2.1.98 until fixed.

May 16, 2026

Your Local Qwen3.6 Throughput Probably Just Halved (and How to Fix It)

llama.cpp renamed the MTP flag on May 13. The old --spec-type mtp is silently ignored. If your tok/s dropped from 140 to 70, you are likely running without speculative decoding.

May 16, 2026

MCP Server Roundup: Which Are Actually Worth Adding to Your Setup in May 2026

Eighteen months after Anthropic released MCP, the ecosystem is wide enough that picking the wrong servers slows your agent down. Here is the practical short list — what to install, what to skip, and the trap most people fall into.

May 16, 2026

Speculative Decoding Explained: Why Your Local Model Got 2× Faster in 2026

The same Qwen3.6-27B that ran at 70 tokens/sec on a 4090 in January was running at 140 tokens/sec by April. Nothing changed about the model. Speculative decoding moved from research curiosity to default. Here is what it actually does.

May 16, 2026

Third-Party Claude Agents Lose the Subscription Subsidy June 15

Anthropic is splitting Claude billing on June 15 — Agent SDK and ACP usage moves to a capped credit pool ($20/$100/$200) at full API rates.

May 15, 2026

The Local AI Inflection Point: May 2026

Three model releases in three weeks moved local AI from 'good enough for hobbies' to 'good enough for production'. Here's what changed and why it matters.

May 11, 2026

Running Qwen3.6-27B Locally: Hardware, Quantization, and What Actually Works

A practical guide to running Qwen3.6-27B on consumer hardware in 2026 — memory requirements per quant level, recommended runners, and the MTP trick that doubles your tokens per second.

May 8, 2026

A 27B Model on a Single GPU Is 10 Points Off Claude Opus 4.7

Qwen3.6-27B running locally now scores within 10 points of frontier closed models on SWE-bench Verified. The benchmark table, lined up side by side.

April 17, 2026

Claude Opus 4.7: 87.6% on SWE-bench and 1M Context at Standard Pricing

Anthropic shipped Opus 4.7 on April 16, 2026, with a seven-point SWE-bench jump, the 1M context window now generally available with no premium, and a new task budget primitive for agent loops.

April 5, 2026

Gemma 4: Google's Open Model Family Goes Multimodal

Google released Gemma 4 on April 2, 2026 — four variants from 2B to 31B, with 256K context, native vision and audio, and Apache 2.0 licensing. Here's what it's for, where it fits, and how to run it.

February 6, 2026

Claude Opus 4.6: A Million-Token Context and a New Agent Team Model

Anthropic released Opus 4.6 on February 5, 2026, with a 1M token context beta, agent teams, adaptive thinking, and developer effort controls — all at the same price as 4.5.

November 25, 2025

Claude Opus 4.5: Anthropic's New Flagship Model Sets the Bar for AI Coding

Anthropic's latest model achieves state-of-the-art results in agentic coding and brings meaningful improvements across reasoning, mathematics, and everyday tasks.

November 25, 2025

Google Gemini 3 Pro: The New Leader in Multimodal AI

Google's Gemini 3 Pro brings generative interfaces, 1M token context, and state-of-the-art multimodal reasoning to developers and consumers alike.
