#llama-cpp — Hans Christian Thjømøe

May 16, 2026

Speculative Decoding Explained: Why Your Local Model Got 2× Faster in 2026

The same Qwen3.6-27B that ran at 70 tokens/sec on a 4090 in January was running at 140 tokens/sec by April. Nothing changed about the model. Speculative decoding moved from research curiosity to default. Here is what it actually does.