Speculative Decoding Explained: Why Your Local Model Got 2× Faster in 2026
The same Qwen3.6-27B that ran at 70 tokens/sec on a 4090 in January was running at 140 tokens/sec by April. Nothing changed about the model. Speculative decoding moved from research curiosity to default. Here is what it actually does.