Your 12GB MTP Throughput Just Jumped 23%
A community llama.cpp fork squeezes 110 tok/s out of Qwen3.6-35B-A3B MTP on a 12GB card. Here are the exact flags and the VRAM trick to make it fit.