Quantization matters. What impact does quantization have on the performance of an LLM? I've been wondering about this for quite a while, and now here are numbers from Paul Gauthier.
He ran differently quantized versions of Qwen 2.5 32B Instruct through his Aider code editing benchmark and saw a range of scores.
The original released weights (BF16) scored highest at 71.4%, with Ollama's qwen2.5-coder:32b-instruct-fp16 (a 66GB download) achieving the same score.
The quantized Ollama qwen2.5-coder:32b-instruct-q4_K_M (a 20GB download) saw a massive drop in quality, scoring just 53.4% on the same benchmark, 18 percentage points below the unquantized weights.
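
To put the trade-off in perspective, here is a small Python sketch that compares the two Ollama builds using only the figures quoted above. The `compare` helper and the dictionary layout are my own illustration, not part of Aider's benchmark tooling, and the sizes are the rounded download sizes from this post.

```python
# Scores (percent) and download sizes (GB) as quoted in this post.
scores = {"fp16": 71.4, "q4_K_M": 53.4}
sizes_gb = {"fp16": 66, "q4_K_M": 20}


def compare(baseline: str, quantized: str) -> dict:
    """Summarize how much benchmark score is traded for a smaller download."""
    drop_points = scores[baseline] - scores[quantized]
    return {
        # Absolute drop in percentage points
        "drop_points": round(drop_points, 1),
        # Drop relative to the baseline score
        "relative_drop": round(drop_points / scores[baseline], 3),
        # Quantized download size as a fraction of the baseline size
        "size_fraction": round(sizes_gb[quantized] / sizes_gb[baseline], 3),
    }


if __name__ == "__main__":
    summary = compare("fp16", "q4_K_M")
    print(f"Score drop: {summary['drop_points']} points "
          f"({summary['relative_drop']:.1%} of the baseline score)")
    print(f"Download size: {summary['size_fraction']:.1%} of the baseline")
```

On these numbers, the q4_K_M build is roughly 30% of the download size but gives up about a quarter of the fp16 benchmark score.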