Quantization matters (via) What impact does quantization have on the performance of an LLM? been wondering about this for quite a while, now here are numbers from Paul Gauthier.
He ran differently quantized versions of Qwen 2.5 32B Instruct through his Aider code editing benchmark and saw a range of scores.
The original released weights (BF16) scored highest at 71.4%, with Ollama's qwen2.5-coder:32b-instruct-fp16 (a 66GB download) achieving the same score.
The quantized Ollama qwen2.5-coder:32b-instruct-q4_K_M (a 20GB download) saw a massive drop in quality, scoring just 53.4% on the same benchmark.
Recent articles
- Video: Building a tool to copy-paste share terminal sessions using Claude Code for web - 23rd October 2025
- Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas - 22nd October 2025
- Living dangerously with Claude - 22nd October 2025