
Qwen3-4B Instruct and Thinking. Yet another interesting model from Qwen—these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with a 262,144 token context length, which Qwen suggest is essential for all of those thinking tokens.

The new model somehow beats the significantly larger Qwen3-30B-A3B Thinking on the AIME25 and HMMT25 benchmarks, according to Qwen’s self-reported scores.

The easiest way to try it on a Mac is via LM Studio, who already have their own MLX quantized versions available.
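
If you would rather drive the model from Python than through the LM Studio app, the mlx-lm library can load the same quantized builds. Here's a minimal sketch; the exact lmstudio-community repo name is an assumption based on their usual naming conventions, so check Hugging Face for the real identifier.

```python
# Minimal sketch using mlx-lm (pip install mlx-lm) to run one of the
# MLX quantized builds. The Hugging Face repo name below is an
# assumption based on lmstudio-community's usual naming; check their
# Hugging Face page for the actual identifier.
from mlx_lm import load, generate

# Downloads the weights from Hugging Face on first run
model, tokenizer = load("lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit")

# Format the prompt with the model's chat template so the thinking
# tokens come back wrapped the way the model expects
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Thinking models burn a lot of tokens on reasoning before they answer,
# so leave plenty of headroom in max_tokens; verbose=True streams the
# output as it is generated
print(generate(model, tokenizer, prompt=prompt, max_tokens=4096, verbose=True))
```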
