Qwen3-4B Instruct and Thinking. Yet another interesting model release from Qwen: these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with a 262,144 token context length, which Qwen suggest is essential for all of those thinking tokens.
The new Thinking model somehow beats the significantly larger Qwen3-30B-A3B Thinking on the AIME25 and HMMT25 benchmarks, according to Qwen's self-reported scores.
The easiest way to try it on a Mac is via LM Studio, who already have their own MLX quantized versions out.
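
If you'd rather skip the LM Studio GUI, something like this should also work from Python using the mlx-lm library. This is a minimal sketch, not something from the model card: the model identifier below is my guess at an LM Studio community quant name, so substitute whichever MLX build you actually download.

```python
# Minimal sketch (assumptions flagged): run a Qwen3-4B MLX quant on a Mac
# with Apple's mlx-lm library. Install it first: pip install mlx-lm
# The repo ID below is a hypothetical LM Studio community quant name,
# not confirmed from this post.
from mlx_lm import load, generate

model, tokenizer = load("lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit")

# Apply the chat template so the Thinking variant emits its reasoning tokens
messages = [{"role": "user", "content": "Explain why the sky is blue."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Thinking models spend a lot of their budget on reasoning tokens,
# so leave generous headroom in max_tokens
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)
```

The `verbose=True` flag streams output as it generates, which is useful for watching the thinking tokens arrive before the final answer.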