
Qwen3-4B Instruct and Thinking. Yet another interesting model from Qwen—these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with a 262,144 token context length, which Qwen suggest is essential for all of those thinking tokens.

The new model somehow beats the significantly larger Qwen3-30B-A3B Thinking on the AIME25 and HMMT25 benchmarks, according to Qwen’s self-reported scores.

The easiest way to try it on a Mac is via LM Studio, who already have their own MLX quantized versions available.
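
If you would rather drive the model from Python than through the LM Studio app, the mlx-lm library can load the same quantized builds. Here's a minimal sketch; the exact lmstudio-community repo name is an assumption based on their usual naming conventions, so check Hugging Face for the real identifier.

```python
# Minimal sketch using mlx-lm (pip install mlx-lm) to run one of the
# MLX quantized builds. The Hugging Face repo name below is an
# assumption based on lmstudio-community's usual naming; check their
# Hugging Face page for the actual identifier.
from mlx_lm import load, generate

# Downloads the weights from Hugging Face on first run
model, tokenizer = load("lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit")

# Format the prompt with the model's chat template so the thinking
# tokens come back wrapped the way the model expects
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}],
    tokenize=False,
    add_generation_prompt=True,
)

# Thinking models burn a lot of tokens on reasoning before they answer,
# so leave plenty of headroom in max_tokens; verbose=True streams the
# output as it is generated
print(generate(model, tokenizer, prompt=prompt, max_tokens=4096, verbose=True))
```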
