r1.py script to run R1 with a min-thinking-tokens parameter (via) Fantastically creative hack by Theia Vogel. The DeepSeek R1 family of models output their chain of thought inside a <think>...</think>
block. Theia found that you can intercept that closing </think>
and replace it with "Wait, but" or "So" or "Hmm" and trick the model into extending its thought process, producing better solutions!
You can stop doing this after a few iterations, or you can keep on denying the </think>
string and effectively force the model to "think" forever.
Theia's code here works against Hugging Face transformers but I'm confident the same approach could be ported to llama.cpp or MLX.
Recent articles
- Claude can write complete Datasette plugins now - 8th October 2025
- Vibe engineering - 7th October 2025
- OpenAI DevDay 2025 live blog - 6th October 2025