r1.py script to run R1 with a min-thinking-tokens parameter (via) Fantastically creative hack by Theia Vogel. The DeepSeek R1 family of models output their chain of thought inside a <think>...</think>
block. Theia found that you can intercept that closing </think>
and replace it with "Wait, but" or "So" or "Hmm" and trick the model into extending its thought process, producing better solutions!
You can stop doing this after a few iterations, or you can keep on denying the </think>
string and effectively force the model to "think" forever.
Theia's code here works against Hugging Face transformers but I'm confident the same approach could be ported to llama.cpp or MLX.
Recent articles
- Watching o3 guess a photo's location is surreal, dystopian and wildly entertaining - 26th April 2025
- Exploring Promptfoo via Dave Guarino's SNAP evals - 24th April 2025
- AI assisted search-based research actually works now - 21st April 2025