r1.py script to run R1 with a min-thinking-tokens parameter (via) Fantastically creative hack by Theia Vogel. The DeepSeek R1 family of models output their chain of thought inside a <think>...</think> block. Theia found that you can intercept that closing </think> and replace it with "Wait, but" or "So" or "Hmm" and trick the model into extending its thought process, producing better solutions!
You can stop doing this after a few iterations, or you can keep on denying the </think> string and effectively force the model to "think" forever.
Theia's code here works against Hugging Face transformers but I'm confident the same approach could be ported to llama.cpp or MLX.
Recent articles
- New prompt injection papers: Agents Rule of Two and The Attacker Moves Second - 2nd November 2025
- Hacking the WiFi-enabled color screen GitHub Universe conference badge - 28th October 2025
- Video: Building a tool to copy-paste share terminal sessions using Claude Code for web - 23rd October 2025