r1.py script to run R1 with a min-thinking-tokens parameter (via) Fantastically creative hack by Theia Vogel. The DeepSeek R1 family of models outputs its chain of thought inside a <think>...</think> block. Theia found that you can intercept that closing </think> and replace it with "Wait, but" or "So" or "Hmm" to trick the model into extending its thought process, producing better solutions!

You can stop doing this after a few iterations, or you can keep on denying the </think> string and effectively force the model to "think" forever.
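Here's a minimal sketch of the idea, not Theia's actual r1.py: generate token by token with Hugging Face transformers, and whenever the model tries to emit </think> before a minimum number of thinking tokens, splice in a continuation phrase instead. The model name, the min_thinking_tokens default, the "Wait, but" replacement, greedy decoding, and the assumption that </think> decodes from a single token are all mine for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model: any DeepSeek R1 distill should behave the same way.
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Token ids for the phrase we splice in whenever </think> arrives too early.
replacement_ids = tokenizer.encode("Wait, but", add_special_tokens=False)


def generate(prompt: str, min_thinking_tokens: int = 512, max_new_tokens: int = 2048) -> str:
    ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[:, -1, :]
        # Greedy decoding for brevity; sampling would follow R1's usual setup.
        next_id = int(torch.argmax(logits, dim=-1))
        # Deny the closing tag until we've "thought" for long enough.
        if "</think>" in tokenizer.decode([next_id]) and len(generated) < min_thinking_tokens:
            new_ids = replacement_ids
        else:
            new_ids = [next_id]
        generated.extend(new_ids)
        ids = torch.cat([ids, torch.tensor([new_ids])], dim=-1)
        if next_id == tokenizer.eos_token_id:
            break
    return tokenizer.decode(generated)


if __name__ == "__main__":
    print(generate("How many r's are in the word strawberry?"))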

Theia's code here works against Hugging Face transformers, but I'm confident the same approach could be ported to llama.cpp or MLX.