The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of 'good' RL reasoning to convert other models into RL-reasoners. Now that DeepSeek-R1 is available people will be able to refine samples out of it to convert any other model into an RL reasoner.
Recent articles
- Useful patterns for building HTML tools - 10th December 2025
- Under the hood of Canada Spends with Brendan Samek - 9th December 2025
- Highlights from my appearance on the Data Renegades podcast with CL Kao and Dori Wilson - 26th November 2025