The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of 'good' RL reasoning to convert other models into RL-reasoners. Now that DeepSeek-R1 is available people will be able to refine samples out of it to convert any other model into an RL reasoner.
Recent articles
- Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model - 20th November 2025
- How I automate my Substack newsletter with content from my blog - 19th November 2025
- Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark - 18th November 2025