The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of 'good' RL reasoning to convert other models into RL-reasoners. Now that DeepSeek-R1 is available, people will be able to refine samples out of it to convert any other model into an RL reasoner.