In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage to add to this mix. By training LLMs against automatically verifiable rewards across a number of environments (e.g. think math/code puzzles), the LLMs spontaneously develop strategies that look like "reasoning" to humans - they learn to break down problem solving into intermediate calculations and they learn a number of problem solving strategies for going back and forth to figure things out (see DeepSeek R1 paper for examples).
— Andrej Karpathy, 2025 LLM Year in Review
Recent articles
- Your job is to deliver code you have proven to work - 18th December 2025
- Gemini 3 Flash - 17th December 2025
- I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours - 15th December 2025