Reinforcement Learning with Prediction-Based Rewards (via) Fascinating result: by teaching a reinforcement learning agent that plays video games to optimize for “unfamiliar states”—states where it cannot predict what will happen next—the agent does a much better job of playing some games. “... for the first time exceeds average human performance on Montezuma’s Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms and solves the first level without using demonstrations or having access to the underlying state of the game.”
Recent articles
- Your job is to deliver code you have proven to work - 18th December 2025
- Gemini 3 Flash - 17th December 2025
- I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours - 15th December 2025