31st October 2018 - Link Blog
Reinforcement Learning with Prediction-Based Rewards (via) Fascinating result: by teaching a reinforcement learning agent that plays video games to optimize for "unfamiliar states" - states where it cannot predict what will happen next - the agent does a much better job of playing some games.
... for the first time exceeds average human performance on Montezuma’s Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms and solves the first level without using demonstrations or having access to the underlying state of the game.
Recent articles
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52 - 17th March 2026
- My fireside chat about agentic engineering at the Pragmatic Summit - 14th March 2026
- Perhaps not Boring Technology after all - 9th March 2026