October 21 post-incident analysis (via) Legitimately fascinating post-mortem by GitHub. They run database masters in multiple data centers with raft for leader election... but when they had an unexpected network split between east and west coast they ended up with several seconds of write that had not been correctly replicated. Cleaning up the resulting mess took the best part of 24 hours! Distributed systems are hard.
Recent articles
- Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide - 9th September 2025
- GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search - 6th September 2025
- V&A East Storehouse and Operation Mincemeat in London - 27th August 2025