Simon Willison’s Weblog

Blogmarks tagged github, scaling in 2018

Filters: Type: blogmark × Year: 2018 × github × scaling ×

October 21 post-incident analysis (via) Legitimately fascinating post-mortem by GitHub. They run database masters in multiple data centers with raft for leader election... but when they had an unexpected network split between east and west coast they ended up with several seconds of write that had not been correctly replicated. Cleaning up the resulting mess took the best part of 24 hours! Distributed systems are hard. # 31st October 2018, 8:50 pm

MySQL High Availability at GitHub. Cutting edge high availability case-study: GitHub are now using Consul, raft, their own custom load balancer and their own custom orchestrator replication management toolkit to achieve cross-datacenter failover for their MySQL master/replica clusters. # 20th June 2018, 11:05 pm