Simon Willison’s Weblog

Items tagged replication in Oct

Filters: Month: Oct × replication ×


Scaling the GitLab database. Lots of interesting details on how GitLab have worked to scale their PostgreSQL setup. They’ve avoided sharding so far, instead opting for database pooling with pgbouncer and read-only replicas using hot standbys. I like the way they deal with replica lag—they store the current WAL position in a redis key for the user every time there’s a write, then use pg_last_xlog_replay_location() on the various replicas to check and see if they have caught up next time the user makes a request that needs to read some data. # 30th October 2017, 8:53 pm

PostgreSQL 10 Released. Highlights include major improvements to parallelized queries, quorum commit for synchronous replication (sounds reminiscent of Cassandra) and logical replication, which allows modifications to specific tables to be replicated to different clusters. They’re also changing their versioning scheme to Major.Minor, so the next minor release will be 10.1 and the next major release will be 11. # 5th October 2017, 3:31 pm

PostgreSQL 8.5 alpha 2 is out. “P.S. If you’re wondering about Hot Standby and Synchronous Replication, they’re still under heavy development and still (at this point) expected to be in 8.5.”—Hot Standby is PostgreSQL-speak for MySQL-style master/slave replication for scaling your reads. # 28th October 2009, 9:02 am

How We Made GitHub Fast. Detailed overview of the new GitHub architecture. It’s a lot more complicated than I would have expected—lots of moving parts are involved in ensuring they can scale horizontally when they need to. Interesting components include nginx, Unicorn, Rails, DRBD, HAProxy, Redis, Erlang, memcached, SSH, git and a bunch of interesting new open source projects produced by the GitHub team such as BERT/Ernie and ProxyMachine. # 21st October 2009, 9:14 pm

When I worked at Amazon.com we had a deeply-ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but it’s very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.

Matt Brubeck # 4th October 2009, 9:50 am

Minimal nginx conf to split get/post requests. Interesting idea for master-slave replication balancing where GET v.s. POST is load-balanced by nginx, presumably to different backend servers that are configured to talk to either a slave or a master. This won’t deal very will with replication lag though—you really want a user’s session to be bound to the master server for the next few GET requests after data is modified to ensure they see the effects of their updates. UPDATE: Amit fixed my complaint with a neat hack based around a cookie with a max age of 10 seconds. # 14th October 2008, 4:33 pm