Simon Willison’s Weblog

22 items tagged “replication”

Scaling Postgres with Read Replicas & Using WAL to Counter Stale Reads (via) The problem with sending writes to the primary and balancing reads across replicas is dealing with replica lag: what if you write to the primary and then read from a replica that hasn’t had the new state applied to it yet? Brandur Leach dives deep into an elegant solution using PostgreSQL’s LSN (log sequence numbers) accessed using pg_last_wal_replay_lsn(). An observer process continuously polls the replicas for their most recently applied LSN and stores them in a table. A column in the Users table then records the min_lsn valid for that user, updating it to the pg_current_wal_lsn() of the primary whenever that user makes a write. Combining the two allows the application to randomly select a replica that is up-to-date for the purposes of a specific user any time it needs to make a read. # 18th November 2017, 6:42 pm
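Here's a minimal Python sketch of just the selection step, assuming the observer's table has already been loaded into a dict mapping replica name to LSN string. The dict, the replica names and `pick_replica` are hypothetical illustrations, not code from Brandur's post.

```python
import random
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Turn a PostgreSQL LSN like '16/B374D848' into a comparable integer.

    The text form is the high and low 32 bits of a 64-bit WAL position, in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_replica(replica_lsns: Dict[str, str], user_min_lsn: str,
                 rng: Optional[random.Random] = None) -> Optional[str]:
    """Randomly pick a replica that has replayed at least the user's min_lsn.

    replica_lsns: hypothetical snapshot of the observer's table
                  (replica name -> last pg_last_wal_replay_lsn() seen).
    user_min_lsn: pg_current_wal_lsn() recorded on the primary at the user's last write.
    Returns None if no replica has caught up, in which case read from the primary.
    """
    needed = parse_lsn(user_min_lsn)
    fresh = sorted(name for name, lsn in replica_lsns.items()
                   if parse_lsn(lsn) >= needed)
    return (rng or random).choice(fresh) if fresh else None
```

With `{"replica-a": "0/3000060", "replica-b": "0/2FFFF00"}` and a user whose min_lsn is `"0/3000000"`, only `replica-a` qualifies. Parsing to integers matters: as plain strings `"10/0"` sorts before `"9/0"`, even though it is the later position. If you do the comparison in SQL instead, PostgreSQL's `pg_lsn` type handles the ordering for you.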

PostgreSQL 10 Released. Highlights include major improvements to parallelized queries, quorum commit for synchronous replication (sounds reminiscent of Cassandra) and logical replication, which allows modifications to specific tables to be replicated to different clusters. They’re also changing their versioning scheme to Major.Minor, so the next minor release will be 10.1 and the next major release will be 11. # 5th October 2017, 3:31 pm
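In PostgreSQL 10 itself logical replication is configured with `CREATE PUBLICATION` on the source and `CREATE SUBSCRIPTION` on the target. The Python below is only a toy model of the underlying idea: row-level changes flow through a stream, and a subscriber applies just the ones for published tables. The `Change` record and `apply_publication` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Change:
    """A hypothetical row-level change, as it might appear in a logical change stream."""
    table: str
    op: str            # "INSERT", "UPDATE" or "DELETE"
    key: int
    row: Tuple = ()


def apply_publication(stream: Iterable[Change], published: Set[str],
                      subscriber: Dict[Tuple[str, int], Tuple]) -> List[Change]:
    """Apply changes for published tables to the subscriber's copy, in stream order.

    Returns the list of changes that were actually applied.
    """
    applied = []
    for change in stream:
        if change.table not in published:
            continue  # changes to unpublished tables never reach this subscriber
        if change.op == "DELETE":
            subscriber.pop((change.table, change.key), None)
        else:
            # INSERT and UPDATE both leave the latest row version in place
            subscriber[(change.table, change.key)] = change.row
        applied.append(change)
    return applied
```

Feeding it an insert and an update on `orders` plus an insert on `audit_log`, with only `orders` published, leaves the subscriber holding just the updated `orders` row. That per-table selectivity is what physical (WAL-shipping) replication can't give you, since it copies the whole cluster.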

PostgreSQL 9.0 Beta 1 Now Available. With asynchronous streaming replication. # 5th May 2010, 2:36 pm

PostgreSQL 8.5alpha3 now available. “Hot Standby, allowing read-only connections during recovery, provides a built-in master-slave replication solution.” Woohoo! # 23rd December 2009, 9:57 am

Simple CouchDB multi-master clustering via Nginx. An impressive combination. CouchDB can easily be set up in a multi-master configuration, where writes to one master are replicated to the other and vice versa. This makes setting up a reliable CouchDB cluster as simple as putting two such servers behind a single nginx proxy. # 19th November 2009, 4:37 pm

PostgreSQL 8.5 alpha 2 is out. “P.S. If you’re wondering about Hot Standby and Synchronous Replication, they’re still under heavy development and still (at this point) expected to be in 8.5.”—Hot Standby is PostgreSQL-speak for MySQL-style master/slave replication for scaling your reads. # 28th October 2009, 9:02 am

How We Made GitHub Fast. Detailed overview of the new GitHub architecture. It’s a lot more complicated than I would have expected—lots of moving parts are involved in ensuring they can scale horizontally when they need to. Interesting components include nginx, Unicorn, Rails, DRBD, HAProxy, Redis, Erlang, memcached, SSH, git and a bunch of interesting new open source projects produced by the GitHub team such as BERT/Ernie and ProxyMachine. # 21st October 2009, 9:14 pm

When I worked at Amazon.com we had a deeply-ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but it’s very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.

Matt Brubeck # 4th October 2009, 9:50 am

Londiste Tutorial. Master/slave replication for PostgreSQL, developed and used by Skype. # 9th September 2009, 11:06 pm

Keyspace. Yet Another Key-Value Store: this one focuses on high availability, with one server in the cluster serving as master (and handling all writes), and the Paxos algorithm handling replication and ensuring a new master can be elected should the existing master become unavailable. Clients can choose to make dirty reads against replicated servers or clean reads by talking directly to the master. Underlying storage is BerkeleyDB, and the authors claim 100,000 writes/second. Released under the AGPL. # 16th July 2009, 10:30 am
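To make the dirty/clean distinction concrete, here's a toy Python model of the two read paths against a single-master cluster. It doesn't model Paxos or master election at all, and the `Node` and `Cluster` classes and their method names are hypothetical, not Keyspace's client API.

```python
import random
from typing import Dict, List, Optional


class Node:
    """A hypothetical server holding a copy of the key/value data."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class Cluster:
    """Toy single-master cluster: the master handles every write."""
    def __init__(self, master: Node, replicas: List[Node]) -> None:
        self.master = master
        self.replicas = replicas

    def put(self, key: str, value: str) -> None:
        # Writes only touch the master; replicas catch up when replicate() runs,
        # which stands in for replication lag.
        self.master.data[key] = value

    def replicate(self) -> None:
        for node in self.replicas:
            node.data = dict(self.master.data)

    def clean_read(self, key: str) -> Optional[str]:
        # Clean read: ask the master, which has every accepted write.
        return self.master.data.get(key)

    def dirty_read(self, key: str) -> Optional[str]:
        # Dirty read: any replica may answer, possibly with stale (or missing) data.
        return random.choice(self.replicas).data.get(key)
```

Right after a `put`, a clean read sees the new value while a dirty read may not; once replication catches up both agree. Dirty reads trade freshness for spreading read load away from the master.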

PostgreSQL Development Priorities. The top two for 8.4 are “Simple built-in replication” and “Upgrade-in-place”; Josh Berkus is seeking feedback on priorities for future work on 8.5. # 28th May 2009, 8:08 pm

redis (via) An in-memory scalable key/value store but with an important difference: this one lets you perform list and set operations against keys, opening up a whole new set of possibilities for application development. It’s very young but already supports persistence to disk and master-slave replication. # 15th March 2009, 1:32 pm

[Drizzle] won’t be a get-out-of-jail-free card for very write-heavy applications but I bet it will do wonders for heavily replicated, heavily federated, read-heavy architectures (you know, normal stuff).

Richard Crowley # 8th March 2009, 6:05 pm

What happened to Hot Standby? Hot Standby (the ability to have read-only replication slaves) has been dropped from PostgreSQL 8.4 and is now scheduled for 8.5. “Making hard decisions to postpone features which aren’t quite ready is how PostgreSQL makes sure that our DBMS is ‘bulletproof’ and that we release close to on-time every year”. # 8th March 2009, 9:28 am

Tokyo Tyrant Tutorial. Buried at the bottom of the Tokyo Tyrant protocol documentation, this is the best resource I’ve seen yet for getting up and running with the database server (including setting up replication). # 14th February 2009, 11:29 am

MemcacheDB. A server that speaks the memcache protocol but uses Berkeley DB for reliable persistent storage. Speedy: 20,000 writes/second and 60,000+ reads/second. Includes a full replication mechanism (with custom memcache protocol commands) based on Berkeley DB’s. # 5th January 2009, 12:37 pm

Minimal nginx conf to split get/post requests. Interesting idea for master-slave replication balancing where GET vs. POST is load-balanced by nginx, presumably to different backend servers that are configured to talk to either a slave or a master. This won’t deal very well with replication lag, though: you really want a user’s session to be bound to the master server for the next few GET requests after data is modified to ensure they see the effects of their updates. UPDATE: Amit fixed my complaint with a neat hack based around a cookie with a max age of 10 seconds. # 14th October 2008, 4:33 pm
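The linked config does this in nginx. As an illustration of the routing rule rather than a translation of Amit's config, here's the decision sketched in Python: writes go to the master, and a GET also goes to the master while a short-lived cookie set after the write is still present. The cookie name `wrote_recently` and both functions are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Optional

PIN_SECONDS = 10                 # the 10-second max age mentioned above
PIN_COOKIE = "wrote_recently"    # hypothetical cookie name


def choose_backend(method: str, cookie_header: Optional[str]) -> str:
    """Return "master" or "slave" for an incoming request."""
    if method.upper() != "GET":
        return "master"          # POSTs (writes) always hit the master
    cookies = SimpleCookie(cookie_header or "")
    if PIN_COOKIE in cookies:
        return "master"          # user wrote within the last PIN_SECONDS
    return "slave"


def pin_cookie_header() -> str:
    """Set-Cookie value to send after a write; the browser expires it on its own."""
    cookie = SimpleCookie()
    cookie[PIN_COOKIE] = "1"
    cookie[PIN_COOKIE]["max-age"] = PIN_SECONDS
    cookie[PIN_COOKIE]["path"] = "/"
    return cookie[PIN_COOKIE].OutputString()
```

The appeal of the cookie approach is that the browser does the expiry, so the proxy needs no shared session state. The catch is that 10 seconds is a guess: if replication lag ever exceeds it, the user can still read stale data.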

Facebook engineering notes on Scaling Out. Jason Sobel explains a couple of tricks Facebook use to deal with consistency between their California and Virginia data centres. The first is to hijack the MySQL replication stream to include information about memcached records to invalidate; the second is to use Layer 7 load balancers which inspect a “last modification time” cookie and send users to the masters in California if they have updated their profile in the past 20 seconds. # 20th August 2008, 11:51 pm
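A toy Python model of the first trick, assuming a replicated write can carry a list of cache keys alongside it. `ReplicationEvent` and `ReplicaSite` are invented names, and this says nothing about how Facebook actually encoded the keys in the MySQL stream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReplicationEvent:
    """A hypothetical replicated write with cache keys piggybacked on it."""
    writes: Dict[str, str]                        # row key -> new value
    invalidate: List[str] = field(default_factory=list)


class ReplicaSite:
    """The remote data centre: a database replica plus its local memcached."""
    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        # Read-through cache: on a miss, fill from the local replica.
        if key not in self.cache and key in self.db:
            self.cache[key] = self.db[key]
        return self.cache.get(key)

    def apply(self, event: ReplicationEvent) -> None:
        # Apply the data change first, then drop the cached copies, so any
        # refill after the delete reads the new row rather than the old one.
        self.db.update(event.writes)
        for key in event.invalidate:
            self.cache.pop(key, None)
```

The point of tying the invalidation to the replication stream is ordering: if the delete were sent separately it could arrive before the replicated row, and a read in between would re-cache the stale value.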

Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions [...] However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases.

Tom Lane # 7th July 2008, 2:08 pm

mysql_cluster (via) My Russian isn’t all that good, but this looks like a neat way of getting Django to talk to a master/slave setup, written by Ivan Sagalaev. UPDATE: English docs are linked from the comments. # 21st March 2008, 8:45 am
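mysql_cluster predates it, but Django 1.2 later added built-in multiple-database support with pluggable routers. For comparison, a minimal router that sends writes to the `default` alias and spreads reads across replicas might look like this; the `replica1`/`replica2` aliases are hypothetical and would need matching entries in `settings.DATABASES`.

```python
import random


class PrimaryReplicaRouter:
    """Writes go to "default"; reads go to a randomly chosen replica alias."""
    replicas = ["replica1", "replica2"]   # hypothetical DATABASES aliases

    def db_for_read(self, model, **hints):
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations across them are fine.
        return True
```

It would be enabled by listing its dotted path in the `DATABASE_ROUTERS` setting. Like the nginx GET/POST split, this does nothing about replication lag on its own.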

Semi-synchronous replication for MySQL (via) Google’s patch for MySQL which enables more reliable master-slave replication (a transaction isn’t committed until at least one slave has replicated the data). # 5th June 2007, 10:07 pm
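A toy model of the commit rule using Python threads, not Google's patch: the master appends to its binlog and only returns from `commit` once some slave has acknowledged that position. The class, method names and `timeout` parameter are hypothetical, and what the real patch does when no slave answers isn't modeled; here `commit` simply returns False.

```python
import threading
from typing import List


class SemiSyncMaster:
    """Commit returns only after at least one slave confirms it has the data."""
    def __init__(self) -> None:
        self.binlog: List[str] = []
        self.acked_upto = 0                # highest binlog position any slave confirmed
        self._cond = threading.Condition()

    def commit(self, txn: str, timeout: float = 1.0) -> bool:
        with self._cond:
            self.binlog.append(txn)
            position = len(self.binlog)
            # Wait (releasing the lock) until a slave has acknowledged this position.
            return self._cond.wait_for(lambda: self.acked_upto >= position, timeout)

    def slave_ack(self, position: int) -> None:
        """Called when a slave reports it has received events up to `position`."""
        with self._cond:
            self.acked_upto = max(self.acked_upto, position)
            self._cond.notify_all()
```

For example, scheduling `threading.Timer(0.05, master.slave_ack, args=(1,)).start()` before `master.commit("INSERT 1")` lets the commit return True, while a second commit with nobody acknowledging position 2 times out. The gain over plain asynchronous replication is that an acknowledged transaction exists on at least two machines.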