Simon Willison’s Weblog

Items tagged scaling, databases

Filters: scaling × databases ×

github/gh-ost: Thoughts on Foreign Keys? The biggest challenge I’ve seen with foreign key constraints at scale (at least with MySQL) is how they conflict with online schema migrations using tools like pt-online-schema-change or GitHub’s gh-ost. This is a good explanation of the issue by Shlomi Noach, one of the gh-ost maintainers. # 19th June 2018, 4:12 pm

Why does Django still not have support for multiple joins?

I don’t fully understand the question, but if you’re talking about doing a single join across multiple tables the Django ORM handles that just fine. Let’s say you want to get every BlogEntry written by a User who belongs to the Group with the name “admins”:

[... 67 words]

What is the largest production deployment of Redis?

I’d guess Twitter or Craigslist.

[... 19 words]

What is the largest production deployment of CouchDB for online use?

The BBC have a pretty big CouchDB cluster, which they use mostly as a replicated key-value store. It’s used by their new identity platform which includes customisation features for iPlayer.

[... 47 words]

Django | Multiple Databases. Russell just checked in the final patch developed from Alex Gaynor’s Summer of Code project to add multiple database support to Django. I’d link to the 21,000 line changeset but it crashed our Trac, so here’s the documentation instead. # 22nd December 2009, 5:22 pm

PostgreSQL 8.5 alpha 2 is out. “P.S. If you’re wondering about Hot Standby and Synchronous Replication, they’re still under heavy development and still (at this point) expected to be in 8.5.”—Hot Standby is PostgreSQL-speak for MySQL-style master/slave replication for scaling your reads. # 28th October 2009, 9:02 am

Keyspace. Yet Another Key-Value Store—this one focuses on high availability, with one server in the cluster serving as master (and handling all writes), and the paxos algorithm handling replication and ensuring a new master can be elected should the existing master become unavailable. Clients can chose to make dirty reads against replicated servers or clean reads by talking directly to the master. Underlying storage is BerkeleyDB, and the authors claim 100,000 writes/second. Released under the AGPL. # 16th July 2009, 10:30 am

Introducing Digg’s IDDB Infrastructure. IDDB is Digg’s new infrastructure component for sharding data across multiple databases, with support for both MySQL and memcachedb. “The DiggBar and URL minifying service is powered by a 16 machine IDDB cluster, which includes 8 write masters in the index and 8 MySQL storage nodes.” # 3rd April 2009, 8:42 pm

What happened to Hot Standby? Hot Standby (the ability to have read-only replication slaves) has been dropped from PostgreSQL 8.4 and is now scheduled for 8.5. “Making hard decisions to postpone features which aren’t quite ready is how PostgreSQL makes sure that our DBMS is ”bulletproof“ and that we release close to on-time every year”. # 8th March 2009, 9:28 am

Database Sharding at Netlog, with MySQL and PHP. Detailed MySQL sharding case study from Netlog, who serve five billion page requests a month using thousands of shards across more than 80 database servers. # 2nd March 2009, 10:22 am

How FriendFeed uses MySQL to store schema-less data. The pain of altering/ adding indexes to tables with 250 million rows was killing their ability to try out new features, so they’ve moved to storing pickled Python objects and manually creating the indexes they need as denormalised two column tables. These can be created and dropped much more easily, and are continually populated by an off-line index building process. # 27th February 2009, 2:33 pm

Spock Proxy. A MySQL Proxy fork (no Lua) that concentrates solely on sharding, by parsing incoming SQL statements and redirecting them across multiple databases. There are some limitations on the SQL that can be handled (no nested queries, joins across a maximum of two tables) but generally it looks pretty impressive. # 11th December 2008, 9:49 am

What You Need To Know About Amazon SimpleDB. Amazon have finally launched the database component of their web service suite. It fits a bunch of current trends: key/value pairs, schemaless, built on top of Erlang. “Eventual consistency” is an interesting characteristic. # 14th December 2007, 11:21 am

SELECT * FROM everything, or why databases are awesome. I’m beginning to think that for scalable applications the thinner your ORM is the better—if you even use one at all. # 22nd June 2007, 12:40 am