Simon Willison’s Weblog

Subscribe

Items tagged scaling in 2021

Filters: Year: 2021 × scaling × Sorted by date


Transactionally Staged Job Drains in Postgres. Any time I see people argue that relational databases shouldn’t be used to implement job queues I think of this post by Brandur from 2017. If you write to a queue before committing a transaction you run the risk of a queue consumer trying to read from the database before the new row becomes visible. If you write to the queue after the transaction there’s a risk an error might result in your message never being written. So: write to a relational staging table as part of the transaction, then have a separate process read from that table and write to the queue. # 18th December 2021, 1:34 am

Centrifuge: a reliable system for delivering billions of events per day (via) From 2018, a write-up from Segment explaining how they solved the problem of delivering webhooks from thousands of different producers to hundreds of potentially unreliable endpoints. They started with Kafka and ended up on a custom system written in Go against RDS MySQL that was specifically tuned to their write-heavy requirements. # 6th December 2021, 1:41 am

How Discord Stores Billions of Messages (via) Fascinating article from 2017 describing how Discord migrated their primary message store to Cassandra (from MongoDB, but I could easily see them making the same decision if they had started with PostgreSQL or MySQL). The trick with scalable NoSQL databases like Cassandra is that you need to have a very deep understanding of the kinds of queries you will need to answer—and Discord had exactly that. In the article they talk about their desire to eventually migrate to Scylla (a compatible Cassandra alternative written in C++)—in the Hacker News comments they confirm that in 2021 they are using Scylla for a few things but they still have their core messages in Cassandra. # 24th August 2021, 9:31 pm

Notes on streaming large API responses

I started a Twitter conversation last week about API endpoints that stream large amounts of data as an alternative to APIs that return 100 results at a time and require clients to paginate through all of the pages in order to retrieve all of the data:

[... 1692 words]

Multi-region PostgreSQL on Fly (via) Really interesting piece of architectural design from Fly here. Fly can run your application (as a Docker container run using Firecracker) in multiple regions around the world, and they’ve now quietly added PostgreSQL multi-region support. The way it works is that all-but-one region can have a read-only replica, and requests sent to application servers can perform read-only queries against their local region’s replica. If a request needs to execute a SQL update your application code can return a “fly-replay: region=scl” HTTP header and the Fly CDN will transparently replay the request against the region containing the leader database. This also means you can implement tricks like setting a 10s expiring cookie every time the user performs a write, such that their requests in the next 10s will go straight to the leader and avoid them experiencing any replication lag that hasn’t caught up with their latest update. # 17th June 2021, 6:39 pm

Why I Built Litestream. Litestream is a really exciting new piece of technology by Ben Johnson, who previously built BoltDB, the key-value store written in Go that is used by etcd. It adds replication to SQLite by running a process that converts the SQLite WAL log into a stream that can be saved to another folder or pushed to S3. The S3 option is particularly exciting—Ben estimates that keeping a full point-in-time recovery log of a high write SQLite database should cost in the order of a few dollars a month. I think this could greatly expand the set of use-cases for which SQLite is sensible choice. # 11th February 2021, 7:25 pm