Simon Willison’s Weblog

Items tagged scaling in 2020

Filters: Year: 2020 × scaling ×

Scaling Datastores at Slack with Vitess (via) Slack spent three years migrating 99% of their MySQL query load to run against Vitess, the open source MySQL sharding system originally built by YouTube. “Today, we serve 2.3 million QPS at peak. 2M of those queries are reads and 300K are writes. Our median query latency is 2 ms, and our p99 query latency is 11 ms.” # 1st December 2020, 9:30 pm

Reducing search indexing latency to one second. Really detailed dive into the nuts and bolts of Twitter’s latest iteration of search indexing technology, including a great explanation of skip lists. # 26th June 2020, 5:06 pm

PostGraphile: Production Considerations. PostGraphile is a tool for building a GraphQL API on top of an existing PostgreSQL schema. Their “production considerations” documentation is particularly interesting because it directly addresses some of my biggest worries about GraphQL: the potential for someone to craft an expensive query that ties up server resources. PostGraphile suggests a number of techniques for avoiding this, including a statement timeout, a query allowlist, pagination caps and (in their “pro” version) a cost limit that uses a calculated cost score for the query. # 27th March 2020, 1:22 am

Millions of tiny databases. Fascinating, detailed review of a paper that describes Amazon’s Physalia, a distributed configuration store designed to provide extremely high availability coordination for Elastic Block Store replication. My eyebrows raised at “Physalia is designed to offer consistency and high-availability, even under network partitions.” since that’s such a blatant violation of CAP theorem, but it later justifies it like so: “One desirable property therefore, is that in the event of a partition, a client’s Physalia database will be on the same side of the partition as the client. Clever placement of cells across nodes can maximise the chances of this.” # 5th March 2020, 4:37 am

Serving 100µs reads with 100% availability (via) Fascinating use-case for SQLite from Segment: they needed a massively replicated configuration database across all of their instances that process streaming data. They chose to make the configuration available as a ~50GB SQLite database file mirrored to every instance, meaning lookups against that data could complete in microseconds. Changes to the central MySQL configuration store are pulled every 2-3 seconds, resulting in a trade-off of consistency for availability which fits their use-case just fine. # 10th January 2020, 5:15 am