Simon Willison’s Weblog


7 items tagged “dogpile”


How Discord Stores Trillions of Messages (via) This is a really interesting case-study. Discord migrated from MongoDB to Cassandra back in 2016 to handle billions of messages. Today they’re handling trillions, and they completed a migration from Cassandra to Scylla, a Cassandra-like data store written in C++ (as opposed to Cassandra’s Java) to help avoid problems like GC pauses. In addition to being a really good scaling war story this has some interesting details about their increased usage of Rust. As a fan of request coalescing (which I’ve previously referred to as dogpile prevention) I particularly liked this bit:

“Our data services sit between the API and our ScyllaDB clusters. They contain roughly one gRPC endpoint per database query and intentionally contain no business logic. The big feature our data services provide is request coalescing. If multiple users are requesting the same row at the same time, we’ll only query the database once. The first user that makes a request causes a worker task to spin up in the service. Subsequent requests will check for the existence of that task and subscribe to it. That worker task will query the database and return the row to all subscribers.” # 8th March 2023, 7:07 pm


Promise Maps. Egbert Teeselink describes a neat JavaScript caching pattern: instead of caching key:value cache key:promise-that-resolves-to-value—doing this gives you dog piling prevention for free, because the first lookup of a value trigers the computation to fetch it while subsequent lookups wait on the same promise to resolve—or resolve instantly if the computation has completed. # 22nd July 2022, 3:52 pm


New HTTP standards for caching on the modern web (via) Cache-Status is a new HTTP header (RFC from August 2021) designed to provide better debugging information about which caches were involved in serving a request—“Cache-Status: Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300” for example indicates that Nginx served a cache hit, then Cloudflare had a stale cached version so it revalidated from Nginx, got a 304 not modified, collapsed multiple requests (dogpile prevention) and plans to serve the new cached value for the next five minutes. Also described is $Target-Cache-Control: which allows different CDNs to respond to different headers and is already supported by Cloudflare and Akamai (Cloudflare-CDN-Cache-Control: and Akamai-Cache-Control:). # 21st October 2021, 10:40 pm


nginx proxy-cache-lock (via) Crucially important feature hidden away in the nginx documentation: proxy_cache_lock enables request coalescing, or dog-pile protection: it means that if a hundred simultaneous requests all suffer the same cache miss, only one request is made to the backend and the answer is then sent back to all hundred requests at once. I’ve leaned heavily on this feature in Varnish for years—useful to know that nginx has the same capability. # 14th November 2017, 9:53 pm


dogproxy. Another of my experiments with Node.js—this is a very simple HTTP proxy which addresses the dog pile effect (also known as the thundering herd) by watching out for multiple requests for a URL that is currently “in flight” and bundling them together. # 3rd February 2010, 1:05 pm


RSSCloud Vs. PubSubHubbub: Why The Fat Pings Win. A PubSubHubbub advocate explains the differences between the two proposals: most importantly, PubSubHubbub includes the actual new content with the “fat ping” whereas RSSCloud just notifies you that you should poll the RSS feed, leading to a potential thundering herd. I’m still hoping one of those specs will detail a way in which they can be used for scalable regular WebHook-style notifications without any feed infrastructure at all. # 10th September 2009, 3:49 pm


MintCache for Django. Caching scheme for Django that solves the dog-pile effect, where high traffic causes many processes to regenerate stale cached data at the same time. # 2nd May 2007, 8:49 am