Simon Willison’s Weblog

Subscribe

45 items tagged “caching”

2024

Weeknotes: Page caching and custom templates for Datasette Cloud

My main development focus this week has been adding public page caching to Datasette Cloud, and exploring what custom template support might look like for that service.

[... 924 words]

2023

Cloudflare does not consider vary values in caching decisions. Here’s the spot in Cloudflare’s documentation where they hide a crucially important detail:

“Cloudflare does not consider vary values in caching decisions. Nevertheless, vary values are respected when Vary for images is configured and when the vary header is vary: accept-encoding.”

This means you can’t deploy an application that uses content negotiation via the Accept header behind the Cloudflare CDN—for example serving JSON or HTML for the same URL depending on the incoming Accept header. If you do, Cloudflare may serve cached JSON to an HTML client or vice-versa.

There’s an exception for image files, which Cloudflare added support for in September 2021 (for Pro accounts only) in order to support formats such as WebP which may not have full support across all browsers. # 20th November 2023, 5:08 am

2021

New HTTP standards for caching on the modern web (via) Cache-Status is a new HTTP header (RFC from August 2021) designed to provide better debugging information about which caches were involved in serving a request—“Cache-Status: Nginx; hit, Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300” for example indicates that Nginx served a cache hit, then Cloudflare had a stale cached version so it revalidated from Nginx, got a 304 not modified, collapsed multiple requests (dogpile prevention) and plans to serve the new cached value for the next five minutes. Also described is $Target-Cache-Control: which allows different CDNs to respond to different headers and is already supported by Cloudflare and Akamai (Cloudflare-CDN-Cache-Control: and Akamai-Cache-Control:). # 21st October 2021, 10:40 pm

Last Mile Redis (via) Fly.io article about running a local redis cache in each of their geographic regions—“Cache data overlaps a lot less than you assume it will. For the most part, people in Singapore will rely on a different subset of data than people in Amsterdam or São Paulo or New Jersey.” But then they note that Redis has the ability to act as both a replica of a primary AND a writable server at the same time (“replica-read-only no”), which actually makes sense for a cache—it lets you cache local data but send out cluster-wide cache purges if necessary. # 17th July 2021, 2:44 am

2017

nginx proxy-cache-lock (via) Crucially important feature hidden away in the nginx documentation: proxy_cache_lock enables request coalescing, or dog-pile protection: it means that if a hundred simultaneous requests all suffer the same cache miss, only one request is made to the backend and the answer is then sent back to all hundred requests at once. I’ve leaned heavily on this feature in Varnish for years—useful to know that nginx has the same capability. # 14th November 2017, 9:53 pm

2011

Display your events on your own website with Lanyrd Badges. We’ve launched badges for Lanyrd—JavaScript that lets you embed a top bar or a content “splat” showing events you plan to attend, talks you’ve given in the past and other various combinations. I’m quite pleased with the implementation—the badges are configured using classes on a link to your Lanyrd profile, and the badges themselves are served through a combination of Amazon CloudFront for the initial script and a Varnish cache for the badge data itself to keep things nice and snappy. # 13th January 2011, 8:38 pm

2010

Reddit is now running on Cassandra. Migrating their persistent cache over from memcacheDB to Cassandra took one developer just ten days. # 13th March 2010, 12:14 am

Cache Machine: Automatic caching for your Django models. This is the third new ORM caching layer for Django I’ve seen in the past month! Cache Machine was developed for zamboni, the port of addons.mozilla.org to Django. Caching is enabled using a model mixin class (to hook up some post_delete hooks) and a custom caching manager. Invalidation works by maintaining a “flush list” of dependent cache entries for each object—this is currently stored in memcached and hence has potential race conditions, but a comment in the source code suggests that this could be solved by moving to redis. # 11th March 2010, 7:35 pm

Announcing django-cachebot. The ORM caching space around Django is heating up. django-cachebot is used in production at mingle.com and takes a more low level approach to cache invalidation than Johnny Cache, enabling you to specifically mark the querysets you wish to cache and providing some advanced options for cache invalidation. Unfortunately it currently relies on a patch to Django core to enable its own manager. # 6th March 2010, 12:48 pm

Is johnny-cache for you? “Using Johnny is really adopting a particular caching strategy. This strategy isn’t always a win; it can impact performance negatively”—but for a high percentage of Django sites there’s a very good chance it will be a net bonus. # 2nd March 2010, 11:44 am

jmoiron.net: Johnny Cache. The blog entry announcing Johnny Cache (“a drop-in caching library/framework for Django that will cache all of your querysets forever in a consistent and safe manner”) to the world. # 1st March 2010, 11:48 am

Johnny Cache. Clever twist on ORM-level caching for Django. Johnny Cache (great name) monkey-patches Django’s QuerySet classes and caches the result of every single SELECT query in memcached with an infinite expiry time. The cache key includes a “generation” ID for each dependent database table, and the generation is changed every single time a table is updated. For apps with infrequent writes, this strategy should work really well—but if a popular table is being updated constantly the cache will be all but useless. Impressively, the system is transaction-aware—cache entries created during a transaction are held in local memory and only pushed to memcached should the transaction complete successfully. # 28th February 2010, 10:55 pm

Internet Explorer Cookie Internals (FAQ). Grr... IE 6, 7 and 8 don’t support the max-age cookie argument, forcing you to use an explicit expiry date instead. This appears to affect the cache busting cookie pattern, where you set a cookie to expire in 30 seconds for any user who posts content and use the presence of that cookie to skip caches and/or send their queries to a master instead of slave database. If you have to use expires, users with incorrect system clocks may get inconsistent results. Anyone know of a workaround? # 26th February 2010, 12:25 pm

2009

High-end Varnish-tuning. Tuning the Varnish HTTP cache to serve 27K requests/second on a single core 2.2GHz Opteron. # 20th October 2009, 9:25 am

Caching in ASP.NET with the SqlCacheDependency Class. Interesting cache invalidation concept: set up dependencies between cache entries and tables or rows in the database, then use triggers (which I presume are automatically created for you) to clear your cache. # 18th August 2009, 12:15 pm

Memcached 1.4.0 released. The big new feature is the (optional) binary protocol, which enables other features such as CAS-everywhere and efficient client-side replication. Maintainer Dustin Sallings has also released some useful sounding EC2 instances which automatically assign nearly all of their RAM to memcached on launch and shouldn’t need any further configuration. # 17th July 2009, 10:26 pm

Yahoo! proposal to open source “Traffic Server” via the ASF. Traffic Server is a “fast, scalable and extensible HTTP/1.1 compliant caching proxy server” (presumably equivalent to things like Squid and Varnish) originally acquired from Inktomi and developed internally at Yahoo! for the past three years, which has been benchmarked handling 35,000 req/s on a single box. No source code yet but it looks like the release will arrive pretty soon. # 7th July 2009, 12:37 pm

cache-money. A “write-through caching library for ActiveRecord”, maintained by Nick Kallen from Twitter. Queries hit memcached first, and caches are automatically kept up-to-date when objects are created, updated and deleted. Only some queries are supported—joins and comparisons won’t hit the cache, for example. # 28th June 2009, 3:17 pm

Twitter, an Evolving Architecture. The most detailed write-up of Twitter’s current architecture I’ve seen, explaining the four layers of cache (all memcached) used by the Twitter API. # 28th June 2009, 3:09 pm

Django tip: Caching and two-phased template rendering. Neat trick for expensive pages which can be mostly cached with the exception of the “logged in as” bit—run them through the template system twice, caching the intermediary generated template. # 19th May 2009, 1:34 am

mmalone’s django-caching. Mike Malone shares code used by Pownce to add QuerySet level caching to Django. It’s a smart implementation—a CachingQuerySet class inspects the arguments passed to get(), and if they’re just a straight forward exact PK lookup hits memcache for the object before hitting the database. Signals are used to invalidate the cache. # 7th May 2009, 7:36 am

hash_ring 1.2. A Python library for consistent hashing with memcached, using MD5 and the same algorithm as libketama. Exposes an interface that is identical to regular memcache making this a drop-in replacement. # 5th May 2009, 1:45 pm

How search.twitter.com uses Varnish. Includes examples of the configuration options they use. # 2nd March 2009, 5:08 pm

Sharding Counters on Google App Engine. “While the datastore for App Engine scales to support a huge number of entities it is important to note that you can only expect to update any single entity, or entity-group, about five times a second”. This article explains a technique for sharding writes across multiple counters in detail, including a way to keep a memcache counter updated at the same time for faster reads. # 27th January 2009, 8:27 pm

2008

ETags And Modification Times In Django. Part of Malcolm’s series of tutorials on implementing advanced HTTP concepts in Django. # 13th December 2008, 9:49 am

REST, I just don’t get it. Read the comments for some excellent practical reasons to care about REST, including cache management (PUT and DELETE can expire the cache entries for the corresponding GET), the ability to add or move parts of the server API without redeploying client libraries and the idempotency of GET / PUT / DELETE and HEAD (repeated POST operations may have side-effects). # 15th August 2008, 8:20 am

Facelift Image Replacement. Like sIFR but with JavaScript and a PHP text rendering component. I question the need for the JavaScript if you’re already generating the images on the server, but the actual generation script is nicely done—it makes smart use of ImageMagick and caches the generated images. # 5th August 2008, 6:36 pm

quipt (via) Extremely clever idea: Cache JavaScript in window.name (which persists between page views and can hold several MB of data), but use document.referrer to check that an external domain hasn’t loaded the cache with malicious code for an XSS attack. UPDATE: Jesse Ruderman points out a fatal flaw in the comments. # 4th July 2008, 3:49 pm

Velocity: A Distributed In-Memory Cache from Microsoft. I’d been wondering what Microsoft ecosystem developers were using in the absence of memcached. Is Velocity the first Windows platform implementation of this idea? # 6th June 2008, 9:52 pm

Obscure bugs revisited: IE, HTTPS and plugins. Filed for future reference: IE breaks mysteriously if you serve it up plugin content (e.g. Flash) over HTTPS with a no-cache header—it deletes the file from cache before the plugin software gets a chance to open it. # 30th May 2008, 9:54 am