Weeknotes: Page caching and custom templates for Datasette Cloud
7th January 2024
My main development focus this week has been adding public page caching to Datasette Cloud, and exploring what custom template support might look like for that service.
Datasette Cloud primarily provides private “spaces” for teams to collaborate on data. A team can invite additional members, upload CSV files, use the API to ingest data, run enrichments, share private comments and browse and query the data together.
The overall goal is to help teams find stories in their data.
Originally I planned Datasette Cloud as an exclusively private collaboration space, but with hindsight this was a mistake. Datasette has been a tool for publishing data right from the start, and Datasette Cloud users quickly started asking for ways to share their data with the world.
I started with a plugin for this, datasette-public, allowing tables to be selectively made visible to unauthenticated users.
This raised a couple of challenges though. First, I worry about sudden spikes of traffic. Each Datasette Cloud user gets their own dedicated Fly container to ensure performance issues are isolated and don’t affect other users, but I still don’t like the idea of a big public traffic spike taking down a user’s site.
This week I’ve been exploring a way to address both of these issues.
Full page caching for unauthenticated users
I’ve used this trick multiple times through my career—at Lanyrd, at Eventbrite and even for my own personal blog. If a user is signed out, serve them pages through a simple full-page cache—something like Varnish. Set a short TTL on that cache—maybe as short as 15s—such that cached content doesn’t have time to go stale.
Good caches include support for dog-pile prevention, also known as request coalescing. If 10 requests come in for the same page at exactly the same moment, the cache bundles them together and makes just a single request to the backend, then serves the result to all 10 waiting clients.
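To make the coalescing idea concrete, here is a minimal in-process sketch using asyncio. It is not how Varnish or Cloudflare implement it, and the names `CoalescingCache` and `slow_backend` are invented for illustration. The 15 second default simply mirrors the TTL mentioned above.

```python
import asyncio
import time


class CoalescingCache:
    """Short-TTL cache where concurrent misses for a key share one backend fetch."""

    def __init__(self, fetch, ttl=15.0):
        self.fetch = fetch      # async callable: key -> response (hypothetical backend)
        self.ttl = ttl          # seconds a cached response stays fresh
        self._fresh = {}        # key -> (expires_at, value)
        self._in_flight = {}    # key -> asyncio.Task for a fetch already underway

    async def get(self, key):
        hit = self._fresh.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers join it.
            task = asyncio.create_task(self._fill(key))
            self._in_flight[key] = task
        # shield() stops one cancelled waiter from cancelling the shared fetch.
        return await asyncio.shield(task)

    async def _fill(self, key):
        try:
            value = await self.fetch(key)
            self._fresh[key] = (time.monotonic() + self.ttl, value)
            return value
        finally:
            self._in_flight.pop(key, None)


async def demo():
    calls = 0

    async def slow_backend(path):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)   # stand-in for rendering a page
        return f"<html>{path}</html>"

    cache = CoalescingCache(slow_backend)
    pages = await asyncio.gather(*(cache.get("/data") for _ in range(10)))
    return calls, pages
```

Running `asyncio.run(demo())` sends ten simultaneous requests for the same path through the cache. The backend counter shows how many of them actually reached the origin.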
How to implement this for Datasette Cloud? My current plan is to use a separate domain, .datasette.site, for the publicly visible pages of each site. So simon.datasette.cloud (my personal Datasette Cloud space) would have simon.datasette.site as its public domain.
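The naming convention can be expressed as a small helper. This is only a sketch of the mapping described above. The function name `public_host` and its validation rules are assumptions, not part of Datasette Cloud.

```python
PRIVATE_SUFFIX = ".datasette.cloud"  # authenticated spaces
PUBLIC_SUFFIX = ".datasette.site"    # public, cacheable pages


def public_host(private_host):
    """Map a private space hostname to its public counterpart."""
    host = private_host.lower().rstrip(".")
    if not host.endswith(PRIVATE_SUFFIX):
        raise ValueError(f"not a {PRIVATE_SUFFIX} host: {private_host!r}")
    space = host[: -len(PRIVATE_SUFFIX)]
    # Assumption: each space is a single DNS label directly under the suffix.
    if not space or "." in space:
        raise ValueError(f"unexpected space name in {private_host!r}")
    return space + PUBLIC_SUFFIX
```

For example, `public_host("simon.datasette.cloud")` gives the public hostname for my own space.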
I got this working as a proof-of-concept this week. I actually got it working twice: I figured out how to run a dedicated Varnish instance on Fly, and then I realized that Cloudflare also now offer wildcard DNS support so I tried that out too.
I have both mechanisms up and running at the moment, on two separate domains. I’ll likely go with the Cloudflare option to reduce the number of moving parts I’m responsible for myself, but having both means I can compare them to see which one is likely to work best.
Custom templates based on host
The other reason I decided to explore *.datasette.site was the security issue I mentioned earlier.
My current plan is to have *.datasette.site work as an entirely cookie-free domain. Any functionality that requires authentication will be handled by the privileged *.datasette.cloud domain instead.
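One way a cookie-free domain could be enforced is with a thin ASGI wrapper around the application. This is a hypothetical sketch, not the mechanism Datasette Cloud uses. The name `cookie_free` and the decision to strip headers in both directions are assumptions.

```python
import asyncio

PUBLIC_SUFFIX = ".datasette.site"  # public domain named in the post


def cookie_free(app, suffix=PUBLIC_SUFFIX):
    """Wrap an ASGI app so hosts ending in `suffix` never receive or set cookies."""

    async def wrapped(scope, receive, send):
        if scope["type"] != "http":
            return await app(scope, receive, send)
        headers = scope.get("headers", [])
        host = dict(headers).get(b"host", b"").decode("latin-1").split(":")[0].lower()
        if not host.endswith(suffix):
            return await app(scope, receive, send)

        # Drop incoming Cookie headers before the app sees them.
        scope = dict(scope, headers=[(k, v) for k, v in headers if k.lower() != b"cookie"])

        async def send_without_cookies(message):
            if message["type"] == "http.response.start":
                kept = [(k, v) for k, v in message.get("headers", []) if k.lower() != b"set-cookie"]
                message = dict(message, headers=kept)
            await send(message)

        await app(scope, receive, send_without_cookies)

    return wrapped


async def demo(host):
    """Send one request with a cookie; report what the app saw and what it sent."""
    seen, sent = {}, []

    async def app(scope, receive, send):
        seen["cookie"] = dict(scope["headers"]).get(b"cookie")
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/html"), (b"set-cookie", b"session=abc")],
        })
        await send({"type": "http.response.body", "body": b"ok"})

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "headers": [(b"host", host.encode("latin-1")), (b"cookie", b"session=abc")],
    }
    await cookie_free(app)(scope, receive, send)
    return seen["cookie"], [name for name, _ in sent[0]["headers"]]
```

Calling `asyncio.run(demo("simon.datasette.site"))` and then the same with `simon.datasette.cloud` shows the difference: only the public host has its cookie headers removed.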
This means I can allow users to provide their own custom templates for their public Datasette instance, without worrying that any mistakes in those templates could lead to a security breach elsewhere within the service.
There was just one catch: this meant I needed Datasette to be able to use different templates depending on the host that the content was being served on.
After wasting a bunch of time trying to get this to work through monkey-patching, I realized the solution was to add a new plugin hook. jinja2_environment_from_request(datasette, request, env) is now implemented on main and should be out in a new alpha release pretty soon. The documentation for that hook includes an example that hints at how I’m using it for Datasette Cloud.
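The core of a host-based template switch can be sketched with Jinja2 alone. The helper `environment_for_host`, the in-memory templates and the `custom` mapping below are all illustrative assumptions. The commented-out hook at the end shows roughly how a plugin might delegate to it, assuming the request object exposes a `host` attribute and may be `None`.

```python
from jinja2 import ChoiceLoader, DictLoader, Environment


def environment_for_host(env, host, loaders_by_host):
    """Return env, or an overlay whose loader checks host-specific templates first."""
    loader = loaders_by_host.get(host)
    if loader is None:
        return env
    # overlay() creates a linked copy, so the shared base environment is not mutated.
    return env.overlay(loader=ChoiceLoader([loader, env.loader]))


# Hypothetical templates; a real deployment would more likely use FileSystemLoader.
base_env = Environment(loader=DictLoader({
    "index.html": "Default index for {{ name }}",
    "table.html": "Default table",
}))
custom = {
    "simon.datasette.site": DictLoader({"index.html": "Custom index for {{ name }}"}),
}

# In a Datasette plugin the hook body might delegate like this (not run here):
#
# @hookimpl
# def jinja2_environment_from_request(datasette, request, env):
#     host = request.host if request else None
#     return environment_for_host(env, host, custom)
```

Because the host-specific loader is tried first and the original loader second, a site only needs to supply the templates it wants to override. Everything else falls through to the defaults.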
Fun further applications of this pattern
I’m wary of adding features to Datasette that only serve Datasette Cloud. In this case, I realized that the new plugin hook opens up some interesting possibilities for other users of Datasette.
Currently, those sites are hosted separately—which means I’m paying to run Datasette multiple times.
With the ability to serve different templates based on host, I’ve realized I could instead serve a single Datasette instance for multiple sites, each with their own custom templates.
Taking advantage of CNAMEs—or even wildcard DNS—means I could run a whole family of weird personal projects on a single instance without any incremental cost for each new project!
Upgrade Datasette instance configuration to handle new features