Weeknotes: DjangoCon, SQLite in Django, datasette-gunicorn
I spent most of this week at DjangoCon in San Diego—my first outside-of-the-Bay-Area conference since the before-times.
It was a most excellent event. I spent a lot of time in the corridor track—actually the sitting-outside-in-the-sunshine track, catching up with people I haven’t seen in several years.
I gave a talk titled "Massively increase your productivity on personal projects with comprehensive documentation and automated tests", with the alternative title "Coping strategies for the serial project hoarder". I’ll do a full write-up of this once the video is made available in a few weeks’ time, but in the meantime the talk materials can be found here:
- Supporting notes and links
- Slides on Speaker Deck
- Video for paying DjangoCon attendees (public video coming soon)
I also gave a lightning talk about AI and magic, which was effectively the five minute oral version of my recent blog post Is the AI spell-casting metaphor harmful or helpful?
Benchmarking SQLite in Django
I also hung around for the first day of the DjangoCon sprints.
For over a decade, the Django documentation has warned against using SQLite in production—recommending PostgreSQL or MySQL instead.
I asked Django Fellow Carlton Gibson what it would take to update that advice for 2022. He suggested that what we really needed was a solid idea of how well modern SQLite performs with Django under a variety of different settings.
So I spent some time running benchmarks, using my new django_sqlite_benchmark repository.
You can follow the full details of my experiments in these issues:
- #2: Locust test to exercise /counter/xxx endpoint which runs benchmarks against a simple Django view that increments a counter stored in a SQLite table
- #3: Load test for larger writes runs a benchmark using a script that inserts larger JSON objects into a database table. I also tried this against PostgreSQL, getting very similar numbers to SQLite.
- #4: Benchmark endpoint that doesn’t interact with database benchmarks a simple “hello world” view that doesn’t use SQLite at all—as a baseline for comparison
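The counter endpoint in issue #2 boils down to one small write transaction per request. Here’s a minimal sqlite3 sketch of that write pattern—not the actual benchmark code, which lives in the django_sqlite_benchmark repo and goes through the Django ORM:

```python
import sqlite3

# In-memory stand-in for the benchmark database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counter (name, value) VALUES ('demo', 0)")
conn.commit()

def increment(name):
    # One small write transaction per simulated request
    conn.execute(
        "UPDATE counter SET value = value + 1 WHERE name = ?", (name,)
    )
    conn.commit()
    return conn.execute(
        "SELECT value FROM counter WHERE name = ?", (name,)
    ).fetchone()[0]

result = increment("demo")
```

Under load, it’s that steady stream of tiny write transactions that triggers SQLite’s locking behaviour.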
Here’s the TLDR version of the results: SQLite in its default “journal” mode starts returning “database is locked” errors pretty quickly as the write load increases. But... if you switch to “wal” mode (here’s how) those errors straight up vanish!
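For reference, switching a database to WAL mode is a single pragma, and the setting is persisted in the database file itself, so every subsequent connection picks it up. A quick sketch using Python’s sqlite3 module (WAL needs a real file on disk, so this uses a throwaway temp directory):

```python
import os
import sqlite3
import tempfile

# WAL mode is not supported for in-memory databases, so use a real file
path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(path)
# Switch to write-ahead logging; the pragma returns the new mode
mode = conn.execute("PRAGMA journal_mode=wal;").fetchone()[0]
conn.close()

# A brand new connection sees WAL mode without re-running the pragma,
# because the setting is stored in the database file itself
conn2 = sqlite3.connect(path)
current = conn2.execute("PRAGMA journal_mode;").fetchone()[0]
conn2.close()
```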
I was expecting WAL mode to improve things, but I thought I’d still be able to hit errors even with it enabled. No—it turns out that, at least for the amount of traffic I could generate on my laptop, WAL mode proved easily capable of handling the load.
Even without WAL mode, bumping the SQLite “timeout” option up to 20s solved most of the errors.
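In Django that timeout is set via the database’s OPTIONS dictionary, which gets passed through to sqlite3.connect(). A settings.py sketch (BASE_DIR as in a default Django project):

```python
# settings.py (sketch)
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
        # Passed to sqlite3.connect(): wait up to 20 seconds
        # for a lock to clear before raising "database is locked"
        "OPTIONS": {"timeout": 20},
    },
}
```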
Even more interestingly: I tried using Gunicorn (and Uvicorn) to run multiple Django workers at once. I was certain this would lead to problems, as SQLite isn’t designed to handle writes from multiple processes at once... or so I thought. It turned out SQLite’s use of file locking meant everything worked far better than I expected—and upping the number of worker processes from 1 to 4 resulted in approximately a 4x increase in throughput.
I shouldn’t be surprised by this, if only because every time I’ve tried to push SQLite in a new direction it’s impressed me with how much more capable it is than I expected.
But still, these results are very exciting. They need more thorough testing and more eyes than just mine, but I think they indicate that SQLite should absolutely be considered a viable option for running Django in production in 2022.
datasette-gunicorn
Based on my experiments with SQLite and Django—in particular how running multiple worker processes increased how much traffic I could handle—I decided to try the same thing with Datasette itself.
Gunicorn remains one of the most well-regarded options for deploying Python web applications. It acts as a process manager, balancing requests between different workers and restarting any that fail with an error.
I decided to experiment with this through the medium of a Datasette plugin. So I built datasette-gunicorn, a plugin that adds an extra command to Datasette that lets you start it like this:
datasette gunicorn my.db --workers 4
It takes most of the same arguments as Datasette’s regular datasette serve command, plus a new -w/--workers option for setting the number of workers.
Initial benchmarks were very positive: 21 requests a second with a single worker, increasing to 75 requests/second with four! Not bad for an initial experiment. I also tested it serving a static page through Datasette and got up to over 500 requests a second, at which point Locust warned that it should be run on a separate machine for a proper load test.
In writing the plugin I had to figure out how to build a new command that mostly copied parameters from the existing datasette serve Click command. I wrote a TIL about how I ended up doing that.
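The rough shape of that trick—a simplified sketch, not the exact code from the TIL, with command and option names invented for illustration—is to append the existing command’s parameter objects onto the new command:

```python
import click
from click.testing import CliRunner

@click.command()
@click.option("--port", default=8001, type=int)
def serve(port):
    """Stand-in for the existing command whose options we want to reuse."""
    click.echo(f"port={port}")

@click.command()
@click.option("-w", "--workers", default=1, type=int)
def gunicorn_command(workers, **kwargs):
    """New command that should also accept serve's options."""
    click.echo(f"workers={workers} port={kwargs['port']}")

# Copy parameter definitions across, skipping any name collisions
existing = {p.name for p in gunicorn_command.params}
gunicorn_command.params.extend(
    p for p in serve.params if p.name not in existing
)

result = CliRunner().invoke(gunicorn_command, ["--port", "9000", "-w", "4"])
```

Click stores each command’s declared options on its .params list, so copying those objects is enough for the new command to parse (and receive) the old command’s arguments.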
Also this week: I released shot-scraper 1.0.
Despite the exciting version number, this release actually only has two small new features. Here’s the full changelog:
- Tutorial: Automating screenshots for the Datasette documentation using shot-scraper.
I bumped it to 1.0 because shot-scraper is mature enough now that I’m ready to commit to not breaking existing features (at least without shipping a 2.0, which I hope to avoid for as long as possible).
I’m trying to be braver when it comes to stamping a 1.0 release on my main projects.
(I really, really need to get Datasette 1.0 shipped soon.)
Releases this week
datasette-gunicorn: Plugin for running Datasette using Gunicorn
shot-scraper: 1.0—(23 releases total)—2022-10-15
A command-line utility for taking automated screenshots of websites
asgi-gzip: 0.2—(2 releases total)—2022-10-13
gzip middleware for ASGI applications, extracted from Starlette