<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: asgi</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/asgi.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2023-07-24T19:51:33+00:00</updated><author><name>Simon Willison</name></author><entry><title>asgi-replay</title><link href="https://simonwillison.net/2023/Jul/24/asgi-replay/#atom-tag" rel="alternate"/><published>2023-07-24T19:51:33+00:00</published><updated>2023-07-24T19:51:33+00:00</updated><id>https://simonwillison.net/2023/Jul/24/asgi-replay/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/asgi-replay"&gt;asgi-replay&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
As part of submitting LLM to Homebrew core I needed an automated test that demonstrated that the tool was working—but I couldn’t test against the live OpenAI API because I didn’t want to have to reveal my API token as part of the test. I solved this by creating a dummy HTTP endpoint that simulates a hit to the OpenAI API, then configuring the Homebrew test to hit that instead. As part of THAT I ended up building this tiny tool which uses my asgi-proxy-lib package to intercept and log the details of hits made to a service, then provides a mechanism to replay that traffic.
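&lt;p&gt;The actual asgi-replay and asgi-proxy-lib APIs may differ, but the record-then-replay idea can be sketched as plain ASGI callables (all names here are illustrative, not the package's real interface):&lt;/p&gt;

```python
# Hypothetical sketch of the record/replay idea, not the asgi-replay API:
# wrap an ASGI app, capture each response it sends, then serve the captured
# responses back later without touching the upstream service.
import asyncio

recorded = {}  # path -> (status, headers, body)

async def upstream(scope, receive, send):
    # Stand-in for the real proxied service (e.g. the OpenAI API)
    body = b'{"model": "gpt-3.5-turbo"}'
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": body})

def record(app):
    # Middleware that logs responses while passing them through unchanged
    async def middleware(scope, receive, send):
        captured = {"status": None, "headers": None, "body": b""}
        async def capture(message):
            if message["type"] == "http.response.start":
                captured["status"] = message["status"]
                captured["headers"] = message["headers"]
            elif message["type"] == "http.response.body":
                captured["body"] += message.get("body", b"")
            await send(message)
        await app(scope, receive, capture)
        recorded[scope["path"]] = (
            captured["status"], captured["headers"], captured["body"])
    return middleware

def replay():
    # A standalone app that serves previously recorded traffic
    async def app(scope, receive, send):
        status, headers, body = recorded[scope["path"]]
        await send({"type": "http.response.start",
                    "status": status, "headers": headers})
        await send({"type": "http.response.body", "body": body})
    return app
```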


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;



</summary><category term="projects"/><category term="asgi"/></entry><entry><title>Writing a chat application in Django 4.2 using async StreamingHttpResponse, Server-Sent Events and PostgreSQL LISTEN/NOTIFY</title><link href="https://simonwillison.net/2023/May/19/chat-application-in-django/#atom-tag" rel="alternate"/><published>2023-05-19T15:42:03+00:00</published><updated>2023-05-19T15:42:03+00:00</updated><id>https://simonwillison.net/2023/May/19/chat-application-in-django/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://valberg.dk/django-sse-postgresql-listen-notify.html"&gt;Writing a chat application in Django 4.2 using async StreamingHttpResponse, Server-Sent Events and PostgreSQL LISTEN/NOTIFY&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Excellent tutorial by Víðir Valberg Guðmundsson on implementing chat with server-sent events using the newly async-capable StreamingHttpResponse from Django 4.2.&lt;/p&gt;

&lt;p&gt;He uses PostgreSQL’s LISTEN/NOTIFY mechanism, which can be used asynchronously in psycopg3—at the cost of a separate connection per user of the chat.&lt;/p&gt;

&lt;p&gt;The article also covers how to use the Last-Event-ID header to implement reconnections in server-sent events, re-transmitting any events that were missed while the connection was dropped.&lt;/p&gt;
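&lt;p&gt;The catch-up logic behind Last-Event-ID is straightforward to sketch in plain Python (these helper names are hypothetical, not taken from the article's code):&lt;/p&gt;

```python
# Hypothetical sketch of SSE event formatting and Last-Event-ID catch-up.

def format_sse(event_id, data):
    # Each SSE event carries an id: line; the browser remembers the last id
    # it saw and sends it back in a Last-Event-ID header on reconnect.
    return f"id: {event_id}\ndata: {data}\n\n"

def missed_events(backlog, last_event_id):
    # backlog is a list of (event_id, data) pairs kept server-side;
    # on reconnect, replay everything after the client's last seen id.
    if last_event_id is None:
        return backlog
    ids = [eid for eid, _ in backlog]
    try:
        idx = ids.index(last_event_id)
    except ValueError:
        return backlog  # unknown id: replay everything
    return backlog[idx + 1:]
```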

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/qyler8/writing_chat_application_django_4_2_using"&gt;lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/async"&gt;async&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;



</summary><category term="async"/><category term="django"/><category term="postgresql"/><category term="asgi"/></entry><entry><title>datasette-granian</title><link href="https://simonwillison.net/2023/Jan/20/datasette-granian/#atom-tag" rel="alternate"/><published>2023-01-20T02:12:03+00:00</published><updated>2023-01-20T02:12:03+00:00</updated><id>https://simonwillison.net/2023/Jan/20/datasette-granian/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-granian"&gt;datasette-granian&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Granian is a new Python web server, similar in role to Gunicorn but written in Rust. I built a small plugin that adds a “datasette granian” command which starts a Granian server serving Datasette’s ASGI application, using the same pattern as my existing datasette-gunicorn plugin.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://github.com/emmett-framework/granian/issues/35#issuecomment-1397829516"&gt;Granian issue tracker: Ability to serve an ASGI app object directly, rather than passing a module string&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;



</summary><category term="rust"/><category term="datasette"/><category term="asgi"/></entry><entry><title>Deploying Python web apps as AWS Lambda functions</title><link href="https://simonwillison.net/2022/Sep/19/mangum/#atom-tag" rel="alternate"/><published>2022-09-19T04:05:03+00:00</published><updated>2022-09-19T04:05:03+00:00</updated><id>https://simonwillison.net/2022/Sep/19/mangum/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/awslambda/asgi-mangum"&gt;Deploying Python web apps as AWS Lambda functions&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
After literally years of failed half-hearted attempts, I finally managed to deploy an ASGI Python web application (Datasette) to an AWS Lambda function! Here are my extensive notes.
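&lt;p&gt;The adapter I used there is Mangum, whose real usage is just wrapping the ASGI app in a handler. Here's a toy, dependency-free sketch of the translation such an adapter performs, turning a Lambda HTTP event into an ASGI scope and collecting the response (the event shape and field names here are a simplified assumption, not Mangum's implementation):&lt;/p&gt;

```python
# Toy sketch of what an ASGI-to-Lambda adapter like Mangum does internally:
# build an ASGI scope from the Lambda event, run the app, gather the response.
import asyncio

async def app(scope, receive, send):
    # Minimal ASGI application standing in for Datasette
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"Hello from Lambda"})

def lambda_handler(event, context=None):
    # Translate the (simplified) Lambda event into an ASGI scope
    scope = {
        "type": "http",
        "method": event.get("requestContext", {}).get("http", {}).get("method", "GET"),
        "path": event.get("rawPath", "/"),
        "headers": [],
        "query_string": (event.get("rawQueryString") or "").encode(),
    }
    response = {"statusCode": None, "body": b""}
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}
    async def send(message):
        if message["type"] == "http.response.start":
            response["statusCode"] = message["status"]
        elif message["type"] == "http.response.body":
            response["body"] += message.get("body", b"")
    asyncio.run(app(scope, receive, send))
    return {"statusCode": response["statusCode"],
            "body": response["body"].decode()}
```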


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/aws"&gt;aws&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lambda"&gt;lambda&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/serverless"&gt;serverless&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;



</summary><category term="aws"/><category term="lambda"/><category term="python"/><category term="serverless"/><category term="datasette"/><category term="asgi"/></entry><entry><title>Weeknotes: Datasette Lite, nogil Python, HYTRADBOI</title><link href="https://simonwillison.net/2022/May/6/weeknotes/#atom-tag" rel="alternate"/><published>2022-05-06T22:56:39+00:00</published><updated>2022-05-06T22:56:39+00:00</updated><id>https://simonwillison.net/2022/May/6/weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;My big project this week was &lt;a href="https://simonwillison.net/2022/May/4/datasette-lite/"&gt;Datasette Lite&lt;/a&gt;, a new way to run Datasette directly in a browser, powered by WebAssembly and &lt;a href="https://pyodide.org/"&gt;Pyodide&lt;/a&gt;. I also continued my research into running SQL queries in parallel, described &lt;a href="https://simonwillison.net/2022/Apr/27/parallel-queries/"&gt;last week&lt;/a&gt;. Plus I spoke at &lt;a href="https://www.hytradboi.com/"&gt;HYTRADBOI&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Datasette Lite&lt;/h4&gt;
&lt;p&gt;This started out as a research project, inspired by the excitement around Python in the browser from PyCon US last week (which I didn't attend, but observed with some jealousy on Twitter).&lt;/p&gt;
&lt;p&gt;I've been wanting to explore this possibility for a while. &lt;a href="https://jupyterlite.readthedocs.io/en/latest/"&gt;JupyterLite&lt;/a&gt; had convinced me that it would be feasible to run Datasette using Pyodide, especially after I found out that the &lt;code&gt;sqlite3&lt;/code&gt; module from the Python standard library works there already.&lt;/p&gt;
&lt;p&gt;I have a private "notes" GitHub repository which I use to keep notes in GitHub issues. I started a thread there researching the possibility of running an ASGI application in Pyodide, thinking that might be a good starting point to getting Datasette to work.&lt;/p&gt;
&lt;p&gt;The proof of concept moved remarkably quickly, especially once I realized that Service Workers weren't going to work but Web Workers might.&lt;/p&gt;
&lt;p&gt;Once I had committed to Datasette Lite as a full project I started &lt;a href="https://github.com/simonw/datasette-lite"&gt;a new repository&lt;/a&gt; for it and transferred across my initial prototype issue thread. You can read that full thread for a blow-by-blow account of how my research came together in &lt;a href="https://github.com/simonw/datasette-lite/issues/1"&gt;datasette-lite issue #1&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The rest of the project is documented in detail in &lt;a href="https://simonwillison.net/2022/May/4/datasette-lite/"&gt;my blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Since launching it the biggest change I've made was a change of URL: since it's clearly going to be a core component of the Datasette project going forward I promoted it from &lt;code&gt;simonw.github.io/datasette-lite/&lt;/code&gt; to its new permanent home at &lt;a href="https://lite.datasette.io"&gt;lite.datasette.io&lt;/a&gt;. It's still hosted by GitHub Pages - here's &lt;a href="https://til.simonwillison.net/github/custom-subdomain-github-pages"&gt;my TIL&lt;/a&gt; about setting up the new domain.&lt;/p&gt;
&lt;p&gt;It may have started as a proof of concept tech demo, but the response to it so far has convinced me that I should really take it seriously. Being able to host Datasette without needing to run any server-side code at all is an incredibly compelling experience.&lt;/p&gt;
&lt;p&gt;It doesn't matter how hard I work on getting the Datasette &lt;a href="https://docs.datasette.io/en/stable/publish.html"&gt;deployment experience&lt;/a&gt; as easy as possible, static file hosting will always be an order of magnitude more accessible. And even at this early stage Datasette Lite is already proving to be a genuinely useful way to run the software.&lt;/p&gt;
&lt;p&gt;As part of this research I also shipped &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-26-1"&gt;sqlite-utils 3.26.1&lt;/a&gt; with a minor dependency fix that means it works in Pyodide now. You can try that out by running the following in the &lt;a href="https://pyodide.org/en/stable/console.html"&gt;Pyodide REPL&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-text-python-console"&gt;&lt;pre&gt;&amp;gt;&amp;gt;&amp;gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; micropip
&amp;gt;&amp;gt;&amp;gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; micropip.install(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;sqlite-utils&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;)
&amp;gt;&amp;gt;&amp;gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; sqlite_utils
&amp;gt;&amp;gt;&amp;gt; db &lt;span class="pl-k"&gt;=&lt;/span&gt; sqlite_utils.Database(&lt;span class="pl-v"&gt;memory&lt;/span&gt;&lt;span class="pl-k"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;)
&amp;gt;&amp;gt;&amp;gt; &lt;span class="pl-c1"&gt;list&lt;/span&gt;(db.query(&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;select 3 * 5&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;))
[{'3 * 5': 15}]&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="nogil"&gt;Parallel SQL queries work... if you can get rid of the GIL&lt;/h4&gt;
&lt;p&gt;Last week I described my effort to implement &lt;a href="https://simonwillison.net/2022/Apr/27/parallel-queries/"&gt;Parallel SQL queries for Datasette&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The idea there was that many Datasette pages execute multiple SQL queries - a &lt;code&gt;count(*)&lt;/code&gt; and a &lt;code&gt;select ... limit 101&lt;/code&gt; for example - that could be run in parallel instead of serial, for a potential improvement in page load times.&lt;/p&gt;
&lt;p&gt;My hope was that I could get away with this despite Python's infamous Global Interpreter Lock because the &lt;code&gt;sqlite3&lt;/code&gt; C module releases the GIL when it executes a query.&lt;/p&gt;
&lt;p&gt;My initial results weren't showing an increase in performance, even while the queries were shown to be overlapping each other. I opened &lt;a href="https://github.com/simonw/datasette/issues/1727"&gt;a research thread&lt;/a&gt; and spent some time this week investigating.&lt;/p&gt;
&lt;p&gt;My conclusion, sadly, was that the GIL was indeed to blame. &lt;code&gt;sqlite3&lt;/code&gt; releases the GIL to execute the query, but there's still a lot of work that happens in Python land itself - most importantly the code that assembles the objects that represent the rows returned by the query, which is still subject to the GIL.&lt;/p&gt;
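&lt;p&gt;A minimal version of the experiment can be sketched with the standard library alone: two queries, each on its own connection, submitted to a thread pool. The queries genuinely overlap while &lt;code&gt;sqlite3&lt;/code&gt; holds no GIL, but the row-object assembly afterwards is still serialized (the queries here are trivial stand-ins, not Datasette's actual SQL):&lt;/p&gt;

```python
# Minimal sketch of the parallel-query experiment: two sqlite3 queries on
# separate connections, run in threads. sqlite3 releases the GIL while a
# query executes, but building the Python row tuples still requires it.
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    # One connection per thread, created and used in the same worker thread
    conn = sqlite3.connect(":memory:")
    return conn.execute(sql).fetchall()

with ThreadPoolExecutor(max_workers=2) as pool:
    count_future = pool.submit(run_query, "select count(*) from (select 1)")
    rows_future = pool.submit(run_query, "select 3 * 5")
    count, rows = count_future.result(), rows_future.result()
```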
&lt;p&gt;Then &lt;a href="https://lobste.rs/s/9hj80j/when_python_can_t_thread_deep_dive_into_gil#c_2n0fga"&gt;this comment&lt;/a&gt; on a thread about the GIL on Lobsters reminded me of the &lt;a href="https://github.com/colesbury/nogil"&gt;nogil fork&lt;/a&gt; of Python by Sam Gross, who has been working on this problem for several years now.&lt;/p&gt;
&lt;p&gt;Since that fork has &lt;a href="https://github.com/colesbury/nogil#docker"&gt;a Docker image&lt;/a&gt; trying it out was easy... and to my amazement &lt;a href="https://simonwillison.net/2022/Apr/29/nogil/"&gt;it worked&lt;/a&gt;! Running my parallel queries implementation against &lt;code&gt;nogil&lt;/code&gt; Python reduced a page load time from 77ms to 47ms.&lt;/p&gt;
&lt;p&gt;Sam's work is against Python 3.9, but he's &lt;a href="https://lukasz.langa.pl/5d044f91-49c1-4170-aed1-62b6763e6ad0/"&gt;discussing options&lt;/a&gt; for bringing his improvements into Python itself with the core maintainers. I'm hopeful that this might happen in the next few years. It's an incredible piece of work.&lt;/p&gt;
&lt;p&gt;An amusing coincidence: one restriction of WASM and Pyodide is that they can't start new threads - so as part of getting Datasette to work on that platform I had to &lt;a href="https://github.com/simonw/datasette/issues/1735"&gt;add a new setting&lt;/a&gt; that disables the ability to run SQL queries in threads entirely!&lt;/p&gt;
&lt;h4&gt;datasette-copy-to-memory&lt;/h4&gt;
&lt;p&gt;One question I found myself asking while investigating parallel SQL queries (before I determined that the GIL was to blame) was whether parallel SQLite queries against the same database file were suffering from some form of file locking or contention.&lt;/p&gt;
&lt;p&gt;To rule that out, I built a new plugin: &lt;a href="https://datasette.io/plugins/datasette-copy-to-memory"&gt;datasette-copy-to-memory&lt;/a&gt; - which reads a SQLite database from disk and copies it into an in-memory database when Datasette first starts up.&lt;/p&gt;
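&lt;p&gt;The core of that copy step can be done with the standard library's SQLite backup API. Here's a sketch of the technique (the plugin's actual implementation may differ):&lt;/p&gt;

```python
# Sketch of copying an on-disk SQLite database into memory using the
# sqlite3 Connection.backup() API - the kind of startup copy a plugin
# like datasette-copy-to-memory performs.
import os
import sqlite3
import tempfile

# Build a small on-disk database to copy
path = os.path.join(tempfile.mkdtemp(), "demo.db")
disk = sqlite3.connect(path)
disk.execute("create table t (n integer)")
disk.execute("insert into t values (42)")
disk.commit()

# Copy it wholesale into an in-memory database
memory = sqlite3.connect(":memory:")
disk.backup(memory)
disk.close()
```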
&lt;p&gt;This didn't make an observable difference in performance, but I've not tested it extensively - especially not against larger databases on servers with more available RAM.&lt;/p&gt;
&lt;p&gt;If you're inspired to give this plugin a go I'd love to hear about your results.&lt;/p&gt;
&lt;h4&gt;asgi-gzip and datasette-gzip&lt;/h4&gt;
&lt;p&gt;I mentioned &lt;code&gt;datasette-gzip&lt;/code&gt; last week: a plugin that acts as a wrapper around the excellent &lt;code&gt;GZipMiddleware&lt;/code&gt; from &lt;a href="https://www.starlette.io/"&gt;Starlette&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The performance improvements from this - especially for larger HTML tables, which it turns out compress extremely well - were significant. Enough so that I plan to bring gzip support into Datasette core very shortly.&lt;/p&gt;
&lt;p&gt;Since I don't want to add the whole of Starlette as a dependency just to get gzip support, I extracted that code out into a new Python package called &lt;a href="https://github.com/simonw/asgi-gzip"&gt;asgi-gzip&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The obvious risk with doing this is that it might fall behind the excellent Starlette implementation. So I came up with a pattern based on &lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;Git scraping&lt;/a&gt; that would automatically open a new GitHub issue should the borrowed Starlette code change in the future.&lt;/p&gt;
&lt;p&gt;I wrote about that pattern in &lt;a href="https://simonwillison.net/2022/Apr/28/issue-on-changes/"&gt;Automatically opening issues when tracked file content changes&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Speaking at HYTRADBOI&lt;/h4&gt;
&lt;p&gt;I spoke at the &lt;a href="https://www.hytradboi.com/"&gt;HYTRADBOI conference&lt;/a&gt; last week: Have You Tried Rubbing A Database On It.&lt;/p&gt;
&lt;p&gt;HYTRADBOI was organized by Jamie Brandon. It was a neat event, with a smart format: 34 pre-recorded ten-minute talks, arranged into a schedule to encourage people to watch and discuss them at specific times during the day of the event.&lt;/p&gt;
&lt;p&gt;It's worth reading Jamie's &lt;a href="https://www.scattered-thoughts.net/writing/hytradboi-2022-postmortem/"&gt;postmortem of the event&lt;/a&gt; for some insightful thinking on online event organization.&lt;/p&gt;
&lt;p&gt;My talk was &lt;a href="https://www.hytradboi.com/2022/datasette-a-big-bag-of-tricks-for-solving-interesting-problems-using-sqlite"&gt;Datasette: a big bag of tricks for solving interesting problems using SQLite&lt;/a&gt;. It ended up working out as a lightning-fast 10 minute tutorial on using the &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html"&gt;sqlite-utils CLI&lt;/a&gt; to clean up some data (in this case &lt;a href="https://geodata.myfwc.com/datasets/myfwc::manatee-carcass-recovery-locations-in-florida/about"&gt;Manatee Carcass Recovery Locations in Florida&lt;/a&gt; since 1974) and then using Datasette to explore and publish it.&lt;/p&gt;
&lt;p&gt;I've posted &lt;a href="https://gist.github.com/simonw/c61447d866f7f29d368183fb09d9bf41"&gt;some basic notes&lt;/a&gt; to accompany the talk. My plan is to use this as the basis for an official tutorial on &lt;code&gt;sqlite-utils&lt;/code&gt; for the &lt;a href="https://datasette.io/tutorials"&gt;tutorials section&lt;/a&gt; of the Datasette website.&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette"&gt;datasette&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette/releases/tag/0.62a0"&gt;0.62a0&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette/releases"&gt;111 releases total&lt;/a&gt;) - 2022-05-02
&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.26.1"&gt;3.26.1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;100 releases total&lt;/a&gt;) - 2022-05-02
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/click-default-group-wheel"&gt;click-default-group-wheel&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/click-default-group-wheel/releases/tag/1.2.2"&gt;1.2.2&lt;/a&gt; - 2022-05-02
&lt;br /&gt;Extends click.Group to invoke a command without explicit subcommand name (this version publishes a wheel)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/s3-credentials"&gt;s3-credentials&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/s3-credentials/releases/tag/0.11"&gt;0.11&lt;/a&gt; - (&lt;a href="https://github.com/simonw/s3-credentials/releases"&gt;11 releases total&lt;/a&gt;) - 2022-05-01
&lt;br /&gt;A tool for creating credentials for accessing S3 buckets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-copy-to-memory"&gt;datasette-copy-to-memory&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-copy-to-memory/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-copy-to-memory/releases"&gt;5 releases total&lt;/a&gt;) - 2022-04-30
&lt;br /&gt;Copy database files into an in-memory database on startup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-gzip"&gt;datasette-gzip&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette-gzip/releases/tag/0.2"&gt;0.2&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette-gzip/releases"&gt;2 releases total&lt;/a&gt;) - 2022-04-28
&lt;br /&gt;Add gzip compression to Datasette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/asgi-gzip"&gt;asgi-gzip&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/asgi-gzip/releases/tag/0.1"&gt;0.1&lt;/a&gt; - 2022-04-28
&lt;br /&gt;gzip middleware for ASGI applications, extracted from Starlette&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/service-workers/intercept-fetch"&gt;Intercepting fetch in a service worker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/github/custom-subdomain-github-pages"&gt;Setting up a custom subdomain for a GitHub Pages site&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/gil"&gt;gil&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/speaking"&gt;speaking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pyodide"&gt;pyodide&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-lite"&gt;datasette-lite&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="gil"/><category term="projects"/><category term="python"/><category term="speaking"/><category term="datasette"/><category term="asgi"/><category term="webassembly"/><category term="weeknotes"/><category term="pyodide"/><category term="datasette-lite"/></entry><entry><title>Automatically opening issues when tracked file content changes</title><link href="https://simonwillison.net/2022/Apr/28/issue-on-changes/#atom-tag" rel="alternate"/><published>2022-04-28T17:18:14+00:00</published><updated>2022-04-28T17:18:14+00:00</updated><id>https://simonwillison.net/2022/Apr/28/issue-on-changes/#atom-tag</id><summary type="html">
    &lt;p&gt;I figured out a GitHub Actions pattern to keep track of a file published somewhere on the internet and automatically open a new repository issue any time the contents of that file changes.&lt;/p&gt;
&lt;h4&gt;Extracting GZipMiddleware from Starlette&lt;/h4&gt;
&lt;p&gt;Here's why I needed to solve this problem.&lt;/p&gt;
&lt;p&gt;I want to add gzip support to my &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; open source project. Datasette builds on the Python &lt;a href="https://asgi.readthedocs.io/"&gt;ASGI&lt;/a&gt; standard, and &lt;a href="https://www.starlette.io/"&gt;Starlette&lt;/a&gt; provides an extremely well tested, robust &lt;a href="https://www.starlette.io/middleware/#gzipmiddleware"&gt;GZipMiddleware class&lt;/a&gt; that adds gzip support to any ASGI application. As with everything else in Starlette, it's &lt;em&gt;really&lt;/em&gt; good code.&lt;/p&gt;
&lt;p&gt;The problem is, I don't want to add the whole of Starlette as a dependency. I'm trying to keep Datasette's core as small as possible, so I'm very careful about new dependencies. Starlette itself is actually very light (and only has a tiny number of dependencies of its own) but I still don't want the whole thing just for that one class.&lt;/p&gt;
&lt;p&gt;So I decided to extract the &lt;code&gt;GZipMiddleware&lt;/code&gt; class into a separate Python package, under the same BSD license as Starlette itself.&lt;/p&gt;
&lt;p&gt;The result is my new &lt;a href="https://pypi.org/project/asgi-gzip/"&gt;asgi-gzip&lt;/a&gt; package, now available on PyPI.&lt;/p&gt;
&lt;h4&gt;What if Starlette fixes a bug?&lt;/h4&gt;
&lt;p&gt;The problem with extracting code like this is that Starlette is a very effectively maintained package. What if they make improvements or fix bugs in the &lt;code&gt;GZipMiddleware&lt;/code&gt; class? How can I make sure to apply those same fixes to my extracted copy?&lt;/p&gt;
&lt;p&gt;As I thought about this challenge, I realized I had most of the solution already.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;Git scraping&lt;/a&gt;&lt;/strong&gt; is the name I've given to the trick of running a periodic scraper that writes to a git repository in order to track changes to data over time.&lt;/p&gt;
&lt;p&gt;It may seem redundant to do this against a file that already &lt;a href="https://github.com/encode/starlette/commits/master/starlette/middleware/gzip.py"&gt;lives in version control&lt;/a&gt; elsewhere - but in addition to tracking changes, Git scraping can offer a cheap and easy way to add automation that triggers when a change is detected.&lt;/p&gt;
&lt;p&gt;I need an actionable alert any time the Starlette code changes so I can review the change and apply a fix to my own library, if necessary.&lt;/p&gt;
&lt;p&gt;Since I already run all of my projects out of GitHub issues, automatically opening an issue against the &lt;a href="https://github.com/simonw/asgi-gzip"&gt;asgi-gzip repository&lt;/a&gt; would be ideal.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/asgi-gzip/blob/0.1/.github/workflows/track.yml"&gt;track.yml workflow&lt;/a&gt; does exactly that: it implements the Git scraping pattern against the &lt;a href="https://github.com/encode/starlette/blob/master/starlette/middleware/gzip.py"&gt;gzip.py module&lt;/a&gt; in Starlette, and files an issue any time it detects changes to that file.&lt;/p&gt;
&lt;p&gt;Starlette haven't made any changes to that file since I started tracking it, so I created &lt;a href="https://github.com/simonw/issue-when-changed"&gt;a test repo&lt;/a&gt; to try this out.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/issue-when-changed/issues/3"&gt;one of the example issues&lt;/a&gt;. I decided to include the visual diff in the issue description and have a link to it from the underlying commit as well.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/issue-when-changed.jpg" alt="Screenshot of an open issue page. The issue is titled &amp;quot;gzip.py was updated&amp;quot; and contains a visual diff showing the change to a file. A commit that references the issue is listed too." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;How it works&lt;/h4&gt;
&lt;p&gt;The implementation is contained entirely in this &lt;a href="https://github.com/simonw/asgi-gzip/blob/0.1/.github/workflows/track.yml"&gt;track.yml workflow&lt;/a&gt;. I designed it as a single self-contained file so it's easy to copy, paste and adapt for other projects.&lt;/p&gt;
&lt;p&gt;It uses &lt;a href="https://github.com/actions/github-script"&gt;actions/github-script&lt;/a&gt;, which makes it easy to do things like file new issues using JavaScript.&lt;/p&gt;
&lt;p&gt;Here's a heavily annotated copy:&lt;/p&gt;
&lt;div class="highlight highlight-source-yaml"&gt;&lt;pre&gt;&lt;span class="pl-ent"&gt;name&lt;/span&gt;: &lt;span class="pl-s"&gt;Track the Starlette version of this&lt;/span&gt;

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Run on repo pushes, and if a user clicks the "run this action" button,&lt;/span&gt;
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; and on a schedule at 5:21am UTC every day&lt;/span&gt;
&lt;span class="pl-ent"&gt;on&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;push&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;workflow_dispatch&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;schedule&lt;/span&gt;:
  - &lt;span class="pl-ent"&gt;cron&lt;/span&gt;:  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;21 5 * * *&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Without this block I got this error when the action ran:&lt;/span&gt;
&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; HttpError: Resource not accessible by integration&lt;/span&gt;
&lt;span class="pl-ent"&gt;permissions&lt;/span&gt;:
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Allow the action to create issues&lt;/span&gt;
  &lt;span class="pl-ent"&gt;issues&lt;/span&gt;: &lt;span class="pl-s"&gt;write&lt;/span&gt;
  &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Allow the action to commit back to the repository&lt;/span&gt;
  &lt;span class="pl-ent"&gt;contents&lt;/span&gt;: &lt;span class="pl-s"&gt;write&lt;/span&gt;

&lt;span class="pl-ent"&gt;jobs&lt;/span&gt;:
  &lt;span class="pl-ent"&gt;check&lt;/span&gt;:
    &lt;span class="pl-ent"&gt;runs-on&lt;/span&gt;: &lt;span class="pl-s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="pl-ent"&gt;steps&lt;/span&gt;:
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/checkout@v2&lt;/span&gt;
    - &lt;span class="pl-ent"&gt;uses&lt;/span&gt;: &lt;span class="pl-s"&gt;actions/github-script@v6&lt;/span&gt;
      &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Using env: here to demonstrate how an action like this can&lt;/span&gt;
      &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; be adjusted to take dynamic inputs&lt;/span&gt;
      &lt;span class="pl-ent"&gt;env&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;URL&lt;/span&gt;: &lt;span class="pl-s"&gt;https://raw.githubusercontent.com/encode/starlette/master/starlette/middleware/gzip.py&lt;/span&gt;
        &lt;span class="pl-ent"&gt;FILE_NAME&lt;/span&gt;: &lt;span class="pl-s"&gt;tracking/gzip.py&lt;/span&gt;
      &lt;span class="pl-ent"&gt;with&lt;/span&gt;:
        &lt;span class="pl-ent"&gt;script&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;          const { URL, FILE_NAME } = process.env;&lt;/span&gt;
&lt;span class="pl-s"&gt;          // promisify pattern for getting an await version of child_process.exec&lt;/span&gt;
&lt;span class="pl-s"&gt;          const util = require("util");&lt;/span&gt;
&lt;span class="pl-s"&gt;          // Used exec_ here because 'exec' variable name is already used:&lt;/span&gt;
&lt;span class="pl-s"&gt;          const exec_ = util.promisify(require("child_process").exec);&lt;/span&gt;
&lt;span class="pl-s"&gt;          // Use curl to download the file&lt;/span&gt;
&lt;span class="pl-s"&gt;          await exec_(`curl -o ${FILE_NAME} ${URL}`);&lt;/span&gt;
&lt;span class="pl-s"&gt;          // Use 'git diff' to detect if the file has changed since last time&lt;/span&gt;
&lt;span class="pl-s"&gt;          const { stdout } = await exec_(`git diff ${FILE_NAME}`);&lt;/span&gt;
&lt;span class="pl-s"&gt;          if (stdout) {&lt;/span&gt;
&lt;span class="pl-s"&gt;            // There was a diff to that file&lt;/span&gt;
&lt;span class="pl-s"&gt;            const title = `${FILE_NAME} was updated`;&lt;/span&gt;
&lt;span class="pl-s"&gt;            const body =&lt;/span&gt;
&lt;span class="pl-s"&gt;              `${URL} changed:` +&lt;/span&gt;
&lt;span class="pl-s"&gt;              "\n\n```diff\n" +&lt;/span&gt;
&lt;span class="pl-s"&gt;              stdout +&lt;/span&gt;
&lt;span class="pl-s"&gt;              "\n```\n\n" +&lt;/span&gt;
&lt;span class="pl-s"&gt;              "Close this issue once those changes have been integrated here";&lt;/span&gt;
&lt;span class="pl-s"&gt;            const issue = await github.rest.issues.create({&lt;/span&gt;
&lt;span class="pl-s"&gt;              owner: context.repo.owner,&lt;/span&gt;
&lt;span class="pl-s"&gt;              repo: context.repo.repo,&lt;/span&gt;
&lt;span class="pl-s"&gt;              title: title,&lt;/span&gt;
&lt;span class="pl-s"&gt;              body: body,&lt;/span&gt;
&lt;span class="pl-s"&gt;            });&lt;/span&gt;
&lt;span class="pl-s"&gt;            const issueNumber = issue.data.number;&lt;/span&gt;
&lt;span class="pl-s"&gt;            // Now commit and reference that issue number, so the commit shows up&lt;/span&gt;
&lt;span class="pl-s"&gt;            // listed at the bottom of the issue page&lt;/span&gt;
&lt;span class="pl-s"&gt;            const commitMessage = `${FILE_NAME} updated, refs #${issueNumber}`;&lt;/span&gt;
&lt;span class="pl-s"&gt;            // https://til.simonwillison.net/github-actions/commit-if-file-changed&lt;/span&gt;
&lt;span class="pl-s"&gt;            await exec_(`git config user.name "Automated"`);&lt;/span&gt;
&lt;span class="pl-s"&gt;            await exec_(`git config user.email "actions@users.noreply.github.com"`);&lt;/span&gt;
&lt;span class="pl-s"&gt;            await exec_(`git add -A`);&lt;/span&gt;
&lt;span class="pl-s"&gt;            await exec_(`git commit -m "${commitMessage}" || exit 0`);&lt;/span&gt;
&lt;span class="pl-s"&gt;            await exec_(`git pull --rebase`);&lt;/span&gt;
&lt;span class="pl-s"&gt;            await exec_(`git push`);&lt;/span&gt;
&lt;span class="pl-s"&gt;          }&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the &lt;a href="https://github.com/simonw/asgi-gzip"&gt;asgi-gzip&lt;/a&gt; repository I keep the fetched &lt;code&gt;gzip.py&lt;/code&gt; file in a &lt;code&gt;tracking/&lt;/code&gt; directory. This directory isn't included in the Python package that gets uploaded to PyPI - it's there only so that my code can track changes to it over time.&lt;/p&gt;
&lt;h4&gt;More interesting applications&lt;/h4&gt;
&lt;p&gt;I built this to solve my "tell me when Starlette update their &lt;code&gt;gzip.py&lt;/code&gt; file" problem, but clearly this pattern has much more interesting uses.&lt;/p&gt;
&lt;p&gt;You could point this at any web page to get a new GitHub issue opened when that page content changes. Subscribe to notifications for that repository and you get a robust, shared mechanism for alerts - plus an issue system where you can post additional comments and close the issue once someone has reviewed the change.&lt;/p&gt;
&lt;p&gt;There's a lot of potential here for solving all kinds of interesting problems. And it doesn't cost anything either: GitHub Actions (somehow) remains completely free for public repositories!&lt;/p&gt;
&lt;h4&gt;Update: October 13th 2022&lt;/h4&gt;
&lt;p&gt;Almost six months after writing about this... it triggered for the first time!&lt;/p&gt;
&lt;p&gt;Here's the issue that the script opened: &lt;a href="https://github.com/simonw/asgi-gzip/issues/4"&gt;#4: tracking/gzip.py was updated&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I applied the improvement (Marcelo Trylesinski and Kai Klingenberg updated Starlette's code to avoid gzipping if the response already had a Content-Encoding header) and released &lt;a href="https://github.com/simonw/asgi-gzip/releases/tag/0.2"&gt;version 0.2&lt;/a&gt; of the package.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gzip"&gt;gzip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-actions"&gt;github-actions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/git-scraping"&gt;git-scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-issues"&gt;github-issues&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="github"/><category term="gzip"/><category term="projects"/><category term="python"/><category term="datasette"/><category term="asgi"/><category term="github-actions"/><category term="git-scraping"/><category term="github-issues"/></entry><entry><title>Weeknotes: sqlite-utils updates, Datasette and asgi-csrf, open-sourcing VIAL</title><link href="https://simonwillison.net/2021/Jun/28/weeknotes/#atom-tag" rel="alternate"/><published>2021-06-28T17:23:21+00:00</published><updated>2021-06-28T17:23:21+00:00</updated><id>https://simonwillison.net/2021/Jun/28/weeknotes/#atom-tag</id><summary type="html">
    &lt;p&gt;Some work on &lt;code&gt;sqlite-utils&lt;/code&gt;, &lt;code&gt;asgi-csrf&lt;/code&gt;, a Datasette alpha and we open-sourced VIAL.&lt;/p&gt;
&lt;h4&gt;sqlite-utils&lt;/h4&gt;
&lt;p&gt;Last week's &lt;a href="https://sqlite-utils.datasette.io/en/stable/changelog.html#v3-10"&gt;sqlite-utils 3.10&lt;/a&gt; introduced a huge new feature: the ability to &lt;a href="https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/"&gt;run joins directly against CSV and JSON files from the command-line&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I've since released &lt;a href="https://sqlite-utils.datasette.io/en/latest/changelog.html#v3-11"&gt;sqlite-utils 3.11&lt;/a&gt; and &lt;a href="https://sqlite-utils.datasette.io/en/latest/changelog.html#v3-12"&gt;3.12&lt;/a&gt;, much smaller releases.&lt;/p&gt;
&lt;p&gt;3.11 added a new &lt;code&gt;--schema&lt;/code&gt; option to the &lt;code&gt;sqlite-utils memory&lt;/code&gt; command which lets you see the schema you'll be querying for the imported data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ curl 'https://api.github.com/users/dogsheep/repos' | \
  sqlite-utils memory - --schema
CREATE TABLE [stdin] (
   [id] INTEGER,
   [node_id] TEXT,
   [name] TEXT,
   [full_name] TEXT,
   [private] INTEGER,
   [owner] TEXT,
   [html_url] TEXT,
   [description] TEXT,
   ...
   [watchers] INTEGER,
   [default_branch] TEXT
);
CREATE VIEW t1 AS select * from [stdin];
CREATE VIEW t AS select * from [stdin];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;3.12 focused on the Python library side of the package. It adds a new method, &lt;code&gt;db.query(sql)&lt;/code&gt; which returns an iterator over Python dictionaries representing the results of a query.&lt;/p&gt;
&lt;p&gt;This was a pretty obvious missing feature of the library: the rest of &lt;code&gt;sqlite-utils&lt;/code&gt; deals with rows that are represented as dictionaries - you pass a list of Python dictionaries to &lt;code&gt;db[table_name].insert_all(list_of_dicts)&lt;/code&gt; to create a table with the correct schema, for example. But if you wanted to execute &lt;code&gt;SELECT&lt;/code&gt; queries you had to use &lt;code&gt;db.execute()&lt;/code&gt;, which returns a standard library cursor object that in turn produces plain tuples when you call &lt;code&gt;.fetchall()&lt;/code&gt; on it.&lt;/p&gt;
&lt;p&gt;It was only when I started to work on an interactive Jupyter notebook tutorial for &lt;code&gt;sqlite-utils&lt;/code&gt; that I realized how weird it was not to have an equivalent method for reading data out of the database again.&lt;/p&gt;
&lt;p&gt;Here's what the new method looks like:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Database&lt;/span&gt;(&lt;span class="pl-s1"&gt;memory&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;)
&lt;span class="pl-s1"&gt;db&lt;/span&gt;[&lt;span class="pl-s"&gt;"dogs"&lt;/span&gt;].&lt;span class="pl-en"&gt;insert_all&lt;/span&gt;([
    {&lt;span class="pl-s"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;"Cleo"&lt;/span&gt;},
    {&lt;span class="pl-s"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;"Pancakes"&lt;/span&gt;}
])
&lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;row&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;db&lt;/span&gt;.&lt;span class="pl-en"&gt;query&lt;/span&gt;(&lt;span class="pl-s"&gt;"select * from dogs"&lt;/span&gt;):
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;row&lt;/span&gt;)
&lt;span class="pl-c"&gt;# Outputs:&lt;/span&gt;
&lt;span class="pl-c"&gt;# {'name': 'Cleo'}&lt;/span&gt;
&lt;span class="pl-c"&gt;# {'name': 'Pancakes'}&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#db-query-sql-params"&gt;Full documentation here&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;asgi-csrf and a Datasette alpha&lt;/h4&gt;
&lt;p&gt;I'm building a custom Datasette integration for a consulting client at the moment which needs to be able to accept &lt;code&gt;POST&lt;/code&gt; form data as part of an API. Datasette has &lt;a href="https://docs.datasette.io/en/stable/internals.html#csrf-protection"&gt;CSRF protection&lt;/a&gt; but for this particular project I need to opt-out of that protection for this one endpoint.&lt;/p&gt;
&lt;p&gt;I ended up releasing &lt;a href="https://github.com/simonw/asgi-csrf/releases/tag/0.9"&gt;asgi-csrf 0.9&lt;/a&gt; with a new &lt;code&gt;skip_if_scope=&lt;/code&gt; mechanism for dynamically disabling CSRF protection based on the incoming ASGI scope. I then shipped a &lt;a href="https://github.com/simonw/datasette/releases/tag/0.58a1"&gt;Datasette 0.58a1&lt;/a&gt; alpha release with a new &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-skip-csrf"&gt;skip_csrf(datasette, scope)&lt;/a&gt; plugin hook for plugins to take advantage of that mechanism.&lt;/p&gt;
&lt;p&gt;Expect another alpha release shortly to preview the new &lt;a href="https://github.com/simonw/datasette/issues/1384"&gt;get_metadata plugin hook&lt;/a&gt; contributed by Brandon Roberts. I've decided that alphas are the ideal way to explore new plugin hooks while they are still being developed as it lets projects &lt;code&gt;pip install&lt;/code&gt; the alpha while making it clear that the interface may not yet be fully baked.&lt;/p&gt;
&lt;h4&gt;Open-sourcing VIAL&lt;/h4&gt;
&lt;p&gt;VIAL is the project I've been working on for VaccinateCA/VaccinateTheStates - see &lt;a href="https://simonwillison.net/tags/vaccinateca/"&gt;previous posts&lt;/a&gt;. It's a Django application which powers a crowd-sourced and scraper-driven effort to catalogue all of the places in the USA that you can get the Covid vaccine - 77,000 and counting right now.&lt;/p&gt;
&lt;p&gt;We had always intended to open-source the code and now we have! &lt;a href="https://github.com/CAVaccineInventory/vial"&gt;github.com/CAVaccineInventory/vial&lt;/a&gt; is the newly-made-public repository.&lt;/p&gt;
&lt;p&gt;I still need to produce a bunch of extra documentation about VIAL, likely including a video introduction to the project. But it's great to have it out there!&lt;/p&gt;
&lt;h4&gt;Releases this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/3.12"&gt;3.12&lt;/a&gt; - (&lt;a href="https://github.com/simonw/sqlite-utils/releases"&gt;80 releases total&lt;/a&gt;) - 2021-06-25
&lt;br /&gt;Python CLI utility and library for manipulating SQLite databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette"&gt;datasette&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/datasette/releases/tag/0.58a1"&gt;0.58a1&lt;/a&gt; - (&lt;a href="https://github.com/simonw/datasette/releases"&gt;92 releases total&lt;/a&gt;) - 2021-06-24
&lt;br /&gt;An open source multi-tool for exploring and publishing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/simonw/asgi-csrf"&gt;asgi-csrf&lt;/a&gt;&lt;/strong&gt;: &lt;a href="https://github.com/simonw/asgi-csrf/releases/tag/0.9"&gt;0.9&lt;/a&gt; - (&lt;a href="https://github.com/simonw/asgi-csrf/releases"&gt;17 releases total&lt;/a&gt;) - 2021-06-23
&lt;br /&gt;ASGI middleware for protecting against CSRF attacks&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/reddit/scraping-reddit-json"&gt;Scraping Reddit via their JSON API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/csrf"&gt;csrf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vaccinate-ca"&gt;vaccinate-ca&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="csrf"/><category term="datasette"/><category term="asgi"/><category term="weeknotes"/><category term="sqlite-utils"/><category term="vaccinate-ca"/></entry><entry><title>Notes on streaming large API responses</title><link href="https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/#atom-tag" rel="alternate"/><published>2021-06-25T16:26:49+00:00</published><updated>2021-06-25T16:26:49+00:00</updated><id>https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/#atom-tag</id><summary type="html">
    &lt;p&gt;I started &lt;a href="https://twitter.com/simonw/status/1405554676993433605"&gt;a Twitter conversation&lt;/a&gt; last week about API endpoints that stream large amounts of data as an alternative to APIs that return 100 results at a time and require clients to paginate through all of the pages in order to retrieve all of the data:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p dir="ltr" lang="en"&gt;Any unexpected downsides to offering streaming HTTP API endpoints that serve up eg 100,000 JSON objects in a go rather than asking users to paginate 100 at a time over 1,000 requests, assuming efficient implementation of that streaming endpoint?&lt;/p&gt;— Simon Willison (@simonw) &lt;a href="https://twitter.com/simonw/status/1405554676993433605?ref_src=twsrc%5Etfw"&gt;June 17, 2021&lt;/a&gt;
&lt;/blockquote&gt;
&lt;p&gt;I got a ton of great replies. I tried to tie them together in a thread attached to the tweet, but I'm also going to synthesize them into some thoughts here.&lt;/p&gt;
&lt;h4&gt;Bulk exporting data&lt;/h4&gt;
&lt;p&gt;The more time I spend with APIs, especially with regard to my &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; and &lt;a href="https://simonwillison.net/2020/Nov/14/personal-data-warehouses/"&gt;Dogsheep&lt;/a&gt; projects, the more I realize that my favourite APIs are the ones that let you extract &lt;em&gt;all&lt;/em&gt; of your data as quickly and easily as possible.&lt;/p&gt;
&lt;p&gt;There are generally three ways an API might provide this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Click an "export everything" button, then wait for a while for an email to show up with a link to a downloadable zip file. This isn't really an API, in particular since it's usually hard if not impossible to automate that initial "click", but it's still better than nothing. Google's &lt;a href="https://takeout.google.com/"&gt;Takeout&lt;/a&gt; is one notable implementation of this pattern.&lt;/li&gt;
&lt;li&gt;Provide a JSON API which allows users to paginate through their data. This is a very common pattern, although it can run into difficulties: what happens if new data is added while you are paginating through the original data, for example? Some systems only allow access to the first N pages too, for performance reasons.&lt;/li&gt;
&lt;li&gt;Provide a single HTTP endpoint you can hit that will return ALL of your data - potentially dozens or hundreds of MBs of it - in one go.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It's that last option that I'm interested in talking about today.&lt;/p&gt;
&lt;h4&gt;Efficiently streaming data&lt;/h4&gt;
&lt;p&gt;It used to be that most web engineers would quickly discount the idea of an API endpoint that streams out an unlimited number of rows. HTTP requests should be served as quickly as possible! Anything more than a couple of seconds spent processing a request is a red flag that something should be reconsidered.&lt;/p&gt;
&lt;p&gt;Almost everything in the web stack is optimized for quickly serving small requests. But over the past decade the tide has turned somewhat: Node.js made async web servers commonplace, WebSockets taught us to handle long-running connections and in the Python world asyncio and &lt;a href="https://asgi.readthedocs.io/"&gt;ASGI&lt;/a&gt; provided a firm foundation for handling long-running requests using smaller amounts of RAM and CPU.&lt;/p&gt;
&lt;p&gt;I've been experimenting in this area for a few years now.&lt;/p&gt;
&lt;p&gt;Datasette has the ability to &lt;a href="https://github.com/simonw/datasette/blob/0.57.1/datasette/views/base.py#L264-L428"&gt;use ASGI trickery&lt;/a&gt; to &lt;a href="https://docs.datasette.io/en/stable/csv_export.html#streaming-all-records"&gt;stream all rows from a table&lt;/a&gt; (or filtered table) as CSV, potentially returning hundreds of MBs of data.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://django-sql-dashboard.datasette.io/"&gt;Django SQL Dashboard&lt;/a&gt; can export the full results of a SQL query as CSV or TSV, this time using Django's &lt;a href="https://docs.djangoproject.com/en/3.2/ref/request-response/#django.http.StreamingHttpResponse"&gt;StreamingHttpResponse&lt;/a&gt; (which does tie up a full worker process, but that's OK if you restrict it to a controlled number of authenticated users).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://simonwillison.net/tags/vaccinateca/"&gt;VIAL&lt;/a&gt; implements streaming responses to offer an &lt;a href="https://github.com/CAVaccineInventory/vial/blob/cdaaab053a9cf1cef40104a2cdf480b7932d58f7/vaccinate/core/admin_actions.py"&gt;"export from the admin" feature&lt;/a&gt;. It also has an API-key-protected search API which can stream out all matching rows &lt;a href="https://github.com/CAVaccineInventory/vial/blob/cdaaab053a9cf1cef40104a2cdf480b7932d58f7/vaccinate/api/serialize.py#L38"&gt;in JSON or GeoJSON&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Implementation notes&lt;/h4&gt;
&lt;p&gt;The key thing to watch out for when implementing this pattern is memory usage: if your server buffers 100MB+ of data any time it needs to serve an export request you're going to run into trouble.&lt;/p&gt;
&lt;p&gt;Some export formats are friendlier for streaming than others. CSV and TSV are pretty easy to stream, as is newline-delimited JSON.&lt;/p&gt;
&lt;p&gt;Regular JSON requires a bit more thought: you can output a &lt;code&gt;[&lt;/code&gt; character, then output each row in a stream with a comma suffix, then skip the comma for the last row and output a &lt;code&gt;]&lt;/code&gt;. Doing that requires peeking ahead (looping two at a time) to verify that you haven't yet reached the end.&lt;/p&gt;
&lt;p&gt;Or... Martin De Wulf &lt;a href="https://twitter.com/madewulf/status/1405559088994467844"&gt;pointed out&lt;/a&gt; that you can output the first row, then output every other row with a preceding comma - which avoids the whole "iterate two at a time" problem entirely.&lt;/p&gt;
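&lt;p&gt;That trick is simple to express as a generator - a sketch, not code from any of the projects mentioned here:&lt;/p&gt;

```python
import json

def stream_json(rows):
    # Stream a JSON array one chunk at a time: emit the first row bare,
    # then prefix every subsequent row with a comma. This avoids the
    # look-ahead otherwise needed to skip the comma after the last row.
    yield "["
    first = True
    for row in rows:
        if first:
            first = False
            yield "\n  " + json.dumps(row)
        else:
            yield ",\n  " + json.dumps(row)
    yield "\n]"
```

Joining the chunks back together produces valid JSON, but in a real endpoint each chunk would be written straight to the response.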
&lt;p&gt;The next challenge is efficiently looping through every database result without first pulling them all into memory.&lt;/p&gt;
&lt;p&gt;PostgreSQL (and the &lt;code&gt;psycopg2&lt;/code&gt; Python module) offers &lt;a href="https://www.psycopg.org/docs/usage.html#server-side-cursors"&gt;server-side cursors&lt;/a&gt;, which means you can stream results through your code without loading them all at once. I use these &lt;a href="https://github.com/simonw/django-sql-dashboard/blob/dd1bb18e45b40ce8f3d0553a72b7ec3cdc329e69/django_sql_dashboard/views.py#L397-L399"&gt;in Django SQL Dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Server-side cursors make me nervous though, because they seem like they likely tie up resources in the database itself. So the other technique I would consider here is &lt;a href="https://use-the-index-luke.com/no-offset"&gt;keyset pagination&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Keyset pagination works against any data that is ordered by a unique column - it works especially well against a primary key (or other indexed column). Each page of data is retrieved using a query something like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; items &lt;span class="pl-k"&gt;order by&lt;/span&gt; id &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;21&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note the &lt;code&gt;limit 21&lt;/code&gt; - if we are retrieving pages of 20 items we ask for 21, since then we can use the last returned item to tell if there is a next page or not.&lt;/p&gt;
&lt;p&gt;Then for subsequent pages take the 20th &lt;code&gt;id&lt;/code&gt; value and ask for things greater than that:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; &lt;span class="pl-k"&gt;*&lt;/span&gt; &lt;span class="pl-k"&gt;from&lt;/span&gt; items &lt;span class="pl-k"&gt;where&lt;/span&gt; id &lt;span class="pl-k"&gt;&amp;gt;&lt;/span&gt; &lt;span class="pl-c1"&gt;20&lt;/span&gt; &lt;span class="pl-k"&gt;limit&lt;/span&gt; &lt;span class="pl-c1"&gt;21&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each of these queries is fast (since it runs against an ordered index) and uses a predictable, fixed amount of memory. Using keyset pagination we can loop through an arbitrarily large table of data, streaming each page out one at a time, without exhausting any resources.&lt;/p&gt;
&lt;p&gt;And since each query is small and fast, we don't need to worry about huge queries tying up database resources either.&lt;/p&gt;
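&lt;p&gt;Here's a sketch of that loop against SQLite (table and column names are invented, and it treats an empty page as the end of the data rather than using the limit-21 trick, which a server-side streaming loop doesn't need):&lt;/p&gt;

```python
import sqlite3

def stream_rows(conn, table, page_size=20):
    # Keyset pagination: each query is an indexed range scan starting
    # after the last id seen, so memory use stays fixed no matter how
    # large the table is.
    last_id = 0
    while True:
        rows = conn.execute(
            f"select id, name from {table} where id > ? order by id limit ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break
        for row in rows:
            yield row
        last_id = rows[-1][0]
```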
&lt;h4&gt;What can go wrong?&lt;/h4&gt;
&lt;p&gt;I really like these patterns. They haven't bitten me yet, though I've not deployed them for anything truly huge scale. So I &lt;a href="https://twitter.com/simonw/status/1405554676993433605"&gt;asked Twitter&lt;/a&gt; what kind of problems I should look for.&lt;/p&gt;
&lt;p&gt;Based on the Twitter conversation, here are some of the challenges that this approach faces.&lt;/p&gt;
&lt;h4&gt;Challenge: restarting servers&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p dir="ltr" lang="en"&gt;If the stream takes a significantly long time to finish then rolling out updates becomes a problem. You don't want to interrupt a download but also don't want to wait forever for it to finish to spin down the server.&lt;/p&gt;— Adam Lowry (@robotadam) &lt;a href="https://twitter.com/robotadam/status/1405556544897384459?ref_src=twsrc%5Etfw"&gt;June 17, 2021&lt;/a&gt;
&lt;/blockquote&gt;
&lt;p&gt;This came up a few times, and is something I hadn't considered. If your deployment process involves restarting your servers (and it's hard to imagine one that doesn't) you need to take long-running connections into account when you do that. If there's a user halfway through a 500MB stream you can either truncate their connection or wait for them to finish.&lt;/p&gt;
&lt;h4 id="challenge-errors"&gt;Challenge: how to return errors&lt;/h4&gt;
&lt;p&gt;If you're streaming a response, you start with an HTTP 200 code... but then what happens if an error occurs halfway through, potentially while paginating through the database?&lt;/p&gt;
&lt;p&gt;You've already started sending the response, so you can't change the status code to a 500. Instead, you need to write some kind of error to the stream that's being produced.&lt;/p&gt;
&lt;p&gt;If you're serving up a huge JSON document, you can at least make that JSON become invalid, which should indicate to your client that something went wrong.&lt;/p&gt;
&lt;p&gt;Formats like CSV are harder. How do you let your user know that their CSV data is incomplete?&lt;/p&gt;
&lt;p&gt;And what if someone's connection drops - are they definitely going to notice that they are missing something, or will they assume that the truncated file is all of the data?&lt;/p&gt;
&lt;h4&gt;Challenge: resumable downloads&lt;/h4&gt;
&lt;p&gt;If a user is paginating through your API, they get resumability for free: if something goes wrong they can start again at the last page that they fetched.&lt;/p&gt;
&lt;p&gt;Resuming a single stream is a lot harder.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests"&gt;HTTP range mechanism&lt;/a&gt; can be used to provide resumable downloads against large files, but it only works if you generate the entire file in advance.&lt;/p&gt;
&lt;p&gt;There is a way to design APIs to support this, provided the data in the stream is in a predictable order (which it has to be if you're using keyset pagination, described above).&lt;/p&gt;
&lt;p&gt;Have the endpoint that triggers the download take an optional &lt;code&gt;?since=&lt;/code&gt; parameter, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /stream-everything?since=b24ou34
[
    {"id": "m442ecc", "name": "..."},
    {"id": "c663qo2", "name": "..."},
    {"id": "z434hh3", "name": "..."}
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the &lt;code&gt;b24ou34&lt;/code&gt; is an identifier - it can be a deliberately opaque token, but it needs to be served up as part of the response.&lt;/p&gt;
&lt;p&gt;If the user is disconnected for any reason, they can start back where they left off by passing in the last ID that they successfully retrieved:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /stream-everything?since=z434hh3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This still requires some level of intelligence from the client application, but it's a reasonably simple pattern both to implement on the server and as a client.&lt;/p&gt;
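&lt;p&gt;The client-side half of that pattern can be as simple as this sketch, where &lt;code&gt;fetch_page(since)&lt;/code&gt; is a hypothetical function yielding records from the streaming endpoint:&lt;/p&gt;

```python
def fetch_all(fetch_page):
    # Keep requesting the stream, resuming from the last successfully
    # received ID after any dropped connection. fetch_page(since) is a
    # hypothetical callable returning an iterator of {"id": ...} records
    # which may raise ConnectionError part-way through.
    records = []
    since = None
    while True:
        try:
            for record in fetch_page(since):
                records.append(record)
                since = record["id"]
            return records
        except ConnectionError:
            # Retry, picking up where we left off
            continue
```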
&lt;h4&gt;Easiest solution: generate and return from cloud storage&lt;/h4&gt;
&lt;p&gt;It seems the most robust way to implement this kind of API is the least technically exciting: spin off a background task that generates the large response and pushes it to cloud storage (S3 or GCS), then redirect the user to a signed URL to download the resulting file.&lt;/p&gt;
&lt;p&gt;This is easy to scale, gives users complete files with content-length headers that they know they can download (and even resume-downloading, since range headers are supported by S3 and GCS). It also avoids any issues with server restarts caused by long connections.&lt;/p&gt;
&lt;p&gt;This is how Mixpanel handle their export feature, and it's &lt;a href="https://seancoates.com/blogs/lambda-payload-size-workaround"&gt;the solution Sean Coates came to&lt;/a&gt; when trying to find a workaround for the AWS Lambda/API Gateway response size limit.&lt;/p&gt;
&lt;p&gt;If your goal is to provide your users a robust, reliable bulk-export mechanism for their data, export to cloud storage is probably the way to go.&lt;/p&gt;
&lt;p&gt;But streaming dynamic responses are a really neat trick, and I plan to keep exploring them!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/streaming"&gt;streaming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http-range-requests"&gt;http-range-requests&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="apis"/><category term="scaling"/><category term="streaming"/><category term="asgi"/><category term="http-range-requests"/></entry><entry><title>Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf</title><link href="https://simonwillison.net/2020/Mar/4/weeknotes-plethora/#atom-tag" rel="alternate"/><published>2020-03-04T02:27:47+00:00</published><updated>2020-03-04T02:27:47+00:00</updated><id>https://simonwillison.net/2020/Mar/4/weeknotes-plethora/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been preparing for the &lt;a href="https://www.ire.org/events-and-training/conferences/nicar-2020"&gt;NICAR 2020&lt;/a&gt; Data Journalism conference this week which has led me into a flurry of activity across a plethora of different projects and plugins.&lt;/p&gt;

&lt;h3 id="weeknotes-24-datasette-ics"&gt;datasette-ics&lt;/h3&gt;

&lt;p&gt;NICAR publish &lt;a href="https://github.com/ireapps/nicar-2020-schedule"&gt;their schedule&lt;/a&gt; as a CSV file. I couldn't resist loading it into &lt;a href="https://nicar-2020.glitch.me/"&gt;a Datasette on Glitch&lt;/a&gt;, which inspired me to put together a plugin I've been wanting for ages: &lt;a href="https://github.com/simonw/datasette-ics"&gt;datasette-ics&lt;/a&gt;, a &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html#register-output-renderer-datasette"&gt;register_output_renderer()&lt;/a&gt; plugin that can produce a subscribable iCalendar file from an arbitrary SQL query.&lt;/p&gt;

&lt;p&gt;It's based on &lt;a href="https://github.com/simonw/datasette-atom"&gt;datasette-atom&lt;/a&gt; and works in a similar way: you construct a query that outputs a required set of columns (&lt;code&gt;event_name&lt;/code&gt; and &lt;code&gt;event_dtstart&lt;/code&gt; as a minimum), then add the &lt;code&gt;.ics&lt;/code&gt; extension to get back an iCalendar file.&lt;/p&gt;

&lt;p&gt;You can optionally also include &lt;code&gt;event_dtend&lt;/code&gt;, &lt;code&gt;event_duration&lt;/code&gt;, &lt;code&gt;event_description&lt;/code&gt;, &lt;code&gt;event_uid&lt;/code&gt; and most importantly &lt;code&gt;event_tz&lt;/code&gt;, which can contain a timezone string. Figuring out how to handle timezones was &lt;a href="https://github.com/simonw/datasette-ics/issues/1"&gt;the fiddliest part of the project&lt;/a&gt;.&lt;/p&gt;
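&lt;p&gt;A sketch of what such a query might look like, using an invented &lt;code&gt;sessions&lt;/code&gt; table and aliasing its columns to the names the renderer expects:&lt;/p&gt;

```python
import sqlite3

# Hypothetical conference schedule table - the real data would come
# from the NICAR CSV file
conn = sqlite3.connect(":memory:")
conn.execute("create table sessions (title text, start text, room text)")
conn.execute(
    "insert into sessions values "
    "('Intro to Datasette', '2020-03-05 09:00:00', 'Salon A')"
)

# Alias the columns to the names datasette-ics looks for:
# event_name and event_dtstart are required, event_tz is optional
rows = conn.execute(
    """
    select
        title as event_name,
        start as event_dtstart,
        'America/New_York' as event_tz
    from sessions
    """
).fetchall()
```

Adding the `.ics` extension to the query's URL would then render those rows as an iCalendar file.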

&lt;p&gt;If you're going to NICAR, subscribe to &lt;a href="https://nicar-2020.glitch.me/data/calendar.ics"&gt;https://nicar-2020.glitch.me/data/calendar.ics&lt;/a&gt; in a calendar application to get the full 261 item schedule.&lt;/p&gt;

&lt;p&gt;If you just want to see what the iCalendar feed looks like, add &lt;code&gt;?_plain=1&lt;/code&gt; to preview it with a &lt;code&gt;text/plain&lt;/code&gt; content type: &lt;a href="https://nicar-2020.glitch.me/data/calendar.ics?_plain=1"&gt;https://nicar-2020.glitch.me/data/calendar.ics?_plain=1&lt;/a&gt; - and here's &lt;a href="https://nicar-2020.glitch.me/data/calendar"&gt;the SQL query&lt;/a&gt; that powers it.&lt;/p&gt;

&lt;h3 id="weeknotes-24-datasette-upload-csvs"&gt;datasette-upload-csvs&lt;/h3&gt;

&lt;p&gt;My work on &lt;a href="https://simonwillison.net/tags/datasettecloud/"&gt;Datasette Cloud&lt;/a&gt; is inspiring all kinds of interesting work on plugins. I released &lt;a href="https://github.com/simonw/datasette-upload-csvs"&gt;datasette-upload-csvs&lt;/a&gt; a while ago, but now that Datasette has &lt;a href="https://simonwillison.net/2020/Feb/26/weeknotes-datasette-writes/"&gt;official write support&lt;/a&gt; I've been upgrading the plugin to hopefully achieve its full potential.&lt;/p&gt;

&lt;p&gt;In particular, I've been improving its usability. CSV files can be big - and if you're uploading 100MB of CSV it's not particularly reassuring if your browser just sits for a few minutes spinning on the status bar.&lt;/p&gt;

&lt;p&gt;So I added two progress bars to the plugin. The first is a client-side progress bar that shows you the progress of the initial file upload. I used the &lt;code&gt;XMLHttpRequest&lt;/code&gt; pattern (and the drag-and-drop recipe) from Joseph Zimmerman's useful article &lt;a href="https://www.smashingmagazine.com/2018/01/drag-drop-file-uploader-vanilla-js/"&gt;How To Make A Drag-and-Drop File Uploader With Vanilla JavaScript&lt;/a&gt; - &lt;code&gt;fetch()&lt;/code&gt; doesn't reliably report upload progress just yet.&lt;/p&gt;

&lt;p&gt;I'm using &lt;a href="https://www.starlette.io/"&gt;Starlette&lt;/a&gt; and &lt;code&gt;asyncio&lt;/code&gt; so uploading large files doesn't tie up server resources in the same way that it would if I was using processes and threads.&lt;/p&gt;

&lt;p&gt;The second progress bar relates to server-side processing of the file: churning through 100,000 rows of CSV data and inserting them into SQLite can take a while, and I wanted users to be able to see what was going on.&lt;/p&gt;

&lt;p&gt;Here's an animated screenshot of how the interface looks now:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2020/upload-csvs.gif" style="max-width: 100%" alt="Uploading a CSV" /&gt;&lt;/p&gt;

&lt;p&gt;Implementing this was trickier. In the end I took advantage of the new dedicated write thread made available by &lt;code&gt;datasette.execute_write_fn()&lt;/code&gt; - since that thread has exclusive access to write to the database, I create a SQLite table called &lt;code&gt;_csv_progress_&lt;/code&gt; and write a new record to it every 10 rows. I use the number of bytes in the CSV file as the total and track how far through that file Python's CSV parser has got using &lt;code&gt;file.tell()&lt;/code&gt;.&lt;/p&gt;
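&lt;p&gt;A rough stdlib-only sketch of the same idea (the real plugin routes its writes through &lt;code&gt;datasette.execute_write_fn()&lt;/code&gt;; the column names on the progress table here are illustrative): track how far the CSV parser has read using &lt;code&gt;file.tell()&lt;/code&gt; and record it every few rows.&lt;/p&gt;

```python
import csv
import io
import sqlite3


def import_csv_with_progress(conn, csv_text, table="data", every=10):
    # Track progress by how far through the CSV text the parser has read,
    # using file.tell(), and record it in a _csv_progress_ table.
    total = len(csv_text)
    f = io.StringIO(csv_text)
    reader = csv.reader(f)
    headers = next(reader)
    conn.execute(
        'CREATE TABLE "{}" ({})'.format(
            table, ", ".join('"{}"'.format(h) for h in headers)
        )
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS _csv_progress_ "
        "(table_name TEXT PRIMARY KEY, bytes_done INTEGER, bytes_total INTEGER)"
    )
    placeholders = ", ".join("?" for _ in headers)
    for i, row in enumerate(reader, 1):
        conn.execute(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), row
        )
        if i % every == 0:
            conn.execute(
                "INSERT OR REPLACE INTO _csv_progress_ VALUES (?, ?, ?)",
                (table, f.tell(), total),
            )
    # A final record marks the import as complete
    conn.execute(
        "INSERT OR REPLACE INTO _csv_progress_ VALUES (?, ?, ?)",
        (table, total, total),
    )
    conn.commit()
```

&lt;p&gt;A progress bar can then poll &lt;code&gt;SELECT bytes_done, bytes_total FROM _csv_progress_&lt;/code&gt; while the import runs.&lt;/p&gt;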

&lt;p&gt;It seems to work really well. The full &lt;a href="https://github.com/simonw/datasette-upload-csvs/blob/013d540797b2600bb34cfb8a923386d83f5ff25d/datasette_upload_csvs/app.py#L30-L123"&gt;server-side code is here&lt;/a&gt; - the progress bar itself then &lt;a href="https://github.com/simonw/datasette-upload-csvs/blob/013d540797b2600bb34cfb8a923386d83f5ff25d/datasette_upload_csvs/templates/upload_csv.html#L122-L145"&gt;polls Datasette's JSON API&lt;/a&gt; for the record in the &lt;code&gt;_csv_progress_&lt;/code&gt; table.&lt;/p&gt;

&lt;h3 id="weeknotes-24-datasette-configure-fts"&gt;datasette-configure-fts&lt;/h3&gt;

&lt;p&gt;SQLite ships with &lt;a href="https://www.sqlite.org/fts5.html"&gt;a decent implementation&lt;/a&gt; of full-text search. Datasette knows how to tell if a table has been configured for full-text search and adds a search box to the table page, &lt;a href="https://datasette.readthedocs.io/en/stable/full_text_search.html"&gt;documented here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/simonw/datasette-configure-fts"&gt;datasette-configure-fts&lt;/a&gt; is a new plugin that provides an interface for configuring search against existing SQLite tables. Under the hood it uses the &lt;a href="https://sqlite-utils.readthedocs.io/en/stable/python-api.html#enabling-full-text-search"&gt;sqlite-utils full-text search methods&lt;/a&gt; to configure the table and set up triggers to keep the index updated as data in the table changes.&lt;/p&gt;
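&lt;p&gt;Roughly what those sqlite-utils methods set up under the hood: an external-content FTS5 table plus triggers that keep it in sync with the source table. This stdlib sketch only creates the insert trigger - sqlite-utils also handles updates and deletes.&lt;/p&gt;

```python
import sqlite3


def enable_fts(conn, table, columns):
    # Create an external-content FTS5 table named <table>_fts, populate it
    # from the source table, and add an AFTER INSERT trigger to keep it
    # updated as new rows arrive.
    cols = ", ".join(columns)
    new_cols = ", ".join("new." + c for c in columns)
    conn.executescript(
        """
        CREATE VIRTUAL TABLE {fts} USING fts5({cols}, content='{table}');
        INSERT INTO {fts}(rowid, {cols}) SELECT rowid, {cols} FROM {table};
        CREATE TRIGGER {table}_ai AFTER INSERT ON {table} BEGIN
            INSERT INTO {fts}(rowid, {cols}) VALUES (new.rowid, {new_cols});
        END;
        """.format(fts=table + "_fts", table=table, cols=cols, new_cols=new_cols)
    )
```

&lt;p&gt;After calling &lt;code&gt;enable_fts(conn, "docs", ["title", "body"])&lt;/code&gt;, queries like &lt;code&gt;SELECT title FROM docs_fts WHERE docs_fts MATCH 'search term'&lt;/code&gt; will work, including against rows inserted later.&lt;/p&gt;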

&lt;p&gt;It's pretty simple, but it means that users of Datasette Cloud can upload a potentially enormous CSV file and then click to set specific columns as searchable. It's a fun example of the kind of thing that can be built with Datasette's new write capabilities.&lt;/p&gt;

&lt;h3 id="weeknotes-24-asgi-csrf"&gt;asgi-csrf&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://simonwillison.net/tags/csrf/"&gt;CSRF&lt;/a&gt; is one of my favourite web application security vulnerabilities - I first wrote about it on this blog &lt;a href="https://simonwillison.net/2005/May/6/bad/"&gt;back in 2005&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;I was surprised to see that the Starlette/ASGI ecosystem doesn't yet have much in the way of CSRF prevention. The best option I could find was to use &lt;a href="https://wtforms.readthedocs.io/en/stable/csrf.html"&gt;the WTForms library&lt;/a&gt; with Starlette.&lt;/p&gt;

&lt;p&gt;I don't need a full forms library for my purposes (at least not yet) but I needed CSRF protection for &lt;code&gt;datasette-configure-fts&lt;/code&gt;, so I've started working on a small ASGI middleware library called &lt;a href="https://github.com/simonw/asgi-csrf"&gt;asgi-csrf&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's modelled on a subset of Django's &lt;a href="https://github.com/django/django/blob/3.0.3/django/middleware/csrf.py"&gt;robust CSRF prevention&lt;/a&gt;. The README warns people NOT to trust it yet - there are still &lt;a href="https://owasp.org/www-project-cheat-sheets/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet#double-submit-cookie"&gt;some OWASP recommendations&lt;/a&gt; that it needs to apply (&lt;a href="https://github.com/simonw/asgi-csrf/issues/2"&gt;issue here&lt;/a&gt;) and I'm not yet ready to declare it robust and secure. It's a start though, and feels like exactly the kind of problem that ASGI middleware is meant to address.&lt;/p&gt;
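&lt;p&gt;To illustrate the general shape of the problem (this is a toy, not the asgi-csrf implementation - the cookie and header names are made up for the example): a double-submit-cookie check as ASGI middleware might look something like this.&lt;/p&gt;

```python
import secrets
from http.cookies import SimpleCookie


class CsrfMiddleware:
    # Reject unsafe requests unless the csrftoken cookie value is echoed
    # back in an x-csrftoken header - something an attacker's cross-site
    # form submission cannot do.
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope["method"] not in ("GET", "HEAD", "OPTIONS"):
            headers = dict(scope.get("headers") or [])
            cookies = SimpleCookie(headers.get(b"cookie", b"").decode("latin-1"))
            cookie_token = cookies["csrftoken"].value if "csrftoken" in cookies else ""
            header_token = headers.get(b"x-csrftoken", b"").decode("latin-1")
            # compare_digest avoids leaking the token through timing
            if not (cookie_token and secrets.compare_digest(cookie_token, header_token)):
                await send({
                    "type": "http.response.start",
                    "status": 403,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({"type": "http.response.body", "body": b"CSRF check failed"})
                return
        await self.app(scope, receive, send)
```

&lt;p&gt;The real library has to handle quite a bit more than this - extracting tokens from submitted forms, setting the cookie in the first place, and the OWASP recommendations mentioned above.&lt;/p&gt;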
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/csrf"&gt;csrf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/icalendar"&gt;icalendar&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="csrf"/><category term="data-journalism"/><category term="icalendar"/><category term="plugins"/><category term="projects"/><category term="search"/><category term="security"/><category term="datasette"/><category term="asgi"/><category term="weeknotes"/><category term="datasette-cloud"/></entry><entry><title>Async Support - HTTPX</title><link href="https://simonwillison.net/2020/Jan/10/httpx/#atom-tag" rel="alternate"/><published>2020-01-10T04:49:59+00:00</published><updated>2020-01-10T04:49:59+00:00</updated><id>https://simonwillison.net/2020/Jan/10/httpx/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.python-httpx.org/async/"&gt;Async Support - HTTPX&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting - it's a really clean API, and now that Jupyter supports top-level await you can run &lt;code&gt;(await httpx.AsyncClient().get(url)).text&lt;/code&gt; directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it - ideal for unit tests.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/_tomchristie/status/1215240517962870784"&gt;@_tomchristie&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/async"&gt;async&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tom-christie"&gt;tom-christie&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/httpx"&gt;httpx&lt;/a&gt;&lt;/p&gt;



</summary><category term="async"/><category term="http"/><category term="python"/><category term="asgi"/><category term="tom-christie"/><category term="httpx"/></entry><entry><title>Logging to SQLite using ASGI middleware</title><link href="https://simonwillison.net/2019/Dec/16/logging-sqlite-asgi-middleware/#atom-tag" rel="alternate"/><published>2019-12-16T22:30:46+00:00</published><updated>2019-12-16T22:30:46+00:00</updated><id>https://simonwillison.net/2019/Dec/16/logging-sqlite-asgi-middleware/#atom-tag</id><summary type="html">
    &lt;p&gt;I had some fun playing around with &lt;a href="https://asgi.readthedocs.io/en/latest/specs/main.html#middleware"&gt;ASGI middleware&lt;/a&gt; and logging during our flight back to England for the holidays.&lt;/p&gt;
&lt;h3 id="asgi-log-to-sqlite"&gt;asgi-log-to-sqlite&lt;/h3&gt;
&lt;p&gt;I decided to experiment with SQLite as a logging mechanism. I wouldn’t use this on a high traffic site, but most of my Datasette related projects are small enough that logging HTTP traffic directly to a SQLite database feels like it should work reasonably well.&lt;/p&gt;
&lt;p&gt;Once your logs are in a SQLite database, you can use &lt;a href="https://datasette.readthedocs.io/"&gt;Datasette&lt;/a&gt; to analyze them. I think this could be a lot of fun.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/asgi-log-to-sqlite"&gt;asgi-log-to-sqlite&lt;/a&gt; is my first exploration of this idea. It’s a piece of ASGI middleware which wraps an ASGI application and then logs relevant information from the request and response to an attached SQLite database.&lt;/p&gt;
&lt;p&gt;You use it like this:&lt;/p&gt;
&lt;pre class=" language-python"&gt;&lt;code class="prism  language-python"&gt;&lt;span class="token keyword"&gt;from&lt;/span&gt; asgi_log_to_sqlite &lt;span class="token keyword"&gt;import&lt;/span&gt; AsgiLogToSqlite
&lt;span class="token keyword"&gt;from&lt;/span&gt; my_asgi_app &lt;span class="token keyword"&gt;import&lt;/span&gt; app

app &lt;span class="token operator"&gt;=&lt;/span&gt; AsgiLogToSqlite&lt;span class="token punctuation"&gt;(&lt;/span&gt;app&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token string"&gt;"/tmp/log.db"&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here’s a demo Datasette instance showing logs from my testing: &lt;a href="https://asgi-log-demo-j7hipcg4aq-uc.a.run.app"&gt;asgi-log-demo-j7hipcg4aq-uc.a.run.app&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As always with Datasette, the data is at its most interesting once you &lt;a href="https://asgi-log-demo-j7hipcg4aq-uc.a.run.app/asgi-log-demo/requests?_sort_desc=rowid&amp;amp;_facet=path&amp;amp;_facet=user_agent&amp;amp;_facet=content_type#facet-content_type"&gt;apply some facets&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="intercepting-requests-to-and-from-the-wrapped-asgi-app"&gt;Intercepting requests to and from the wrapped ASGI app&lt;/h3&gt;
&lt;p&gt;There are a couple of interesting parts of the implementation. The first is how the information is gathered from the request and response.&lt;/p&gt;
&lt;p&gt;This is a classic pattern for ASGI middleware. &lt;a href="https://asgi.readthedocs.io/en/latest/specs/main.html#applications"&gt;The ASGI protocol&lt;/a&gt; has three key components: a &lt;code&gt;scope&lt;/code&gt; dictionary describing the incoming request, and two async functions called &lt;code&gt;receive&lt;/code&gt; and &lt;code&gt;send&lt;/code&gt; which are used to retrieve and send data to the connected client (usually a browser).&lt;/p&gt;
&lt;p&gt;Most middleware works by wrapping those functions with custom replacements. That’s what I’m doing here:&lt;/p&gt;
&lt;pre class=" language-python"&gt;&lt;code class="prism  language-python"&gt;&lt;span class="token keyword"&gt;class&lt;/span&gt; &lt;span class="token class-name"&gt;AsgiLogToSqlite&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
    &lt;span class="token keyword"&gt;def&lt;/span&gt; &lt;span class="token function"&gt;__init__&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;self&lt;span class="token punctuation"&gt;,&lt;/span&gt; app&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token builtin"&gt;file&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
        self&lt;span class="token punctuation"&gt;.&lt;/span&gt;app &lt;span class="token operator"&gt;=&lt;/span&gt; app
        self&lt;span class="token punctuation"&gt;.&lt;/span&gt;db &lt;span class="token operator"&gt;=&lt;/span&gt; sqlite_utils&lt;span class="token punctuation"&gt;.&lt;/span&gt;Database&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token builtin"&gt;file&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;
    &lt;span class="token comment"&gt;# ...&lt;/span&gt;
    &lt;span class="token keyword"&gt;async&lt;/span&gt; &lt;span class="token keyword"&gt;def&lt;/span&gt; &lt;span class="token function"&gt;__call__&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;self&lt;span class="token punctuation"&gt;,&lt;/span&gt; scope&lt;span class="token punctuation"&gt;,&lt;/span&gt; receive&lt;span class="token punctuation"&gt;,&lt;/span&gt; send&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
        response_headers &lt;span class="token operator"&gt;=&lt;/span&gt; &lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;
        body_size &lt;span class="token operator"&gt;=&lt;/span&gt; &lt;span class="token number"&gt;0&lt;/span&gt;
        http_status &lt;span class="token operator"&gt;=&lt;/span&gt; &lt;span class="token boolean"&gt;None&lt;/span&gt;

        &lt;span class="token keyword"&gt;async&lt;/span&gt; &lt;span class="token keyword"&gt;def&lt;/span&gt; &lt;span class="token function"&gt;wrapped_send&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;message&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
            &lt;span class="token keyword"&gt;nonlocal&lt;/span&gt; body_size&lt;span class="token punctuation"&gt;,&lt;/span&gt; response_headers&lt;span class="token punctuation"&gt;,&lt;/span&gt; http_status
            &lt;span class="token keyword"&gt;if&lt;/span&gt; message&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"type"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt; &lt;span class="token operator"&gt;==&lt;/span&gt; &lt;span class="token string"&gt;"http.response.start"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
                response_headers &lt;span class="token operator"&gt;=&lt;/span&gt; message&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"headers"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;
                http_status &lt;span class="token operator"&gt;=&lt;/span&gt; message&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"status"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;

            &lt;span class="token keyword"&gt;if&lt;/span&gt; message&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"type"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt; &lt;span class="token operator"&gt;==&lt;/span&gt; &lt;span class="token string"&gt;"http.response.body"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
                body_size &lt;span class="token operator"&gt;+=&lt;/span&gt; &lt;span class="token builtin"&gt;len&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;message&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"body"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;

            &lt;span class="token keyword"&gt;await&lt;/span&gt; send&lt;span class="token punctuation"&gt;(&lt;/span&gt;message&lt;span class="token punctuation"&gt;)&lt;/span&gt;

        start &lt;span class="token operator"&gt;=&lt;/span&gt; time&lt;span class="token punctuation"&gt;.&lt;/span&gt;time&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;
        &lt;span class="token keyword"&gt;await&lt;/span&gt; self&lt;span class="token punctuation"&gt;.&lt;/span&gt;app&lt;span class="token punctuation"&gt;(&lt;/span&gt;scope&lt;span class="token punctuation"&gt;,&lt;/span&gt; receive&lt;span class="token punctuation"&gt;,&lt;/span&gt; wrapped_send&lt;span class="token punctuation"&gt;)&lt;/span&gt;
        end &lt;span class="token operator"&gt;=&lt;/span&gt; time&lt;span class="token punctuation"&gt;.&lt;/span&gt;time&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My &lt;code&gt;wrapped_send()&lt;/code&gt; function replaces the original &lt;code&gt;send()&lt;/code&gt; function with one that pulls out some of the data I want to log from the messages that are being sent to the client.&lt;/p&gt;
&lt;p&gt;I record a start time, then &lt;code&gt;await&lt;/code&gt; the original ASGI application, then record an end time when it finishes.&lt;/p&gt;
&lt;h3 id="logging-to-sqlite-using-sqlite-utils"&gt;Logging to SQLite using sqlite-utils&lt;/h3&gt;
&lt;p&gt;I’m using my &lt;a href="https://sqlite-utils.readthedocs.io/en/stable/python-api.html"&gt;sqlite-utils library&lt;/a&gt; to implement the logging. My first version looked like this:&lt;/p&gt;
&lt;pre class=" language-python"&gt;&lt;code class="prism  language-python"&gt;db&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"requests"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;insert&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token punctuation"&gt;{&lt;/span&gt;
    &lt;span class="token string"&gt;"path"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; scope&lt;span class="token punctuation"&gt;.&lt;/span&gt;get&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token string"&gt;"path"&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
    &lt;span class="token string"&gt;"response_headers"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;str&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;response_headers&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
    &lt;span class="token string"&gt;"body_size"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; body_size&lt;span class="token punctuation"&gt;,&lt;/span&gt;
    &lt;span class="token string"&gt;"http_status"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; http_status&lt;span class="token punctuation"&gt;,&lt;/span&gt;
    &lt;span class="token string"&gt;"scope"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;str&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;scope&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
&lt;span class="token punctuation"&gt;}&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; alter&lt;span class="token operator"&gt;=&lt;/span&gt;&lt;span class="token boolean"&gt;True&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;sqlite-utils&lt;/code&gt; automatically creates a table with the correct schema the first time you try to insert a record into it. This makes it ideal for rapid prototyping. In this case I captured stringified versions of various data structures so I could look at them in my browser with Datasette.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;alter=True&lt;/code&gt; argument here means that if I attempt to insert a new shape of record into an existing table any missing columns will be added automatically as well. Again, handy for prototyping.&lt;/p&gt;
&lt;p&gt;Based on the above, I evolved the code into recording the values I wanted to see in my logs - the full URL path, the User-Agent, the HTTP referrer, the IP and so on.&lt;/p&gt;
&lt;p&gt;This resulted in a LOT of duplicative data. Values like the path, user-agent and HTTP referrer are the same across many different requests.&lt;/p&gt;
&lt;p&gt;Regular plain text logs can solve this with gzip compression, but you can’t gzip a SQLite database and still expect it to work.&lt;/p&gt;
&lt;p&gt;Since we are logging to a relational database, we can eliminate those duplicate values using normalization. We can extract out those lengthy strings into separate lookup tables - that way we can store mostly integer foreign key references in the requests table itself.&lt;/p&gt;
&lt;p&gt;After a few iterations, my database code ended up looking like this:&lt;/p&gt;
&lt;pre class=" language-python"&gt;&lt;code class="prism  language-python"&gt;&lt;span class="token keyword"&gt;with&lt;/span&gt; db&lt;span class="token punctuation"&gt;.&lt;/span&gt;conn&lt;span class="token punctuation"&gt;:&lt;/span&gt;  &lt;span class="token comment"&gt;# Use a transaction&lt;/span&gt;
    db&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"requests"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;insert&lt;span class="token punctuation"&gt;(&lt;/span&gt;
        &lt;span class="token punctuation"&gt;{&lt;/span&gt;
            &lt;span class="token string"&gt;"start"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; start&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"method"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; scope&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"method"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"path"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; lookup&lt;span class="token punctuation"&gt;(&lt;/span&gt;db&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token string"&gt;"paths"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; path&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"query_string"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; lookup&lt;span class="token punctuation"&gt;(&lt;/span&gt;db&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token string"&gt;"query_strings"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; query_string&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"user_agent"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; lookup&lt;span class="token punctuation"&gt;(&lt;/span&gt;db&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token string"&gt;"user_agents"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; user_agent&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"referer"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; lookup&lt;span class="token punctuation"&gt;(&lt;/span&gt;db&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token string"&gt;"referers"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; referer&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"accept_language"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; lookup&lt;span class="token punctuation"&gt;(&lt;/span&gt;db&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token string"&gt;"accept_languages"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; accept_language&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"http_status"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; http_status&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"content_type"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; lookup&lt;span class="token punctuation"&gt;(&lt;/span&gt;db&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token string"&gt;"content_types"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; content_type&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"client_ip"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; scope&lt;span class="token punctuation"&gt;.&lt;/span&gt;get&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token string"&gt;"client"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token boolean"&gt;None&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token boolean"&gt;None&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token number"&gt;0&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"duration"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; end &lt;span class="token operator"&gt;-&lt;/span&gt; start&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token string"&gt;"body_size"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; body_size&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        &lt;span class="token punctuation"&gt;}&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        alter&lt;span class="token operator"&gt;=&lt;/span&gt;&lt;span class="token boolean"&gt;True&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        foreign_keys&lt;span class="token operator"&gt;=&lt;/span&gt;self&lt;span class="token punctuation"&gt;.&lt;/span&gt;lookup_columns&lt;span class="token punctuation"&gt;,&lt;/span&gt;
    &lt;span class="token punctuation"&gt;)&lt;/span&gt;


&lt;span class="token keyword"&gt;def&lt;/span&gt; &lt;span class="token function"&gt;lookup&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;db&lt;span class="token punctuation"&gt;,&lt;/span&gt; table&lt;span class="token punctuation"&gt;,&lt;/span&gt; value&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
    &lt;span class="token keyword"&gt;return&lt;/span&gt; db&lt;span class="token punctuation"&gt;[&lt;/span&gt;table&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;lookup&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token punctuation"&gt;{&lt;/span&gt;
        &lt;span class="token string"&gt;"name"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; value
    &lt;span class="token punctuation"&gt;}&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt; &lt;span class="token keyword"&gt;if&lt;/span&gt; value &lt;span class="token keyword"&gt;else&lt;/span&gt; &lt;span class="token boolean"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href="https://sqlite-utils.readthedocs.io/en/stable/python-api.html#working-with-lookup-tables"&gt;table.lookup() method&lt;/a&gt; in &lt;code&gt;sqlite-utils&lt;/code&gt; is designed for exactly this use-case. If you pass it a value (or multiple values) it will ensure the underlying table has those columns with a unique index on them, then get-or-insert your data and return you the primary key.&lt;/p&gt;
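&lt;p&gt;A stdlib-only sketch of what &lt;code&gt;table.lookup()&lt;/code&gt; is doing - a get-or-insert against a unique column that returns the primary key (named &lt;code&gt;get_or_insert&lt;/code&gt; here to avoid clashing with the &lt;code&gt;lookup()&lt;/code&gt; helper above):&lt;/p&gt;

```python
import sqlite3


def get_or_insert(conn, table, value):
    # Ensure the lookup table exists with a unique "name" column, then
    # return the primary key of the row holding value, inserting it first
    # if necessary.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "{t}" (id INTEGER PRIMARY KEY, name TEXT)'.format(t=table)
    )
    conn.execute(
        'CREATE UNIQUE INDEX IF NOT EXISTS "idx_{t}_name" ON "{t}" (name)'.format(t=table)
    )
    row = conn.execute(
        'SELECT id FROM "{t}" WHERE name = ?'.format(t=table), (value,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        'INSERT INTO "{t}" (name) VALUES (?)'.format(t=table), (value,)
    ).lastrowid
```

&lt;p&gt;Calling it twice with the same value returns the same primary key, which is exactly what you want when normalizing repeated strings like paths and user-agents.&lt;/p&gt;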
&lt;p&gt;Automatically creating tables is fine for an initial prototype, but it starts getting a little messy once you have foreign key relationships that you need to be able to rely on. I moved to explicit table creation in &lt;a href="https://github.com/simonw/asgi-log-to-sqlite/blob/5e58e577fea4bd99a7ae5e61b8d389684d55389c/asgi_log_to_sqlite.py#L21-L43"&gt;an ensure_tables() method&lt;/a&gt; that’s called once when the middleware class is used to wrap the underlying ASGI app:&lt;/p&gt;
&lt;pre class=" language-python"&gt;&lt;code class="prism  language-python"&gt;    lookup_columns &lt;span class="token operator"&gt;=&lt;/span&gt; &lt;span class="token punctuation"&gt;(&lt;/span&gt;
        &lt;span class="token string"&gt;"path"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        &lt;span class="token string"&gt;"user_agent"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        &lt;span class="token string"&gt;"referer"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        &lt;span class="token string"&gt;"accept_language"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        &lt;span class="token string"&gt;"content_type"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        &lt;span class="token string"&gt;"query_string"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
    &lt;span class="token punctuation"&gt;)&lt;/span&gt;

    &lt;span class="token keyword"&gt;def&lt;/span&gt; &lt;span class="token function"&gt;ensure_tables&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;self&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
        &lt;span class="token keyword"&gt;for&lt;/span&gt; column &lt;span class="token keyword"&gt;in&lt;/span&gt; self&lt;span class="token punctuation"&gt;.&lt;/span&gt;lookup_columns&lt;span class="token punctuation"&gt;:&lt;/span&gt;
            table &lt;span class="token operator"&gt;=&lt;/span&gt; &lt;span class="token string"&gt;"{}s"&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;&lt;span class="token builtin"&gt;format&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;column&lt;span class="token punctuation"&gt;)&lt;/span&gt;
            &lt;span class="token keyword"&gt;if&lt;/span&gt; &lt;span class="token operator"&gt;not&lt;/span&gt; self&lt;span class="token punctuation"&gt;.&lt;/span&gt;db&lt;span class="token punctuation"&gt;[&lt;/span&gt;table&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;exists&lt;span class="token punctuation"&gt;:&lt;/span&gt;
                self&lt;span class="token punctuation"&gt;.&lt;/span&gt;db&lt;span class="token punctuation"&gt;[&lt;/span&gt;table&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;create&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token punctuation"&gt;{&lt;/span&gt;
                    &lt;span class="token string"&gt;"id"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                    &lt;span class="token string"&gt;"name"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;str&lt;/span&gt;
                &lt;span class="token punctuation"&gt;}&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; pk&lt;span class="token operator"&gt;=&lt;/span&gt;&lt;span class="token string"&gt;"id"&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;
        &lt;span class="token keyword"&gt;if&lt;/span&gt; &lt;span class="token operator"&gt;not&lt;/span&gt; self&lt;span class="token punctuation"&gt;.&lt;/span&gt;db&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"requests"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;exists&lt;span class="token punctuation"&gt;:&lt;/span&gt;
            self&lt;span class="token punctuation"&gt;.&lt;/span&gt;db&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"requests"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;create&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token punctuation"&gt;{&lt;/span&gt;
                &lt;span class="token string"&gt;"start"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;float&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"method"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;str&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"path"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"query_string"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"user_agent"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"referer"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"accept_language"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"http_status"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"content_type"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"client_ip"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;str&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"duration"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;float&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
                &lt;span class="token string"&gt;"body_size"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token builtin"&gt;int&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
            &lt;span class="token punctuation"&gt;}&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; foreign_keys&lt;span class="token operator"&gt;=&lt;/span&gt;self&lt;span class="token punctuation"&gt;.&lt;/span&gt;lookup_columns&lt;span class="token punctuation"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m increasingly using this pattern in my &lt;code&gt;sqlite-utils&lt;/code&gt; projects. It’s not a full-grown migrations system, but it’s a pretty low-effort way of creating tables correctly if they don’t yet exist.&lt;/p&gt;
&lt;p&gt;Here’s &lt;a href="https://github.com/simonw/asgi-log-to-sqlite/blob/5e58e577fea4bd99a7ae5e61b8d389684d55389c/asgi_log_to_sqlite.py"&gt;the full implementation of the middleware&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="configuring-the-middleware-for-use-with-datasette"&gt;Configuring the middleware for use with Datasette&lt;/h3&gt;
&lt;p&gt;Publishing standalone ASGI middleware for this kind of thing is neat because it can be used with any ASGI application, not just with Datasette.&lt;/p&gt;
&lt;p&gt;To make it as easy as possible to use with Datasette, I also want it available as a plugin.&lt;/p&gt;
&lt;p&gt;I’ve tried two different patterns for this in the past.&lt;/p&gt;
&lt;p&gt;My first ASGI middleware was &lt;a href="https://github.com/simonw/asgi-cors"&gt;asgi-cors&lt;/a&gt;. I published that as two separate packages to PyPI: &lt;code&gt;asgi-cors&lt;/code&gt; is the middleware itself, and &lt;a href="https://github.com/simonw/datasette-cors"&gt;datasette-cors&lt;/a&gt; is a very thin plugin wrapper around it that hooks into Datasette’s &lt;a href="https://datasette.readthedocs.io/en/0.32/plugins.html#plugin-configuration"&gt;plugin configuration mechanism&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For &lt;a href="https://github.com/simonw/datasette-auth-github"&gt;datasette-auth-github&lt;/a&gt; I decided not to publish two packages. Instead I published a single plugin package and then described how to use it as standalone ASGI middleware in its documentation.&lt;/p&gt;
&lt;p&gt;This lazier approach is confusing: it’s not at all clear that a package called &lt;code&gt;datasette-auth-github&lt;/code&gt; can be used independently of Datasette. But I did get to avoid having to publish two packages.&lt;/p&gt;
&lt;h3 id="datasette-configure-asgi"&gt;datasette-configure-asgi&lt;/h3&gt;
&lt;p&gt;Since I want to do a lot more experiments with ASGI plugins in the future, I decided to try solving the ASGI configuration issue once and for all. I built a new experimental plugin, &lt;a href="https://github.com/simonw/datasette-configure-asgi"&gt;datasette-configure-asgi&lt;/a&gt; which can be used to configure ANY ASGI middleware that conforms to an expected protocol.&lt;/p&gt;
&lt;p&gt;Here’s what that looks like at the configuration level, using &lt;a href="https://datasette.readthedocs.io/en/0.32/metadata.html"&gt;a metadata.json&lt;/a&gt; settings file (which &lt;a href="https://github.com/simonw/datasette/issues/493"&gt;I should really rename&lt;/a&gt; since it’s more about configuration than metadata these days):&lt;/p&gt;
&lt;pre class=" language-json"&gt;&lt;code class="prism  language-json"&gt;&lt;span class="token punctuation"&gt;{&lt;/span&gt;
  &lt;span class="token string"&gt;"plugins"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token punctuation"&gt;{&lt;/span&gt;
    &lt;span class="token string"&gt;"datasette-configure-asgi"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token punctuation"&gt;[&lt;/span&gt;
      &lt;span class="token punctuation"&gt;{&lt;/span&gt;
        &lt;span class="token string"&gt;"class"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token string"&gt;"asgi_log_to_sqlite.AsgiLogToSqlite"&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt;
        &lt;span class="token string"&gt;"args"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token punctuation"&gt;{&lt;/span&gt;
          &lt;span class="token string"&gt;"file"&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt; &lt;span class="token string"&gt;"/tmp/log.db"&lt;/span&gt;
        &lt;span class="token punctuation"&gt;}&lt;/span&gt;
      &lt;span class="token punctuation"&gt;}&lt;/span&gt;
    &lt;span class="token punctuation"&gt;]&lt;/span&gt;
  &lt;span class="token punctuation"&gt;}&lt;/span&gt;
&lt;span class="token punctuation"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The implementation of this plugin is very simple: here’s the entire thing:&lt;/p&gt;
&lt;pre class=" language-python"&gt;&lt;code class="prism  language-python"&gt;&lt;span class="token keyword"&gt;from&lt;/span&gt; datasette &lt;span class="token keyword"&gt;import&lt;/span&gt; hookimpl
&lt;span class="token keyword"&gt;import&lt;/span&gt; importlib


@hookimpl
&lt;span class="token keyword"&gt;def&lt;/span&gt; &lt;span class="token function"&gt;asgi_wrapper&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;datasette&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
    &lt;span class="token keyword"&gt;def&lt;/span&gt; &lt;span class="token function"&gt;wrap_with_classes&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;app&lt;span class="token punctuation"&gt;)&lt;/span&gt;&lt;span class="token punctuation"&gt;:&lt;/span&gt;
        configs &lt;span class="token operator"&gt;=&lt;/span&gt; datasette&lt;span class="token punctuation"&gt;.&lt;/span&gt;plugin_config&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token string"&gt;"datasette-configure-asgi"&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt; &lt;span class="token operator"&gt;or&lt;/span&gt; &lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;
        &lt;span class="token keyword"&gt;for&lt;/span&gt; config &lt;span class="token keyword"&gt;in&lt;/span&gt; configs&lt;span class="token punctuation"&gt;:&lt;/span&gt;
            module_path&lt;span class="token punctuation"&gt;,&lt;/span&gt; class_name &lt;span class="token operator"&gt;=&lt;/span&gt; config&lt;span class="token punctuation"&gt;[&lt;/span&gt;&lt;span class="token string"&gt;"class"&lt;/span&gt;&lt;span class="token punctuation"&gt;]&lt;/span&gt;&lt;span class="token punctuation"&gt;.&lt;/span&gt;rsplit&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token string"&gt;"."&lt;/span&gt;&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token number"&gt;1&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt;
            mod &lt;span class="token operator"&gt;=&lt;/span&gt; importlib&lt;span class="token punctuation"&gt;.&lt;/span&gt;import_module&lt;span class="token punctuation"&gt;(&lt;/span&gt;module_path&lt;span class="token punctuation"&gt;)&lt;/span&gt;
            klass &lt;span class="token operator"&gt;=&lt;/span&gt; &lt;span class="token builtin"&gt;getattr&lt;/span&gt;&lt;span class="token punctuation"&gt;(&lt;/span&gt;mod&lt;span class="token punctuation"&gt;,&lt;/span&gt; class_name&lt;span class="token punctuation"&gt;)&lt;/span&gt;
            args &lt;span class="token operator"&gt;=&lt;/span&gt; config&lt;span class="token punctuation"&gt;.&lt;/span&gt;get&lt;span class="token punctuation"&gt;(&lt;/span&gt;&lt;span class="token string"&gt;"args"&lt;/span&gt;&lt;span class="token punctuation"&gt;)&lt;/span&gt; &lt;span class="token operator"&gt;or&lt;/span&gt; &lt;span class="token punctuation"&gt;{&lt;/span&gt;&lt;span class="token punctuation"&gt;}&lt;/span&gt;
            app &lt;span class="token operator"&gt;=&lt;/span&gt; klass&lt;span class="token punctuation"&gt;(&lt;/span&gt;app&lt;span class="token punctuation"&gt;,&lt;/span&gt; &lt;span class="token operator"&gt;**&lt;/span&gt;args&lt;span class="token punctuation"&gt;)&lt;/span&gt;
        &lt;span class="token keyword"&gt;return&lt;/span&gt; app

    &lt;span class="token keyword"&gt;return&lt;/span&gt; wrap_with_classes
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It hooks into the &lt;a href="https://datasette.readthedocs.io/en/0.32/plugins.html#asgi-wrapper-datasette"&gt;asgi_wrapper plugin hook&lt;/a&gt;, reads its configuration from the &lt;code&gt;datasette&lt;/code&gt; object (using &lt;a href="https://datasette.readthedocs.io/en/0.32/plugins.html#writing-plugins-that-accept-configuration"&gt;plugin_config()&lt;/a&gt;), then loops through the list of configured plugins and dynamically loads each implementation using &lt;a href="https://docs.python.org/3/library/importlib.html#importlib.import_module"&gt;importlib&lt;/a&gt;. Then it wraps the ASGI app with each of them in turn.&lt;/p&gt;
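&lt;p&gt;That dynamic loading step is just &lt;code&gt;rsplit&lt;/code&gt; plus &lt;code&gt;importlib&lt;/code&gt;. Here’s a tiny standalone sketch of the same technique, separated out from the plugin (the standard library class used to demonstrate it is arbitrary):&lt;/p&gt;

```python
import importlib


def load_class(dotted_path):
    # Split "package.module.ClassName" into a module path and a class
    # name, import the module, then pull the class off it - the same
    # technique datasette-configure-asgi uses for its "class" option.
    module_path, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Any importable class works, for example one from the standard library:
OrderedDict = load_class("collections.OrderedDict")
```

One nice property of this approach is that the middleware being configured needs no knowledge of Datasette at all - it just has to accept the wrapped app as its first constructor argument.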
&lt;h3 id="open-questions"&gt;Open questions&lt;/h3&gt;
&lt;p&gt;This is where I’ve got to with my experiments so far. Should you use this stuff in production? Almost certainly not! I wrote it on a plane just now. It definitely needs a bit more thought.&lt;/p&gt;
&lt;p&gt;A couple of obvious open questions:&lt;/p&gt;
&lt;p&gt;Python async functions &lt;strong&gt;shouldn’t make blocking calls&lt;/strong&gt;, since doing so will block the entire event loop for everyone else.&lt;/p&gt;
&lt;p&gt;Interacting with SQLite is a blocking call. Datasette works around this by &lt;a href="https://github.com/simonw/datasette/blob/d6b6c9171f3fd945c4e5e4144923ac831c43c208/datasette/database.py#L56-L67"&gt;running SQL queries in a thread pool&lt;/a&gt;; my logging plugin doesn’t bother with that.&lt;/p&gt;
&lt;p&gt;Maybe it should? My hunch is that inserting into SQLite in this way is so fast it won’t actually cause any noticeable overhead. It would be nice to test that assumption thoroughly though.&lt;/p&gt;
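&lt;p&gt;If the plugin did offload its writes, a minimal sketch could look like this - using a hypothetical &lt;code&gt;log_request()&lt;/code&gt; function and a much simpler schema than the real middleware, just to show the &lt;code&gt;run_in_executor&lt;/code&gt; pattern:&lt;/p&gt;

```python
import asyncio
import sqlite3
import time


def log_request(db_path, method, path, duration):
    # Blocking work: this runs in a worker thread, so the event loop
    # stays free to serve other requests while SQLite does its thing.
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS requests"
            " (method TEXT, path TEXT, duration REAL)"
        )
        conn.execute(
            "INSERT INTO requests (method, path, duration) VALUES (?, ?, ?)",
            (method, path, duration),
        )
    conn.close()


async def handle_request(db_path):
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # ... generate and send the HTTP response here ...
    # Hand the blocking insert off to the default thread pool:
    await loop.run_in_executor(
        None, log_request, db_path, "GET", "/", time.perf_counter() - start
    )
```

Note that this still awaits the insert before finishing the request - a further refinement would be to fire it off without awaiting it at all.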
&lt;p&gt;&lt;strong&gt;Log rotation&lt;/strong&gt;. This is an important detail for any well designed logging system, and I’ve punted on it entirely. Figuring out an elegant way to handle this with the underlying SQLite database files would be an interesting design challenge - &lt;a href="https://github.com/simonw/asgi-log-to-sqlite/issues/1"&gt;relevant issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Would my SQLite logging middleware work with &lt;strong&gt;Django 3.0&lt;/strong&gt;? I don’t see why not - the documentation covers &lt;a href="https://docs.djangoproject.com/en/3.0/howto/deployment/asgi/#applying-asgi-middleware"&gt;how to wrap entire Django applications with ASGI middleware&lt;/a&gt;. I should try that out!&lt;/p&gt;
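&lt;p&gt;A hypothetical sketch of what that could look like in a Django project’s &lt;code&gt;asgi.py&lt;/code&gt; entrypoint - untested, with the middleware import path and &lt;code&gt;file&lt;/code&gt; argument taken from the configuration example earlier, and &lt;code&gt;mysite.settings&lt;/code&gt; standing in for a real settings module:&lt;/p&gt;

```python
# asgi.py - sketch of wrapping an entire Django application in
# ASGI middleware, following Django's documented deployment pattern.
import os

from django.core.asgi import get_asgi_application

from asgi_log_to_sqlite import AsgiLogToSqlite

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")

# The ASGI server (uvicorn, daphne, ...) is pointed at this wrapped app:
application = AsgiLogToSqlite(get_asgi_application(), file="/tmp/log.db")
```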
&lt;h3 id="this-weeks-niche-museums"&gt;This week’s Niche Museums&lt;/h3&gt;
&lt;p&gt;These are technically my weeknotes, but logging experiments aside it’s been a quiet week for me.&lt;/p&gt;
&lt;p&gt;I finally added paragraph breaks to &lt;a href="https://www.niche-museums.com/"&gt;Niche Museums&lt;/a&gt; (using &lt;a href="https://github.com/simonw/datasette-render-markdown"&gt;datasette-render-markdown&lt;/a&gt;, implementation &lt;a href="https://github.com/simonw/museums/commit/a9e105196e4987710bc982837bfda24a7aefebeb"&gt;here&lt;/a&gt;). As a result my descriptions have been getting a whole lot longer. Added this week:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/61"&gt;The Tonga Room&lt;/a&gt; in San Francisco&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/62"&gt;London Silver Vaults&lt;/a&gt; in London&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/63"&gt;Rosie the Riveter National Historical Park&lt;/a&gt; in Richmond, CA&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/64"&gt;LA Bureau of Street Lighting Museum&lt;/a&gt; in Los Angeles&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/65"&gt;Aye-Aye Island&lt;/a&gt; in Madagascar&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/66"&gt;Monarch Bear Grove&lt;/a&gt; in San Francisco&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.niche-museums.com/browse/museums/67"&gt;Alverstone Mead Red Squirrel Hide&lt;/a&gt; on the Isle of Wight&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/logging"&gt;logging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prototyping"&gt;prototyping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="django"/><category term="logging"/><category term="projects"/><category term="prototyping"/><category term="sqlite"/><category term="datasette"/><category term="asgi"/><category term="weeknotes"/><category term="sqlite-utils"/></entry><entry><title>Datasette 0.31</title><link href="https://simonwillison.net/2019/Nov/12/datasette/#atom-tag" rel="alternate"/><published>2019-11-12T06:11:57+00:00</published><updated>2019-11-12T06:11:57+00:00</updated><id>https://simonwillison.net/2019/Nov/12/datasette/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-31"&gt;Datasette 0.31&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch support Python 3.7.3 now I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/glitch"&gt;glitch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;



</summary><category term="glitch"/><category term="projects"/><category term="python"/><category term="datasette"/><category term="asgi"/></entry><entry><title>Single sign-on against GitHub using ASGI middleware</title><link href="https://simonwillison.net/2019/Jul/14/sso-asgi/#atom-tag" rel="alternate"/><published>2019-07-14T01:18:56+00:00</published><updated>2019-07-14T01:18:56+00:00</updated><id>https://simonwillison.net/2019/Jul/14/sso-asgi/#atom-tag</id><summary type="html">
    &lt;p&gt;I released &lt;a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-29"&gt;Datasette 0.29&lt;/a&gt; last weekend, the first version of Datasette to be built on top of ASGI (discussed previously in &lt;a href="https://simonwillison.net/2019/Jun/23/datasette-asgi/"&gt;Porting Datasette to ASGI, and Turtles all the way down&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;This also marked the introduction of the new &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html#plugin-asgi-wrapper"&gt;asgi_wrapper&lt;/a&gt; plugin hook, which allows plugins to wrap the entire Datasette application in their own piece of ASGI middleware.&lt;/p&gt;
&lt;p&gt;To celebrate this new capability, I also released two new plugins: &lt;a href="https://github.com/simonw/datasette-cors"&gt;datasette-cors&lt;/a&gt;, which provides fine-grained control over CORS headers (using my &lt;a href="https://github.com/simonw/asgi-cors"&gt;asgi-cors&lt;/a&gt; library from a few months ago) and &lt;a href="https://github.com/simonw/datasette-auth-github"&gt;datasette-auth-github&lt;/a&gt;, the first of hopefully many authentication plugins for Datasette.&lt;/p&gt;
&lt;h3&gt;&lt;a id="datasetteauthgithub_8"&gt;&lt;/a&gt;datasette-auth-github&lt;/h3&gt;
&lt;p&gt;The new plugin is best illustrated with a demo.&lt;/p&gt;
&lt;p&gt;Visit &lt;a href="https://datasette-auth-demo.now.sh/"&gt;https://datasette-auth-demo.now.sh/&lt;/a&gt; and you will be redirected to GitHub and asked to approve access to your account (just your e-mail address, not repository access).&lt;/p&gt;
&lt;p&gt;Agree, and you’ll be redirected back to the demo with a new element in the Datasette header: your GitHub username, plus a “log out” link in the navigation bar at the top of the screen.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Controlling_who_can_access_16"&gt;&lt;/a&gt;Controlling who can access&lt;/h3&gt;
&lt;p&gt;The default behaviour of the plugin is to allow in anyone with a GitHub account. Since the primary use-case for the plugin (at least for the moment) is restricting data viewing to a trusted subset of people, the plugin lets you configure who is allowed to view your data in three different ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can restrict access to a specific list of GitHub accounts, using the &lt;code&gt;allow_users&lt;/code&gt; configuration option.&lt;/li&gt;
&lt;li&gt;You can restrict access to members of one or more GitHub &lt;a href="https://help.github.com/en/articles/about-organizations"&gt;organizations&lt;/a&gt;, with &lt;code&gt;allow_orgs&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;You can restrict access to members of specific &lt;a href="https://help.github.com/en/articles/about-teams"&gt;teams&lt;/a&gt; within an organization, using &lt;code&gt;allow_teams&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Datasette inherits quite a sophisticated user management system from GitHub, with very little effort required from the plugin. The &lt;code&gt;user_is_allowed()&lt;/code&gt; method implements all three of the above options against the GitHub API in &lt;a href="https://github.com/simonw/datasette-auth-github/blob/f69781d11115b2685ff48bdaa2ab0367d4f8d306/datasette_auth_github/github_auth.py#L145-L188"&gt;just 40 lines of code&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These options can be set using the &lt;code&gt;&amp;quot;plugins&amp;quot;&lt;/code&gt; section of the Datasette &lt;code&gt;metadata.json&lt;/code&gt; configuration file. Here’s an example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &amp;quot;plugins&amp;quot;: {
    &amp;quot;datasette-auth-github&amp;quot;: {
      &amp;quot;client_id&amp;quot;: {&amp;quot;$env&amp;quot;: &amp;quot;GITHUB_CLIENT_ID&amp;quot;},
      &amp;quot;client_secret&amp;quot;: {&amp;quot;$env&amp;quot;: &amp;quot;GITHUB_CLIENT_SECRET&amp;quot;},
      &amp;quot;allow_users&amp;quot;: [&amp;quot;simonw&amp;quot;]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This also illustrates a new Datasette feature: the ability to set &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html#secret-configuration-values"&gt;secret plugin configuration values&lt;/a&gt;. &lt;code&gt;{&amp;quot;$env&amp;quot;: &amp;quot;GITHUB_CLIENT_SECRET&amp;quot;}&lt;/code&gt; means &amp;quot;read this configuration option from the environment variable &lt;code&gt;GITHUB_CLIENT_SECRET&lt;/code&gt;&amp;quot;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Automatic_log_in_40"&gt;&lt;/a&gt;Automatic log in&lt;/h3&gt;
&lt;p&gt;Like many OAuth providers, GitHub only asks the user for their approval the first time they log into a given app. Any subsequent times they are redirected to GitHub it will skip the permission screen and redirect them right back again with a token.&lt;/p&gt;
&lt;p&gt;This means we can implement automatic log in: any time a visitor arrives who does not have a cookie we can bounce them directly to GitHub, and if they have already consented they will be logged in instantly.&lt;/p&gt;
&lt;p&gt;This is a great user-experience - provided the user is logged into GitHub they will be treated as if they are logged into your application - but it does come with a downside: what if the user clicks the “log out” link?&lt;/p&gt;
&lt;p&gt;For the moment I’ve implemented this using another cookie: if the user clicks “log out”, I set an &lt;code&gt;asgi_auth_logout&lt;/code&gt; cookie marking the user as having explicitly logged out. While they have that cookie they won’t be logged in automatically, instead having to click an explicit link. See &lt;a href="https://github.com/simonw/datasette-auth-github/issues/41"&gt;issue 41&lt;/a&gt; for thoughts on how this could be further improved.&lt;/p&gt;
&lt;p&gt;One pleasant side-effect of all of this is that &lt;code&gt;datasette-auth-github&lt;/code&gt; doesn’t need to persist the user’s GitHub &lt;code&gt;access_token&lt;/code&gt; anywhere - it uses it during the initial authentication to check for any required organizations or teams, but then it deliberately forgets the token entirely.&lt;/p&gt;
&lt;p&gt;OAuth access tokens are like passwords, so the most responsible thing for a piece of software to do with them is avoid storing them anywhere at all unless they are explicitly needed.&lt;/p&gt;
&lt;h3&gt;&lt;a id="What_happens_when_a_user_leaves_an_organization_54"&gt;&lt;/a&gt;What happens when a user leaves an organization?&lt;/h3&gt;
&lt;p&gt;When building against a single sign-in provider, consideration needs to be given to offboarding: when a user is removed from a team or organization they should also lose access to their SSO applications.&lt;/p&gt;
&lt;p&gt;This is difficult when an application sets its own authentication cookies, like &lt;code&gt;datasette-auth-github&lt;/code&gt; does.&lt;/p&gt;
&lt;p&gt;One solution would be to make an API call on every request to the application, to verify that the user should still have access. This would slow everything down and is likely to blow through rate limits as well, so we need a more efficient solution.&lt;/p&gt;
&lt;p&gt;I ended up solving this with two mechanisms. Since we have automatic log in, our cookies don’t actually need to last very long - so by default the signed cookies set by the plugin last for just one hour. When a user’s cookie has expired they will be redirected back through GitHub - they probably won’t even notice the redirect, and their permissions will be re-verified as part of that flow.&lt;/p&gt;
&lt;p&gt;But what if you need to invalidate those cookies instantly?&lt;/p&gt;
&lt;p&gt;To cover that case, I’ve incorporated an optional &lt;code&gt;cookie_version&lt;/code&gt; configuration option into the signatures on the cookies. If you need to invalidate &lt;em&gt;every&lt;/em&gt; signed cookie that is out there - to lock out a compromised GitHub account owner for example - you can do so by changing the &lt;code&gt;cookie_version&lt;/code&gt; configuration option and restarting (or re-deploying) Datasette.&lt;/p&gt;
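&lt;p&gt;To illustrate why that works, here’s a hypothetical sketch of deriving a signing secret that incorporates a version string - not the plugin’s exact implementation, just the idea: change the version and every previously issued cookie signature stops validating.&lt;/p&gt;

```python
import hashlib


def derive_signing_secret(client_id, client_secret, cookie_version="default"):
    # Derive a cookie-signing key from the OAuth app credentials plus a
    # version string. Changing cookie_version changes the derived key,
    # which invalidates the signatures on every existing cookie.
    # The salt value here is illustrative, not the plugin's.
    material = "{}:{}:{}".format(client_id, client_secret, cookie_version)
    return hashlib.pbkdf2_hmac(
        "sha256", material.encode("utf-8"), b"cookie-signing-salt", 100_000
    )
```

The derivation is deterministic, so restarted processes agree on the key without any shared state beyond the configuration itself.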
&lt;p&gt;These options are all described in detail in the &lt;a href="https://github.com/simonw/datasette-auth-github/blob/master/README.md"&gt;project README&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Integration_with_datasette_publish_71"&gt;&lt;/a&gt;Integration with datasette publish&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://datasette.readthedocs.io/en/stable/publish.html#datasette-publish"&gt;datasette publish&lt;/a&gt; command-line tool lets users instantly publish a SQLite database to the internet, using Heroku, Cloud Run or Zeit Now v1. I’ve added support for setting secret plugin configuration directly to that tool, which means you can publish an authentication-protected SQLite database to the internet with a shell one-liner, using &lt;code&gt;--install=datasette-auth-github&lt;/code&gt; to install the plugin and &lt;code&gt;--plugin-secret&lt;/code&gt; to configure it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ datasette publish cloudrun fixtures.db \
   --install=datasette-auth-github \
   --name datasette-auth-protected \
   --service datasette-auth-protected \
   --plugin-secret datasette-auth-github allow_users simonw \
   --plugin-secret datasette-auth-github client_id 85f6224cb2a44bbad3fa \
   --plugin-secret datasette-auth-github client_secret ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a Cloud Run instance which only allows GitHub user &lt;a href="https://github.com/simonw"&gt;simonw&lt;/a&gt; to log in. You could instead use &lt;code&gt;--plugin-secret datasette-auth-github allow_orgs my-org&lt;/code&gt; to allow any users from a specific GitHub organization.&lt;/p&gt;
&lt;p&gt;Note that Cloud Run does not yet give you full control over the URL that will be assigned to your deployment. In this case it gave me &lt;code&gt;https://datasette-auth-protected-j7hipcg4aq-uc.a.run.app&lt;/code&gt; - which works fine, but I needed to update my GitHub OAuth application’s callback URL manually to &lt;code&gt;https://datasette-auth-protected-j7hipcg4aq-uc.a.run.app/-/auth-callback&lt;/code&gt; after deploying the application in order to get the authentication flow to work correctly.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Add_GitHub_authentication_to_any_ASGI_application_87"&gt;&lt;/a&gt;Add GitHub authentication to any ASGI application!&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;datasette-auth-github&lt;/code&gt; isn’t just for Datasette: I deliberately wrote the plugin as ASGI middleware first, with only a very thin layer of extra code to turn it into an installable plugin.&lt;/p&gt;
&lt;p&gt;This means that if you are building any other kind of ASGI app (or using an ASGI-compatible framework such as Starlette or Sanic) you can wrap your application directly with the middleware and get the same authentication behaviour as when the plugin is added to Datasette!&lt;/p&gt;
&lt;p&gt;Here’s what that looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; datasette_auth_github &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; GitHubAuth
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; starlette.applications &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; Starlette
&lt;span class="hljs-keyword"&gt;from&lt;/span&gt; starlette.responses &lt;span class="hljs-keyword"&gt;import&lt;/span&gt; HTMLResponse
&lt;span class="hljs-keyword"&gt;import&lt;/span&gt; uvicorn

app = Starlette(debug=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;)


&lt;span class="hljs-decorator"&gt;@app.route("/")&lt;/span&gt;
&lt;span class="hljs-keyword"&gt;async&lt;/span&gt; &lt;span class="hljs-function"&gt;&lt;span class="hljs-keyword"&gt;def&lt;/span&gt; &lt;span class="hljs-title"&gt;homepage&lt;/span&gt;&lt;span class="hljs-params"&gt;(request)&lt;/span&gt;:&lt;/span&gt;
    &lt;span class="hljs-keyword"&gt;return&lt;/span&gt; HTMLResponse(&lt;span class="hljs-string"&gt;"Hello, {}"&lt;/span&gt;.format(
        repr(request.scope[&lt;span class="hljs-string"&gt;"auth"&lt;/span&gt;])
    ))


authenticated_app = GitHubAuth(
    app,
    client_id=&lt;span class="hljs-string"&gt;"986f5d837b45e32ee6dd"&lt;/span&gt;,
    client_secret=&lt;span class="hljs-string"&gt;"..."&lt;/span&gt;,
    require_auth=&lt;span class="hljs-keyword"&gt;True&lt;/span&gt;,
    allow_users=[&lt;span class="hljs-string"&gt;"simonw"&lt;/span&gt;],
)

&lt;span class="hljs-keyword"&gt;if&lt;/span&gt; __name__ == &lt;span class="hljs-string"&gt;"__main__"&lt;/span&gt;:
    uvicorn.run(authenticated_app, host=&lt;span class="hljs-string"&gt;"0.0.0.0"&lt;/span&gt;, port=&lt;span class="hljs-number"&gt;8000&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The middleware adds a &lt;code&gt;scope[&amp;quot;auth&amp;quot;]&lt;/code&gt; key describing the logged in user, which is then passed through to your application. More on this &lt;a href="https://github.com/simonw/datasette-auth-github#using-this-as-asgi-middleware-without-datasette"&gt;in the README&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Your_security_reviews_needed_125"&gt;&lt;/a&gt;Your security reviews needed!&lt;/h3&gt;
&lt;p&gt;Since &lt;code&gt;datasette-auth-github&lt;/code&gt; adds authentication to Datasette, it is an extremely security-sensitive piece of code. So far I’m the only person who has looked at it: before I start widely recommending it to people I’d really like to get some more eyes on it to check for any potential security problems.&lt;/p&gt;
&lt;p&gt;I’ve opened &lt;a href="https://github.com/simonw/datasette-auth-github/issues/44"&gt;issue #44&lt;/a&gt; encouraging security-minded developers to have a dig through the code and see if there’s anything that can be tightened up or any potential vulnerabilities that need to be addressed. Please get involved!&lt;/p&gt;
&lt;p&gt;It’s a pretty small codebase, but here are some areas you might want to inspect:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At a high level: is the way I’m verifying the user through the GitHub API and then storing their identity in a signed cookie the right way to go?&lt;/li&gt;
&lt;li&gt;The cookie signing secret is derived from the GitHub OAuth application’s &lt;code&gt;client_id&lt;/code&gt; and &lt;code&gt;client_secret&lt;/code&gt; (because that secret is already meant to be a secret), combined with the &lt;code&gt;cookie_version&lt;/code&gt; option described above - &lt;a href="https://github.com/simonw/datasette-auth-github/blob/bf01f8f01b87a6cb09c47380ba0a86e0546ebb38/datasette_auth_github/github_auth.py#L81-L88"&gt;implementation here&lt;/a&gt;. Since this is a derived secret I’m using &lt;a href="https://docs.python.org/3/library/hashlib.html#hashlib.pbkdf2_hmac"&gt;pbkdf2_hmac&lt;/a&gt; with 100,000 iterations. This is by far the most cryptographically interesting part of the code, and could definitely do with some second opinions.&lt;/li&gt;
&lt;li&gt;The code used &lt;a href="https://github.com/simonw/datasette-auth-github/blob/bf01f8f01b87a6cb09c47380ba0a86e0546ebb38/datasette_auth_github/utils.py#L16-L36"&gt;to sign and verify cookies&lt;/a&gt; is based on Django’s (thoroughly reviewed) implementation, but could benefit from a sanity check.&lt;/li&gt;
&lt;li&gt;I wanted this library to work on &lt;a href="https://glitch.com/"&gt;Glitch&lt;/a&gt;, which currently &lt;a href="https://support.glitch.com/t/can-you-upgrade-python-to-latest-version/7980"&gt;only provides Python 3.5.2&lt;/a&gt;. Python’s asyncio HTTP libraries such as &lt;a href="https://github.com/encode/http3"&gt;http3&lt;/a&gt; and &lt;a href="https://aiohttp.readthedocs.io/en/stable/"&gt;aiohttp&lt;/a&gt; require more modern Pythons, so I ended up &lt;a href="https://github.com/simonw/datasette-auth-github/pull/40"&gt;rolling my own&lt;/a&gt; very simple async HTTP function which uses &lt;code&gt;urllib.request&lt;/code&gt; inside a &lt;code&gt;loop.run_in_executor&lt;/code&gt; thread pool. Is that approach sound? Rolling my own HTTP client in this way feels a little hairy.&lt;/li&gt;
&lt;/ul&gt;
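&lt;p&gt;The derived-secret scheme can be sketched like this (illustrative only: the salt, hash choice and function name here are my own, not copied from the &lt;code&gt;datasette-auth-github&lt;/code&gt; source):&lt;/p&gt;

```python
import hashlib


def derive_signing_secret(client_id, client_secret, cookie_version="default"):
    """Derive a cookie-signing secret from existing OAuth credentials.

    A simplified sketch of the idea described above; the fixed salt and
    parameter names are illustrative assumptions.
    """
    # pbkdf2_hmac stretches the combined secret material through
    # 100,000 iterations, making the derived key expensive to
    # brute-force even if the derivation scheme is known.
    material = "{}:{}:{}".format(client_id, client_secret, cookie_version)
    return hashlib.pbkdf2_hmac(
        "sha256",
        material.encode("utf-8"),
        b"cookie-signing",  # fixed salt: illustrative only
        100000,
    )
```

&lt;p&gt;Bumping &lt;code&gt;cookie_version&lt;/code&gt; changes the derived key, which invalidates every previously issued cookie.&lt;/p&gt;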
&lt;p&gt;This has been a really fun project so far, and I’m very excited about the potential for authenticated Datasette moving forward - not to mention the possibilities unlocked by an ASGI middleware ecosystem with strong support for wrapping any application in an authentication layer.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/middleware"&gt;middleware&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="github"/><category term="middleware"/><category term="projects"/><category term="security"/><category term="datasette"/><category term="asgi"/></entry><entry><title>datasette-cors</title><link href="https://simonwillison.net/2019/Jul/8/datasette-cors/#atom-tag" rel="alternate"/><published>2019-07-08T04:30:53+00:00</published><updated>2019-07-08T04:30:53+00:00</updated><id>https://simonwillison.net/2019/Jul/8/datasette-cors/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-cors"&gt;datasette-cors&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
My other Datasette ASGI plugin: this one wraps my asgi-cors project and lets you configure CORS access from a list of domains (or a set of domain wildcards) so you can make JavaScript calls to a Datasette instance from a specific set of other hosts.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1148084447687786498"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cors"&gt;cors&lt;/a&gt;&lt;/p&gt;



</summary><category term="projects"/><category term="datasette"/><category term="asgi"/><category term="cors"/></entry><entry><title>datasette-auth-github</title><link href="https://simonwillison.net/2019/Jul/8/datasette-auth-github/#atom-tag" rel="alternate"/><published>2019-07-08T04:28:17+00:00</published><updated>2019-07-08T04:28:17+00:00</updated><id>https://simonwillison.net/2019/Jul/8/datasette-auth-github/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/datasette-auth-github"&gt;datasette-auth-github&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
My first big ASGI plugin for Datasette: datasette-auth-github adds the ability to require users to authenticate against the GitHub OAuth API. You can whitelist specific users, or you can restrict access to members of specific GitHub organizations or teams. While it’s structured as a Datasette plugin it also includes ASGI middleware which can be applied to any ASGI application.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1148085030448578561"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/oauth"&gt;oauth&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="oauth"/><category term="projects"/><category term="datasette"/><category term="asgi"/></entry><entry><title>Porting Datasette to ASGI, and Turtles all the way down</title><link href="https://simonwillison.net/2019/Jun/23/datasette-asgi/#atom-tag" rel="alternate"/><published>2019-06-23T21:39:00+00:00</published><updated>2019-06-23T21:39:00+00:00</updated><id>https://simonwillison.net/2019/Jun/23/datasette-asgi/#atom-tag</id><summary type="html">
    &lt;p&gt;This evening I finally closed a &lt;a href="https://simonwillison.net/tags/datasette/"&gt;Datasette&lt;/a&gt; issue that I opened more than 13 months ago: &lt;a href="https://github.com/simonw/datasette/issues/272"&gt;#272: Port Datasette to ASGI&lt;/a&gt;. A few notes on why this is such an important step for the project.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://asgi.readthedocs.io/"&gt;ASGI&lt;/a&gt; is the Asynchronous Server Gateway Interface standard. It’s been evolving steadily over the past few years under the guidance of Andrew Godwin. It’s intended as an asynchronous replacement for the venerable &lt;a href="https://wsgi.readthedocs.io/"&gt;WSGI&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Turtles_all_the_way_down_6"&gt;&lt;/a&gt;Turtles all the way down&lt;/h3&gt;
&lt;p&gt;Ten years ago at EuroDjangoCon 2009 in Prague I gave a talk entitled &lt;a href="https://www.slideshare.net/simon/django-heresies"&gt;Django Heresies&lt;/a&gt;. After discussing some of the design decisions in Django that I didn’t think had aged well, I spent the last part of the talk talking about &lt;em&gt;Turtles all the way down&lt;/em&gt;. I &lt;a href="https://simonwillison.net/2009/May/19/djng/?#turtles-all-the-way-down"&gt;wrote that idea up here&lt;/a&gt; on my blog (see also &lt;a href="https://www.slideshare.net/simon/django-heresies/65-The_Django_Contract_A_view"&gt;these slides&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The key idea was that Django would be more interesting if the core Django contract - a function that takes a request and returns a response - was extended to more places in the framework. The top level site, the reusable applications, middleware and URL routing could all share that same contract. Everything could be composed from the same raw building blocks.&lt;/p&gt;
&lt;p&gt;I’m excited about ASGI because it absolutely fits the &lt;em&gt;turtles all the way down&lt;/em&gt; model.&lt;/p&gt;
&lt;p&gt;The ASGI contract is an asynchronous function that takes three arguments:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;async def application(scope, receive, send):
    ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;scope&lt;/code&gt; is a serializable dictionary providing the context for the current connection. &lt;code&gt;receive&lt;/code&gt; is an awaitable which can be used to receive incoming messages. &lt;code&gt;send&lt;/code&gt; is an awaitable that can be used to send replies.&lt;/p&gt;
&lt;p&gt;It’s a pretty low-level set of primitives (and less obvious than a simple request/response) - and that’s because ASGI is about more than just the standard HTTP request/response cycle. This contract works for HTTP, WebSockets and potentially any other protocol that needs to asynchronously send and receive data.&lt;/p&gt;
&lt;p&gt;It’s an extremely elegant piece of protocol design, informed by Andrew’s experience with Django Channels, SOA protocols (we are co-workers at Eventbrite, where we’ve both been heavily involved in its &lt;a href="https://github.com/eventbrite/pysoa"&gt;SOA mechanism&lt;/a&gt;) and his extensive conversations with other maintainers in the Python web community.&lt;/p&gt;
&lt;p&gt;The ASGI protocol really is turtles all the way down - it’s a simple, well defined contract which can be composed together to implement all kinds of interesting web architectural patterns.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://github.com/simonw/asgi-cors/"&gt;asgi-cors library&lt;/a&gt; was my first attempt at building an ASGI turtle. &lt;a href="https://github.com/simonw/asgi-cors/blob/master/asgi_cors.py"&gt;The implementation&lt;/a&gt; is a simple Python decorator which, when applied to another ASGI callable, adds HTTP CORS headers based on the parameters you pass to the decorator. The library has zero installation dependencies (it has test dependencies on pytest and friends) and can be used on any HTTP ASGI project.&lt;/p&gt;
&lt;p&gt;Building &lt;code&gt;asgi-cors&lt;/code&gt; completely sold me on ASGI as the turtle pattern I had been desiring for over a decade!&lt;/p&gt;
&lt;h3&gt;&lt;a id="Datasette_plugins_and_ASGI_31"&gt;&lt;/a&gt;Datasette plugins and ASGI&lt;/h3&gt;
&lt;p&gt;Which brings me to Datasette.&lt;/p&gt;
&lt;p&gt;One of the most promising components of Datasette is its plugin mechanism. Based on &lt;a href="https://pluggy.readthedocs.io/en/latest/"&gt;pluggy&lt;/a&gt; (extracted from pytest), &lt;a href="https://datasette.readthedocs.io/en/stable/plugins.html"&gt;Datasette Plugins&lt;/a&gt; allow new features to be added to Datasette without needing to change the underlying code. This means new features can be built, packaged and shipped entirely independently of the core project. A list of currently available plugins &lt;a href="https://datasette.readthedocs.io/en/latest/ecosystem.html#datasette-plugins"&gt;can be found here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;WordPress is a very solid blogging engine. Add in the plugin ecosystem around it and it can be used to build literally any CMS you can possibly imagine.&lt;/p&gt;
&lt;p&gt;My dream for Datasette is to apply the same model: I want a strong core for publishing and exploring data that’s enhanced by plugins to solve a huge array of data analysis, visualization and API-backed problems.&lt;/p&gt;
&lt;p&gt;Datasette has &lt;a href="https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hooks"&gt;a range of plugin hooks already&lt;/a&gt;, but I’ve so far held back on implementing the most useful class of hooks: hooks that allow developers to add entirely new URL routes exposing completely custom functionality.&lt;/p&gt;
&lt;p&gt;The reason I held back is that I wanted to be confident that the contract I was offering was something I would continue to support moving forward. A plugin system isn’t much good if the core implementation keeps on changing in backwards-incompatible ways.&lt;/p&gt;
&lt;p&gt;ASGI is the exact contract I’ve been waiting for. It’s not quite ready yet, but you can follow &lt;a href="https://github.com/simonw/datasette/issues/520"&gt;#520: prepare_asgi plugin hook&lt;/a&gt; (thoughts and suggestions welcome!) to be the first to hear about this hook when it lands. I’m planning to use it to make my asgi-cors library available as a plugin, after which I’m excited to start exploring the idea of bringing authentication plugins to Datasette (and to the wider ASGI world in general).&lt;/p&gt;
&lt;p&gt;I’m hoping that many Datasette ASGI plugins will exist in a form that allows them to be used by other ASGI applications as well.&lt;/p&gt;
&lt;p&gt;I also plan to use ASGI to make components of Datasette itself available to other ASGI applications. If you just want a single instance of Datasette’s &lt;a href="https://datasette.readthedocs.io/en/stable/pages.html#table"&gt;table view&lt;/a&gt; to be embedded somewhere in your URL configuration you should be able to do that by routing traffic directly to the ASGI-compatible view class.&lt;/p&gt;
&lt;p&gt;I’m really excited about exploring the intersection of ASGI turtles-all-the-way-down and pluggy’s powerful mechanism for gluing components together. Both WSGI and Django’s reusable apps have attempted to create a reusable ecosystem in the past, with limited success. Let’s see if ASGI can finally make the turtle dream come true.&lt;/p&gt;

&lt;h3&gt;&lt;a id="Further_reading_53"&gt;&lt;/a&gt;Further reading&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.encode.io/articles/hello-asgi/"&gt;Hello ASGI&lt;/a&gt; by Tom Christie is the best introduction to ASGI I’ve seen. Tom is the author of the &lt;a href="https://www.uvicorn.org/"&gt;Uvicorn&lt;/a&gt; ASGI server (used by Datasette as-of this evening) and &lt;a href="https://www.starlette.io/"&gt;Starlette&lt;/a&gt;, a delightfully well-designd ASGI web framework. I’ve learned an enormous amount about ASGI by reading Tom’s code. Tom also gave &lt;a href="https://www.youtube.com/watch?v=u8GSFEg5lnU"&gt;a talk about ASGI&lt;/a&gt; at DjangoCon Europe a few months ago.&lt;/p&gt;
&lt;p&gt;If you haven’t read &lt;a href="https://www.aeracode.org/2018/06/04/django-async-roadmap/"&gt;A Django Async Roadmap&lt;/a&gt;, which Andrew Godwin published last year, you should absolutely catch up. More than just talking about ASGI, Andrew sketches out a detailed and actionable plan for bringing asyncio to Django core. He landed &lt;a href="https://github.com/django/django/pull/11209"&gt;the first Django core ASGI code&lt;/a&gt; based on that plan just a few days ago.&lt;/p&gt;
&lt;p&gt;If you're interested in the details of Datasette's ASGI implementation, I posted &lt;a href="https://github.com/simonw/datasette/issues/272"&gt;detailed commentary on issue #272&lt;/a&gt; over the past thirteen months as I researched and finalized my approach. I added further commentary to &lt;a href="https://github.com/simonw/datasette/pull/518"&gt;the associated pull request&lt;/a&gt;, which gathers together the 34 commits it took to ship the feature (squashed into a single commit to master).&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andrew-godwin"&gt;andrew-godwin&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tom-christie"&gt;tom-christie&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytest"&gt;pytest&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="andrew-godwin"/><category term="projects"/><category term="datasette"/><category term="asgi"/><category term="tom-christie"/><category term="pytest"/></entry><entry><title>asgi-cors</title><link href="https://simonwillison.net/2019/May/7/asgi-cors/#atom-tag" rel="alternate"/><published>2019-05-07T00:12:37+00:00</published><updated>2019-05-07T00:12:37+00:00</updated><id>https://simonwillison.net/2019/May/7/asgi-cors/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/asgi-cors"&gt;asgi-cors&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I’ve been trying out the new ASGI 3.0 spec and I just released my first piece of ASGI middleware: asgi-cors, which lets you wrap an ASGI application with Access-Control-Allow-Origin CORS headers (either “*” or dynamic headers based on an origin whitelist).

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1125553970007568384"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cors"&gt;cors&lt;/a&gt;&lt;/p&gt;



</summary><category term="projects"/><category term="security"/><category term="asgi"/><category term="cors"/></entry><entry><title>Hello world for ASGI running on Glitch</title><link href="https://simonwillison.net/2019/Apr/26/hello-world-asgi-running-glitch/#atom-tag" rel="alternate"/><published>2019-04-26T05:06:12+00:00</published><updated>2019-04-26T05:06:12+00:00</updated><id>https://simonwillison.net/2019/Apr/26/hello-world-asgi-running-glitch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://glitch.com/~asgi"&gt;Hello world for ASGI running on Glitch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I’m continuing to experiment with Python 3 running on Glitch. This evening on my walk home from work I built this “hello world” demo on my phone, partly to see if Glitch was a workable mobile development environment—it passed with flying colours! The demo is a simple hello world implemented using the new ASGI 3.0 specification, running on the daphne reference server. Click the “via” link for my accompanying thread on Twitter, which includes a short screencast (also recorded on my phone) showing Glitch in action.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1121572280637595648"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/glitch"&gt;glitch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;&lt;/p&gt;



</summary><category term="glitch"/><category term="projects"/><category term="asgi"/></entry><entry><title>Quoting Tom Christie</title><link href="https://simonwillison.net/2018/Oct/8/tom-christie/#atom-tag" rel="alternate"/><published>2018-10-08T14:43:16+00:00</published><updated>2018-10-08T14:43:16+00:00</updated><id>https://simonwillison.net/2018/Oct/8/tom-christie/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.encode.io/articles/hello-asgi/"&gt;&lt;p&gt;The ASGI specification provides an opportunity for Python to hit a productivity/performance sweet-spot for a wide range of use-cases, from writing high-volume proxy servers through to bringing large-scale web applications to market at speed.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.encode.io/articles/hello-asgi/"&gt;Tom Christie&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/async"&gt;async&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/asgi"&gt;asgi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tom-christie"&gt;tom-christie&lt;/a&gt;&lt;/p&gt;



</summary><category term="async"/><category term="python"/><category term="asgi"/><category term="tom-christie"/></entry></feed>