<p><em>Simon Willison's Weblog: Datasette: The annotated release notes</em></p>
<h3>Datasette 1.0a8: JavaScript plugins, new plugin hooks and plugin configuration in datasette.yaml</h3>
<p>2024-02-07 - <a href="https://simonwillison.net/2024/Feb/7/datasette-1a8/">simonwillison.net/2024/Feb/7/datasette-1a8/</a></p>
<p>I just released <a href="https://docs.datasette.io/en/1.0a8/changelog.html#a8-2024-02-07">Datasette 1.0a8</a>. These are the <a href="https://simonwillison.net/tags/annotatedreleasenotes/">annotated release notes</a>.</p>
<blockquote>
<p>This alpha release continues the migration of Datasette's configuration from <code>metadata.yaml</code> to the new <code>datasette.yaml</code> configuration file, introduces a new system for JavaScript plugins and adds several new plugin hooks.</p>
</blockquote>
<p>My plan is for this to be the last alpha that adds new features - the new plugin hooks, in this case. The next release will focus on wrapping up the stable APIs for 1.0, with a particular focus on template stability (so users can customize Datasette without fear of it breaking in future minor releases) and wrapping up the work on the stable JSON API.</p>
<h4 id="datasette-10a8-config">Configuration</h4>
<blockquote>
<ul>
<li><p>Plugin configuration now lives in the <a href="https://docs.datasette.io/en/1.0a8/configuration.html#configuration">datasette.yaml configuration file</a>, passed to Datasette using the <code>-c/--config</code> option. Thanks, Alex Garcia. (<a href="https://github.com/simonw/datasette/issues/2093">#2093</a>)</p>
<div class="highlight highlight-source-shell"><pre>datasette -c datasette.yaml</pre></div>
<p>Where <code>datasette.yaml</code> contains configuration that looks like this:</p>
<div class="highlight highlight-source-yaml"><pre><span class="pl-ent">plugins</span>:
  <span class="pl-ent">datasette-cluster-map</span>:
    <span class="pl-ent">latitude_column</span>: <span class="pl-s">xlat</span>
    <span class="pl-ent">longitude_column</span>: <span class="pl-s">xlon</span>
</pre></div>
</li>
<li>Previously plugins were configured in <code>metadata.yaml</code>, which was confusing as plugin settings were unrelated to database and table metadata.</li>
</ul>
</blockquote>
<p>This almost concludes the work (driven mainly by Alex Garcia) to clean up how Datasette is configured prior to the 1.0 release. Moving things that aren't metadata out of the <code>metadata.yaml/json</code> file is a big conceptual improvement, and one that absolutely needed to happen before 1.0.</p>
<blockquote>
<ul>
<li>
<p>The <code>-s/--setting</code> option can now be used to set plugin configuration as well. See <a href="https://docs.datasette.io/en/1.0a8/configuration.html#configuration-cli">Configuration via the command-line</a> for details. (<a href="https://github.com/simonw/datasette/issues/2252">#2252</a>)</p>
<p>The above YAML configuration example using <code>-s/--setting</code> looks like this:</p>
<div class="highlight highlight-source-shell"><pre>datasette mydatabase.db \
  -s plugins.datasette-cluster-map.latitude_column xlat \
  -s plugins.datasette-cluster-map.longitude_column xlon</pre></div>
</li>
</ul>
</blockquote>
<p>This feature is mainly for me. I start new Datasette instances dozens of times a day to try things out, and having to manually edit a <code>datasette.yaml</code> file before trying something new is an annoying little piece of friction.</p>
<p>With the <code>-s</code> option anything that can be represented in JSON or YAML can also be passed on the command-line.</p>
<p>I mainly love this as a copy-and-paste mechanism: my notes are crammed with <code>datasette</code> shell one-liners, and being able to paste something into my terminal to recreate a Datasette instance with a specific configuration is a big win.</p>
<p>The <code>-s</code> option uses dot-notation to specify nested keys, but it has a simple mechanism for representing more complex objects too: you can pass them in as JSON literal strings and Datasette will parse them. The <a href="https://docs.datasette.io/en/1.0a8/configuration.html#configuration-cli">--setting documentation</a> includes this example of configuring <a href="https://datasette.io/plugins/datasette-proxy-url">datasette-proxy-url</a>:</p>
<div class="highlight highlight-source-shell"><pre>datasette mydatabase.db \
  -s plugins.datasette-proxy-url.paths <span class="pl-s"><span class="pl-pds">'</span>[{"path": "/proxy", "backend": "http://example.com/"}]<span class="pl-pds">'</span></span></pre></div>
<p>Which is equivalent to the following <code>datasette.yaml</code> file:</p>
<div class="highlight highlight-source-yaml"><pre><span class="pl-ent">plugins</span>:
  <span class="pl-ent">datasette-proxy-url</span>:
    <span class="pl-ent">paths</span>:
    - <span class="pl-ent">path</span>: <span class="pl-s">/proxy</span>
      <span class="pl-ent">backend</span>: <span class="pl-s">http://example.com/</span></pre></div>
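The mechanism only needs a small amount of logic. Here's a minimal, hypothetical sketch - not Datasette's actual implementation, and <code>set_dotted</code> is an invented helper name - of how a dotted key plus a value that may be a JSON literal could be expanded into nested configuration:

```python
import json


def set_dotted(config, dotted_key, value):
    """Assign a value at a dot-separated key path, creating nested dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    try:
        # Values that parse as JSON become structured data...
        node[leaf] = json.loads(value)
    except json.JSONDecodeError:
        # ...anything else is treated as a plain string.
        node[leaf] = value
    return config


config = set_dotted(
    {},
    "plugins.datasette-proxy-url.paths",
    '[{"path": "/proxy", "backend": "http://example.com/"}]',
)
```

Running this produces the same nested structure as the YAML file above.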
<blockquote>
<ul>
<li>The new <code>/-/config</code> page shows the current instance configuration, after redacting keys that could contain sensitive data such as API keys or passwords. (<a href="https://github.com/simonw/datasette/issues/2254">#2254</a>)</li>
</ul>
</blockquote>
<p>Datasette has a set of <a href="https://docs.datasette.io/en/1.0a8/introspection.html">introspection endpoints</a> like this - <code>/-/metadata</code> and <code>/-/settings</code> and <code>/-/threads</code>, all of which can have <code>.json</code> added to get back the raw JSON. I find them really useful for debugging instances and understanding how they have been configured.</p>
<p>The redaction is new: previously I had designed a mechanism for passing secrets as environment variables in a way that would avoid them being exposed here, but I realized automated redaction is less likely to cause people to leak secrets by accident.</p>
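The general shape of automated redaction is easy to sketch. Here's a hypothetical version that walks the configuration and masks any value whose key looks sensitive - the list of key fragments below is my assumption, not Datasette's actual list:

```python
SENSITIVE_FRAGMENTS = ("secret", "password", "key")  # assumed heuristic


def redact(obj):
    """Recursively mask any value whose key looks like it holds a secret."""
    if isinstance(obj, dict):
        return {
            k: "***" if any(f in k.lower() for f in SENSITIVE_FRAGMENTS) else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj


redacted = redact(
    {"plugins": {"datasette-auth-tokens": {"api_key": "s3cret", "database": "tokens"}}}
)
```

Here <code>api_key</code> gets masked while <code>database</code> passes through untouched.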
<blockquote>
<ul>
<li>Existing Datasette installations may already have configuration set in <code>metadata.yaml</code> that should be migrated to <code>datasette.yaml</code>. To avoid breaking these installations, Datasette will silently treat table configuration, plugin configuration and allow blocks in metadata as if they had been specified in configuration instead. (<a href="https://github.com/simonw/datasette/issues/2247">#2247</a>) (<a href="https://github.com/simonw/datasette/issues/2248">#2248</a>) (<a href="https://github.com/simonw/datasette/issues/2249">#2249</a>)</li>
</ul>
</blockquote>
<p>Originally the plan was to have Datasette fail to load if it spotted configuration in <code>metadata.yaml</code> that should have been migrated to <code>datasette.yaml</code>.</p>
<p>I changed my mind about this mainly as I experienced the enormous inconvenience of updating all of my Datasette instances to the new format - including rewriting the automated tests for my plugins.</p>
<p>I think my philosophy on this going forward is going to be that Datasette will take extra effort to keep older things working provided the additional code complexity in doing so is low enough to make it worth the trade-off. In this case I think it is.</p>
<blockquote>
<p>Note that the <code>datasette publish</code> command has not yet been updated to accept a <code>datasette.yaml</code> configuration file. This will be addressed in <a href="https://github.com/simonw/datasette/issues/2195">#2195</a> but for the moment you can include those settings in <code>metadata.yaml</code> instead.</p>
</blockquote>
<p>I promised myself I would ship 1.0a8 today no matter what, so I cut this feature at the last moment.</p>
<h3>JavaScript plugins</h3>
<blockquote>
<p>Datasette now includes a <a href="https://docs.datasette.io/en/1.0a8/javascript_plugins.html#javascript-plugins">JavaScript plugins mechanism</a>, allowing JavaScript to customize Datasette in a way that can collaborate with other plugins.</p>
<p>This provides two initial hooks, with more to come in the future:</p>
<ul>
<li>
<a href="https://docs.datasette.io/en/1.0a8/javascript_plugins.html#javascript-plugins-makeabovetablepanelconfigs">makeAboveTablePanelConfigs()</a> can add additional panels to the top of the table page.</li>
<li>
<a href="https://docs.datasette.io/en/1.0a8/javascript_plugins.html#javascript-plugins-makecolumnactions">makeColumnActions()</a> can add additional actions to the column menu.</li>
</ul>
<p>Thanks <a href="https://github.com/hydrosquall">Cameron Yick</a> for contributing this feature. (<a href="https://github.com/simonw/datasette/pull/2052">#2052</a>)</p>
</blockquote>
<p>The core problem we are trying to solve here comes from what happens when multiple plugins all try to customize the Datasette instance at the same time.</p>
<p>This is particularly important for visualization plugins.</p>
<p>An example: <a href="https://datasette.io/plugins/datasette-cluster-map">datasette-cluster-map</a> and <a href="https://datasette.io/plugins/datasette-geojson-map">datasette-geojson-map</a> both add a map to the top of the table page. This means if you have both plugins installed you can end up with two maps!</p>
<p>The new mechanism allows plugins to collaborate: each plugin can contribute one or more "panels" which will then be shown above the table view in an interface with toggles to switch between them.</p>
<p>The column actions mechanism is similar: it allows plugins to contribute additional actions to the column menu, which appears when you click the cog icon in the header of a table column.</p>
<p>Cameron Yick did a great job with this feature. I've been slow in getting a release out with it though - my hope is that we can iterate more productively on it now that it's in an alpha release.</p>
<h4 id="datasette-10a8-plugin-hooks">Plugin hooks</h4>
<blockquote>
<ul>
<li>New <a href="https://docs.datasette.io/en/1.0a8/plugin_hooks.html#plugin-hook-jinja2-environment-from-request">jinja2_environment_from_request(datasette, request, env)</a> plugin hook, which can be used to customize the current Jinja environment based on the incoming request. This can be used to modify the template lookup path based on the incoming request hostname, among other things. (<a href="https://github.com/simonw/datasette/issues/2225">#2225</a>)</li>
</ul>
</blockquote>
<p>I wrote about my need for this in <a href="https://simonwillison.net/2024/Jan/7/page-caching-and-custom-templates-for-datasette-cloud/">Page caching and custom templates for Datasette Cloud</a>: I wanted a way to modify the Jinja environment based on the incoming HTTP host, and this hook lets me do exactly that.</p>
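The hook itself has to run inside Datasette, but the core decision it enables - picking templates per hostname - can be sketched as a plain function. The directory layout and helper name here are hypothetical:

```python
def template_dirs_for_host(host, default_dirs):
    """Hypothetical helper: put a per-subdomain template directory ahead of the defaults.

    A jinja2_environment_from_request() implementation could use logic like this
    to rebuild the Jinja environment's template loader for each incoming hostname.
    """
    subdomain = host.split(".")[0]
    return [f"templates/{subdomain}"] + list(default_dirs)
```

With a host of <code>team1.datasette.cloud</code> this would search <code>templates/team1</code> before falling back to the shared defaults.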
<blockquote>
<ul>
<li>New <a href="https://docs.datasette.io/en/1.0a8/plugin_hooks.html#plugin-hook-slots">family of template slot plugin hooks</a>: <code>top_homepage</code>, <code>top_database</code>, <code>top_table</code>, <code>top_row</code>, <code>top_query</code>, <code>top_canned_query</code>. Plugins can use these to provide additional HTML to be injected at the top of the corresponding pages. (<a href="https://github.com/simonw/datasette/issues/1191">#1191</a>)</li>
</ul>
</blockquote>
<p>Another long-running need (<a href="https://github.com/simonw/datasette/issues/1191">the issue</a> is from January 2021). Similar to the JavaScript plugin mechanism, this allows multiple plugins to add content to the page without one plugin overwriting the other.</p>
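A slot hook is essentially a function that returns an HTML fragment. Here's a sketch of what a <code>top_table</code> implementation might look like - in a real plugin this function would be decorated with <code>@hookimpl</code> from the <code>datasette</code> package, and the exact signature should be checked against the documentation linked above:

```python
def top_table(database, table):
    # Sketch of a slot hook body: return an HTML fragment to be injected at
    # the top of the table page. The real hook also receives datasette and
    # request arguments; this simplified signature is an assumption.
    return (
        f"<p>Showing table <strong>{table}</strong> "
        f"in database <strong>{database}</strong></p>"
    )
```

Because each plugin contributes its own fragment, several plugins can decorate the same page without clobbering each other.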
<h4 id="datasette-events">The new Datasette Events system</h4>
<blockquote>
<ul>
<li>
<p>New <a href="https://docs.datasette.io/en/1.0a8/plugin_hooks.html#plugin-event-tracking">track_event() mechanism</a> for plugins to emit and receive events when certain events occur within Datasette. (<a href="https://github.com/simonw/datasette/issues/2240">#2240</a>)</p>
<ul>
<li>Plugins can register additional event classes using <a href="https://docs.datasette.io/en/1.0a8/plugin_hooks.html#plugin-hook-register-events">register_events(datasette)</a>.</li>
<li>They can then trigger those events with the <a href="https://docs.datasette.io/en/1.0a8/internals.html#datasette-track-event">datasette.track_event(event)</a> internal method.</li>
<li>Plugins can subscribe to notifications of events using the <a href="https://docs.datasette.io/en/1.0a8/plugin_hooks.html#plugin-hook-track-event">track_event(datasette, event)</a> plugin hook.</li>
<li>Datasette core now emits
<code>login</code>, <code>logout</code>, <code>create-token</code>, <code>create-table</code>, <code>drop-table</code>, <code>insert-rows</code>, <code>upsert-rows</code>, <code>update-row</code>, <code>delete-row</code> events, <a href="https://docs.datasette.io/en/1.0a8/events.html">documented here</a>.</li>
</ul>
</li>
</ul>
</blockquote>
<p>Another hook inspired by Datasette Cloud. I want better analytics for that product to help track which features are being used, but I also wanted to do that in a privacy-forward manner. I decided to bake it into Datasette core and I intend to make it visible to the administrators of Datasette Cloud instances - so that it doubles as an audit log for what's happening in their instances.</p>
<p>I realized that this has uses beyond analytics: if a plugin wants to do something extra any time a new table is created within Datasette, it can use the <code>track_event(datasette, event)</code> plugin hook to listen for the <code>create-table</code> event and take action when it occurs.</p>
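Here's a sketch of that pattern. The <code>Event</code> class below is a stand-in for Datasette's real event classes (which carry more fields), and in a real plugin the function would be registered with <code>@hookimpl</code>:

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    # Stand-in for Datasette's real event classes.
    name: str
    properties: dict = field(default_factory=dict)


created_tables = []


def track_event(datasette, event):
    # Sketch of the plugin hook body: react only to table creation,
    # ignore everything else.
    if event.name == "create-table":
        created_tables.append(event.properties.get("table"))


track_event(None, Event("create-table", {"table": "my_new_table"}))
track_event(None, Event("login"))
```

After those two calls only the <code>create-table</code> event has been recorded.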
<blockquote>
<ul>
<li>New internal function for plugin authors: <a href="https://docs.datasette.io/en/1.0a8/internals.html#database-execute-isolated-fn">await db.execute_isolated_fn(fn)</a>, for creating a new SQLite connection, executing code and then closing that connection, all while preventing other code from writing to that particular database. This connection will not have the <a href="https://docs.datasette.io/en/1.0a8/plugin_hooks.html#plugin-hook-prepare-connection">prepare_connection()</a> plugin hook executed against it, allowing plugins to perform actions that might otherwise be blocked by existing connection configuration. (<a href="https://github.com/simonw/datasette/issues/2218">#2218</a>)</li>
</ul>
</blockquote>
<p>This came about because I was trying to figure out a way to use the <code>prepare_connection()</code> hook to add authorizers that prevent users from deleting certain tables, but found that doing this prevented <code>VACUUM</code> from working.</p>
<p>The new internal function provides a clean slate for plugins to do anything they like with a SQLite connection, while simultaneously preventing any write operations from other code from executing (even against other connections) until that isolated operation is complete.</p>
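The shape of the mechanism can be illustrated with plain <code>sqlite3</code> - this shows the idea (a brand-new connection, an exclusive write lock held for the duration, a guaranteed close afterwards), not Datasette's actual implementation:

```python
import sqlite3
import threading

# Stand-in for Datasette's per-database write lock.
_write_lock = threading.Lock()


def execute_isolated_fn(path, fn):
    """Illustration of the execute_isolated_fn() idea: open a fresh connection,
    hold the write lock while fn runs, then always close the connection."""
    with _write_lock:
        conn = sqlite3.connect(path)
        try:
            return fn(conn)
        finally:
            conn.close()


result = execute_isolated_fn(
    ":memory:", lambda conn: conn.execute("SELECT 1 + 1").fetchone()[0]
)
```

Because no other write can acquire the lock while <code>fn</code> runs, operations like <code>VACUUM</code> or authorizer changes get a clean, uncontended connection.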
<h4 id="datasette-10a8-docs">Documentation</h4>
<blockquote>
<ul>
<li>Documentation describing <a href="https://docs.datasette.io/en/1.0a8/testing_plugins.html#testing-datasette-client">how to write tests that use signed actor cookies</a> using <code>datasette.client.actor_cookie()</code>. (<a href="https://github.com/simonw/datasette/issues/1830">#1830</a>)</li>
<li>Documentation on how to <a href="https://docs.datasette.io/en/1.0a8/testing_plugins.html#testing-plugins-register-in-test">register a plugin for the duration of a test</a>. (<a href="https://github.com/simonw/datasette/issues/2234">#2234</a>)</li>
<li>The <a href="https://docs.datasette.io/en/1.0a8/configuration.html#configuration">configuration documentation</a> now shows examples of both YAML and JSON for each setting.</li>
</ul>
</blockquote>
<p>I like including links to new documentation in the release notes, to give people a chance to catch useful new documentation that they might otherwise miss.</p>
<h4 id="datasette-10a8-fixes">Minor fixes</h4>
<blockquote>
<ul>
<li>Datasette no longer attempts to run SQL queries in parallel when rendering a table page, as this was leading to some rare crashing bugs. (<a href="https://github.com/simonw/datasette/issues/2189">#2189</a>)</li>
<li>Fixed warning: <code>DeprecationWarning: pkg_resources is deprecated as an API</code> (<a href="https://github.com/simonw/datasette/issues/2057">#2057</a>)</li>
<li>Fixed bug where <code>?_extra=columns</code> parameter returned an incorrectly shaped response. (<a href="https://github.com/simonw/datasette/issues/2230">#2230</a>)</li>
</ul>
</blockquote>
<p>Surprisingly few bug fixes in this alpha - most of the work in the last few months has been new features. I think this is a good sign in terms of working towards a stable 1.0.</p>
<h3>Datasette 1.0a4 and 1.0a5, plus weeknotes</h3>
<p>2023-08-30 - <a href="https://simonwillison.net/2023/Aug/30/datasette-plus-weeknotes/">simonwillison.net/2023/Aug/30/datasette-plus-weeknotes/</a></p>
<p>Two new alpha releases of Datasette, plus a keynote at WordCamp, a new LLM release, two new LLM plugins and a flurry of TILs.</p>
<h4>Datasette 1.0a5</h4>
<p>Released this morning, <a href="https://docs.datasette.io/en/1.0a5/changelog.html">Datasette 1.0a5</a> has some exciting new changes driven by Datasette Cloud and the ongoing march towards Datasette 1.0.</p>
<p><a href="https://alexgarcia.xyz/">Alex Garcia</a> is working with me on Datasette Cloud and Datasette generally, generously sponsored by <a href="https://fly.io/">Fly</a>.</p>
<p>Two of the changes in 1.0a5 were driven by Alex:</p>
<blockquote>
<p>New <code>datasette.yaml</code> (or <code>.json</code>) configuration file, which can be specified using <code>datasette -c path-to-file</code>. The goal here is to consolidate settings, plugin configuration, permissions, canned queries, and other Datasette configuration into a single file, separate from <code>metadata.yaml</code>. The legacy <code>settings.json</code> config file used for <a href="https://docs.datasette.io/en/1.0a5/settings.html#config-dir">Configuration directory mode</a> has been removed, and <code>datasette.yaml</code> has a <code>"settings"</code> section where the same settings key/value pairs can be included. In the next alpha release, more configuration such as plugins/permissions/canned queries will be moved to the <code>datasette.yaml</code> file. See <a href="https://github.com/simonw/datasette/issues/2093">#2093</a> for more details.</p>
</blockquote>
<p>Right from the very start of the project, Datasette has supported specifying metadata about databases - sources, licenses, etc, as a <code>metadata.json</code> file that can be passed to Datasette like this:</p>
<div class="highlight highlight-source-shell"><pre>datasette data.db -m metadata.json</pre></div>
<p>Over time, the purpose and uses of that file have expanded in all kinds of different directions. It can be used <a href="https://docs.datasette.io/en/1.0a5/plugins.html#plugin-configuration">for plugin settings</a>, to set preferences for a table (default page size, <a href="https://docs.datasette.io/en/1.0a5/facets.html#facets-in-metadata">default facets</a> etc), and even to <a href="https://docs.datasette.io/en/1.0a5/authentication.html#access-permissions-in-metadata">configure access permissions</a> for who can view what.</p>
<p>The name <code>metadata.json</code> is entirely inappropriate for what the file actually does. It's a mess.</p>
<p>I've always had a desire to fix this before Datasette 1.0, but it never quite made it high enough up the priority list for me to spend time on it.</p>
<p>Alex <a href="https://github.com/simonw/datasette/issues/2093">expressed interest in fixing it</a>, and has started to put a plan into motion for cleaning it up.</p>
<p>More details <a href="https://github.com/simonw/datasette/issues/2093">in the issue</a>.</p>
<blockquote>
<p>The Datasette <code>_internal</code> database has had some changes. It no longer shows up in the <code>datasette.databases</code> list by default, and is now instead available to plugins using the <code>datasette.get_internal_database()</code> method. Plugins are invited to use this as a private database to store configuration and settings and secrets that should not be made visible through the default Datasette interface. Users can pass the new <code>--internal internal.db</code> option to persist that internal database to disk. (<a href="https://github.com/simonw/datasette/issues/2157">#2157</a>).</p>
</blockquote>
<p>This was the other initiative driven by Alex. In working on Datasette Cloud we realized that it's actually quite common for plugins to need somewhere to store data that shouldn't necessarily be visible to regular users of a Datasette instance - things like tokens created by <a href="https://datasette.io/plugins/datasette-auth-tokens">datasette-auth-tokens</a>, or the progress bar mechanism used by <a href="https://datasette.io/plugins/datasette-upload-csvs">datasette-upload-csvs</a>.</p>
<p>Alex pointed out that the existing <code>_internal</code> database for Datasette could be expanded to cover these use-cases as well. <a href="https://github.com/simonw/datasette/issues/2157">#2157</a> has more details on how we agreed this should work.</p>
<p>The other changes in 1.0a5 were driven by me:</p>
<blockquote>
<p>When restrictions are applied to <a href="https://docs.datasette.io/en/1.0a5/authentication.html#createtokenview">API tokens</a>, those restrictions now behave slightly differently: applying the <code>view-table</code> restriction will imply the ability to <code>view-database</code> for the database containing that table, and both <code>view-table</code> and <code>view-database</code> will imply <code>view-instance</code>. Previously you needed to create a token with restrictions that explicitly listed <code>view-instance</code> and <code>view-database</code> and <code>view-table</code> in order to view a table without getting a permission denied error. (<a href="https://github.com/simonw/datasette/issues/2102">#2102</a>)</p>
</blockquote>
<p>I <a href="https://simonwillison.net/2022/Dec/15/datasette-1a2/#finely-grained-permissions">described finely-grained permissions</a> for access tokens in my annotated release notes for 1.0a2.</p>
<p>They provide a mechanism for creating an API token that's only allowed to perform a subset of actions on behalf of the user.</p>
<p>In trying these out for Datasette Cloud I came across a nasty usability flaw. You could create a token that was restricted to <code>view-table</code> access for a specific table... and it wouldn't work. Because the access code for that view would check for <code>view-instance</code> and <code>view-database</code> permission first.</p>
<p>1.0a5 fixes that, by adding logic that says that if a token can <code>view-table</code> that implies it can <code>view-database</code> for the database containing that table, and <code>view-instance</code> for the overall instance.</p>
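The implication logic itself is simple. Here's a hypothetical sketch - the mapping below mirrors the behavior described in the release notes, but it is not Datasette's actual code:

```python
# A restriction on a child resource implies the "view" permissions
# needed to reach it (assumed mapping, for illustration only).
IMPLIES = {
    "view-table": {"view-database", "view-instance"},
    "view-database": {"view-instance"},
}


def expand_restrictions(actions):
    """Return the action set plus everything each action implies."""
    expanded = set(actions)
    for action in actions:
        expanded |= IMPLIES.get(action, set())
    return expanded
```

So a token restricted to <code>view-table</code> also passes the <code>view-database</code> and <code>view-instance</code> checks that run before the table view is reached.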
<p>This change took quite some time to develop, because any time I write code involving permissions I like to also include extremely comprehensive automated tests.</p>
<blockquote>
<p>The <code>-s/--setting</code> option can now take dotted paths to nested settings. These will then be used to set or over-ride the same options as are present in the new configuration file. (<a href="https://github.com/simonw/datasette/issues/2156">#2156</a>)</p>
</blockquote>
<p>This is a fun little detail inspired by Alex's configuration work.</p>
<p>I run a lot of different Datasette instances, often on an ad-hoc basis.</p>
<p>I sometimes find it frustrating that to use certain features I need to create a <code>metadata.json</code> (soon to be <code>datasette.yaml</code>) configuration file, just to get something to work.</p>
<p>Wouldn't it be neat if every possible setting for Datasette could be provided both in a configuration file or as command-line options?</p>
<p>That's what the new <code>--setting</code> option aims to solve. Anything that can be represented as a JSON or YAML configuration can now also be represented as key/value pairs on the command-line.</p>
<p>Here's an example <a href="https://github.com/simonw/datasette/issues/2143#issuecomment-1690792514">from my initial issue comment</a>:</p>
<div class="highlight highlight-source-shell"><pre>datasette \
  -s settings.sql_time_limit_ms 1000 \
  -s plugins.datasette-auth-tokens.manage_tokens <span class="pl-c1">true</span> \
  -s plugins.datasette-auth-tokens.manage_tokens_database tokens \
  -s plugins.datasette-ripgrep.path <span class="pl-s"><span class="pl-pds">"</span>/home/simon/code-to-search<span class="pl-pds">"</span></span> \
  -s databases.mydatabase.tables.example_table.sort created \
  mydatabase.db tokens.db</pre></div>
<p>Once this feature is complete, the above will behave the same as a <code>datasette.yaml</code> file containing this:</p>
<div class="highlight highlight-source-yaml"><pre><span class="pl-ent">plugins</span>:
  <span class="pl-ent">datasette-auth-tokens</span>:
    <span class="pl-ent">manage_tokens</span>: <span class="pl-c1">true</span>
    <span class="pl-ent">manage_tokens_database</span>: <span class="pl-s">tokens</span>
  <span class="pl-ent">datasette-ripgrep</span>:
    <span class="pl-ent">path</span>: <span class="pl-s">/home/simon/code-to-search</span>
<span class="pl-ent">databases</span>:
  <span class="pl-ent">mydatabase</span>:
    <span class="pl-ent">tables</span>:
      <span class="pl-ent">example_table</span>:
        <span class="pl-ent">sort</span>: <span class="pl-s">created</span>
<span class="pl-ent">settings</span>:
  <span class="pl-ent">sql_time_limit_ms</span>: <span class="pl-c1">1000</span></pre></div>
<p>I've experimented with ways of turning key/value pairs into nested JSON objects before, with my <a href="https://github.com/simonw/json-flatten">json-flatten</a> library.</p>
<p>This time I took a slightly different approach. In particular, if you need to pass a nested JSON object (such as an array) which isn't easily represented using <code>key.nested</code> notation, you can pass it like this instead:</p>
<div class="highlight highlight-source-shell"><pre>datasette data.db \
  -s plugins.datasette-complex-plugin.configs \
  <span class="pl-s"><span class="pl-pds">'</span>{"foo": [1,2,3], "bar": "baz"}<span class="pl-pds">'</span></span></pre></div>
<p>Which would convert to the following equivalent YAML:</p>
<div class="highlight highlight-source-yaml"><pre><span class="pl-ent">plugins</span>:
  <span class="pl-ent">datasette-complex-plugin</span>:
    <span class="pl-ent">configs</span>:
      <span class="pl-ent">foo</span>:
      - <span class="pl-c1">1</span>
      - <span class="pl-c1">2</span>
      - <span class="pl-c1">3</span>
      <span class="pl-ent">bar</span>: <span class="pl-s">baz</span></pre></div>
<p>These examples don't quite work yet, because plugin configuration hasn't yet migrated to <code>datasette.yaml</code> - but it should work in the next alpha.</p>
<blockquote>
<p>New <code>--actor '{"id": "json-goes-here"}'</code> option for use with <code>datasette --get</code> to treat the simulated request as being made by a specific actor, see <a href="https://docs.datasette.io/en/1.0a5/cli-reference.html#cli-datasette-get">datasette --get</a>. (<a href="https://github.com/simonw/datasette/issues/2153">#2153</a>)</p>
</blockquote>
<p>This is a fun little debug helper I built while working on restricted tokens.</p>
<p>The <code>datasette --get /...</code> option is a neat trick that can be used to simulate an HTTP request through the Datasette instance, without even starting a server running on a port.</p>
<p>I use it for things like <a href="https://til.simonwillison.net/shot-scraper/social-media-cards">generating social media card images</a> for my TILs website.</p>
<p>The new <code>--actor</code> option lets you add a simulated <a href="https://docs.datasette.io/en/latest/authentication.html#actors">actor</a> to the request, which is useful for testing out things like configured authentication and permissions.</p>
<h4>A security fix in Datasette 1.0a4</h4>
<p><a href="https://docs.datasette.io/en/latest/changelog.html#a4-2023-08-21">Datasette 1.0a4</a> has a security fix: I realized that the API explorer I added in the 1.0 alpha series was exposing the names of databases and tables (though not their actual content) to unauthenticated users, even for Datasette instances that were protected by authentication.</p>
<p>I issued a GitHub security advisory for this: <a href="https://github.com/simonw/datasette/security/advisories/GHSA-7ch3-7pp7-7cpq">Datasette 1.0 alpha series leaks names of databases and tables to unauthenticated users</a>, which has since been issued a CVE, <a href="https://nvd.nist.gov/vuln/detail/CVE-2023-40570">CVE-2023-40570</a> - GitHub is <a href="https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/about-repository-security-advisories#cve-identification-numbers">a CVE Numbering Authority</a> which means their security team are trusted to review such advisories and issue CVEs where necessary.</p>
<p>I expect the impact of this vulnerability to be very small: outside of <a href="https://www.datasette.cloud/">Datasette Cloud</a> very few people are running the Datasette 1.0 alphas on the public internet, and it's possible that the set of those users who are also authenticating their instances to provide authenticated access to private data - especially where just the database and table names of that data is considered sensitive - is an empty set.</p>
<p>Datasette Cloud itself has detailed access logs primarily to help evaluate this kind of threat. I'm pleased to report that those logs showed no instances of an unauthenticated user accessing the pages in question prior to the bug being fixed.</p>
<h4>A keynote at WordCamp US</h4>
<p>Last Friday I gave a keynote at <a href="https://us.wordcamp.org/2023/">WordCamp US</a> on the subject of Large Language Models.</p>
<p>I used <a href="https://goodsnooze.gumroad.com/l/macwhisper">MacWhisper</a> and my <a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/">annotated presentation tool</a> to turn that into a detailed transcript, complete with additional links and context: <a href="https://simonwillison.net/2023/Aug/27/wordcamp-llms/">Making Large Language Models work for you</a>.</p>
<h4>llm-openrouter and llm-anyscale-endpoints</h4>
<p>I released two new plugins for <a href="https://llm.datasette.io/">LLM</a>, which lets you run large language models either locally or via APIs, as both a CLI tool and a Python library.</p>
<p>Both plugins provide access to API-hosted models:</p>
<ul>
<li>
<strong><a href="https://github.com/simonw/llm-openrouter">llm-openrouter</a></strong> provides access to <a href="https://openrouter.ai/docs#models">models</a> hosted by <a href="https://openrouter.ai/">OpenRouter</a>. Of particular interest here is Claude - I'm still on the waiting list for the official Claude API, but in the meantime I can pay for access to it via OpenRouter and it works just fine. Claude has a 100,000 token context, making it a really great option for working with larger documents.</li>
<li>
<strong><a href="https://github.com/simonw/llm-anyscale-endpoints">llm-anyscale-endpoints</a></strong> is a similar plugin that instead works with <a href="https://app.endpoints.anyscale.com/">Anyscale Endpoints</a>. Anyscale provide Llama 2 and Code Llama at extremely low prices - between $0.25 and $1 per million tokens, depending on the model.</li>
</ul>
<p>These plugins were very quick to develop.</p>
<p>Both OpenRouter and Anyscale Endpoints provide API endpoints that emulate the official OpenAI APIs, including the way they handle streaming tokens.</p>
<p>LLM already has code for talking to those endpoints via the <a href="https://github.com/openai/openai-python">openai</a> Python library, which can be re-pointed to another backend using the officially supported <code>api_base</code> parameter.</p>
<p>So the core code for the plugins ended up being less than 30 lines each: <a href="https://github.com/simonw/llm-openrouter/blob/main/llm_openrouter.py">llm_openrouter.py</a> and <a href="https://github.com/simonw/llm-anyscale-endpoints/blob/main/llm_anyscale_endpoints.py">llm_anyscale_endpoints.py</a>.</p>
<h4>llm 0.8</h4>
<p>I shipped <a href="https://llm.datasette.io/en/stable/changelog.html#v0-8">LLM 0.8</a> a week and a half ago, with a bunch of small changes.</p>
<p>The most significant of these was a change to the default <code>llm logs</code> output, which shows the logs (recorded in SQLite) of the previous prompts and responses you have sent through the tool.</p>
<p>This output used to be JSON. It's <a href="https://github.com/simonw/llm/issues/160#issuecomment-1682991314">now Markdown</a>, which is both easier to read and can be pasted into GitHub Issue comments or Gists or similar to share the results with other people.</p>
<p><a href="https://llm.datasette.io/en/stable/changelog.html#v0-8">The release notes for 0.8</a> describe all of the other improvements.</p>
<h4>sqlite-utils 3.35</h4>
<p>The <a href="https://github.com/simonw/sqlite-utils/releases/tag/3.35">3.35 release of sqlite-utils</a> was driven by LLM.</p>
<p><code>sqlite-utils</code> has a mechanism for adding foreign keys to an existing table - something that's not supported by SQLite out of the box.</p>
<p>That implementation used a deeply gnarly hack: it would switch the <code>sqlite_master</code> table over to being writable (using <code>PRAGMA writable_schema = 1</code>), update that schema in place to reflect the new foreign keys and then toggle <code>writable_schema = 0</code> back again.</p>
<p>It turns out there are Python installations out there - most notably the system Python on macOS - which completely disable the ability to write to that table, no matter what the status of the various pragmas.</p>
<p>I was getting bug reports from LLM users who were running into this. I realized that I had a solution for this mostly implemented already: the <a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#transforming-a-table">sqlite-utils transform() method</a>, which can apply all sorts of complex schema changes by creating a brand new table, copying across the old data and then renaming it to replace the old one.</p>
<p>So I dropped the old <code>writable_schema</code> mechanism entirely in favour of <code>.transform()</code> - it's slower, because it requires copying the entire table, but it doesn't have weird edge-cases where it doesn't work.</p>
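<p>The create-copy-rename approach can be sketched in plain <code>sqlite3</code>. This is a heavily simplified illustration of the idea - the real <code>transform()</code> also preserves column types, indexes, triggers and more:</p>

```python
import sqlite3

# Simplified sketch of the transform() approach: instead of editing
# sqlite_master in place, create a new table whose schema includes the
# foreign key, copy every row across, then swap the names over.
def add_foreign_key(db, table, column, other_table, other_column):
    cols = [row[1] for row in db.execute(f"PRAGMA table_info([{table}])")]
    defs = []
    for col in cols:
        if col == column:
            defs.append(f"{col} INTEGER REFERENCES [{other_table}]([{other_column}])")
        else:
            defs.append(col)
    col_list = ", ".join(cols)
    db.executescript(f"""
        CREATE TABLE [{table}_new] ({", ".join(defs)});
        INSERT INTO [{table}_new] ({col_list}) SELECT {col_list} FROM [{table}];
        DROP TABLE [{table}];
        ALTER TABLE [{table}_new] RENAME TO [{table}];
    """)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, country_id INTEGER)")
add_foreign_key(db, "places", "country_id", "country", "id")
```

<p>Copying the whole table is the cost; never touching <code>sqlite_master</code> directly is the benefit, which is why it works on locked-down Python builds.</p>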
<p>Since <a href="https://simonwillison.net/2023/Jul/24/sqlite-utils-plugins/">sqlite-utils supports plugins now</a>, I realized I could set a healthy precedent by making the removed feature available in a new plugin: <a href="https://github.com/simonw/sqlite-utils-fast-fks">sqlite-utils-fast-fks</a>, which provides the following command for adding foreign keys the fast, old way (provided your installation supports it):</p>
<div class="highlight highlight-source-shell"><pre>sqlite-utils install sqlite-utils-fast-fks
sqlite-utils fast-fks my_database.db places country_id country id</pre></div>
<p>I've always admired how <a href="https://jquery.com/">jQuery</a> uses plugins to keep old features working on an opt-in basis after major version upgrades. I'm excited to be able to apply the same pattern for <code>sqlite-utils</code>.</p>
<h4>paginate-json 1.0</h4>
<p><a href="https://github.com/simonw/paginate-json">paginate-json</a> is a tiny tool I first released a few years ago to solve a very specific problem.</p>
<p>There's a neat pattern in some JSON APIs where the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link">HTTP link header</a> is used to indicate subsequent pages of results.</p>
<p>The best example I know of this is the GitHub API. Run this to see what it looks like (here I'm using the <a href="https://docs.github.com/en/rest/activity/events?apiVersion=2022-11-28#list-public-events-for-a-user">events API</a>):</p>
<div class="highlight highlight-source-shell"><pre>curl -i \
https://api.github.com/users/simonw/events</pre></div>
<p>Here's a truncated example of the output:</p>
<pre><code>HTTP/2 200
server: GitHub.com
content-type: application/json; charset=utf-8
link: <https://api.github.com/user/9599/events?page=2>; rel="next", <https://api.github.com/user/9599/events?page=9>; rel="last"
[
{
"id": "31467177730",
"type": "PushEvent",
</code></pre>
<p>The <code>link</code> header there specifies a <code>next</code> and <code>last</code> URL that can be used for pagination.</p>
<p>To fetch all available items, you can follow the <code>next</code> link repeatedly until it runs out.</p>
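<p>That follow-the-<code>next</code>-link loop is simple enough to sketch in a few lines of Python (this is an illustration of the pattern, not <code>paginate-json</code>'s actual code; the fetch function is injected so the sketch stays self-contained):</p>

```python
import re

# Pull the rel="next" URL out of a link header like:
#   <https://api.github.com/user/9599/events?page=2>; rel="next", <...>; rel="last"
def parse_next(link_header):
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', link_header or ""):
        if rel == "next":
            return url
    return None

def paginate(url, fetch):
    # fetch(url) -> (list_of_items, link_header_or_None)
    items = []
    while url:
        page, link = fetch(url)
        items.extend(page)
        url = parse_next(link)  # stops when there is no next link
    return items

# Demo with canned pages standing in for HTTP responses:
pages = {
    "page1": (["a", "b"], '<page2>; rel="next", <page9>; rel="last"'),
    "page2": (["c"], None),
}
result = paginate("page1", lambda url: pages[url])
```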
<p>My <code>paginate-json</code> tool can follow these links for you. If you run it like this:</p>
<div class="highlight highlight-source-shell"><pre>paginate-json \
https://api.github.com/users/simonw/events</pre></div>
<p>It will output a single JSON array consisting of the results from every available page.</p>
<p>The 1.0 release adds <a href="https://github.com/simonw/paginate-json/releases/tag/1.0">a bunch of small features</a>, but also marks my confidence in the stability of the design of the tool.</p>
<p>The <a href="https://docs.datasette.io/en/latest/json_api.html">Datasette JSON API</a> has supported <a href="https://docs.datasette.io/en/latest/json_api.html#pagination">link pagination</a> for a while - you can use <code>paginate-json</code> with Datasette like this, taking advantage of the new <code>--key</code> option to paginate over the array of objects returned in the <code>"rows"</code> key:</p>
<div class="highlight highlight-source-shell"><pre>paginate-json \
<span class="pl-s"><span class="pl-pds">'</span>https://datasette.io/content/pypi_releases.json?_labels=on<span class="pl-pds">'</span></span> \
--key rows \
--nl</pre></div>
<p>The <code>--nl</code> option here causes <code>paginate-json</code> to output the results as newline-delimited JSON, instead of bundling them together into a JSON array.</p>
<p>Here's how to use <a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-newline-delimited-json">sqlite-utils insert</a> to insert that data directly into a fresh SQLite database:</p>
<div class="highlight highlight-source-shell"><pre>paginate-json \
<span class="pl-s"><span class="pl-pds">'</span>https://datasette.io/content/pypi_releases.json?_labels=on<span class="pl-pds">'</span></span> \
--key rows \
--nl <span class="pl-k">|</span> \
sqlite-utils insert data.db releases - \
--nl --flatten</pre></div>
<h4>Releases this week</h4>
<ul>
<li>
<strong><a href="https://github.com/simonw/paginate-json/releases/tag/1.0">paginate-json 1.0</a></strong> - 2023-08-30<br />Command-line tool for fetching JSON from paginated APIs</li>
<li>
<strong><a href="https://github.com/simonw/datasette-auth-tokens/releases/tag/0.4a2">datasette-auth-tokens 0.4a2</a></strong> - 2023-08-29<br />Datasette plugin for authenticating access using API tokens</li>
<li>
<strong><a href="https://github.com/simonw/datasette/releases/tag/1.0a5">datasette 1.0a5</a></strong> - 2023-08-29<br />An open source multi-tool for exploring and publishing data</li>
<li>
<strong><a href="https://github.com/simonw/llm-anyscale-endpoints/releases/tag/0.2">llm-anyscale-endpoints 0.2</a></strong> - 2023-08-25<br />LLM plugin for models hosted by Anyscale Endpoints</li>
<li>
<strong><a href="https://github.com/simonw/datasette-jellyfish/releases/tag/2.0">datasette-jellyfish 2.0</a></strong> - 2023-08-24<br />Datasette plugin adding SQL functions for fuzzy text matching powered by Jellyfish</li>
<li>
<strong><a href="https://github.com/simonw/datasette-configure-fts/releases/tag/1.1.2">datasette-configure-fts 1.1.2</a></strong> - 2023-08-23<br />Datasette plugin for enabling full-text search against selected table columns</li>
<li>
<strong><a href="https://github.com/simonw/datasette-ripgrep/releases/tag/0.8.1">datasette-ripgrep 0.8.1</a></strong> - 2023-08-21<br />Web interface for searching your code using ripgrep, built as a Datasette plugin</li>
<li>
<strong><a href="https://github.com/simonw/datasette-publish-fly/releases/tag/1.3.1">datasette-publish-fly 1.3.1</a></strong> - 2023-08-21<br />Datasette plugin for publishing data using Fly</li>
<li>
<strong><a href="https://github.com/simonw/llm-openrouter/releases/tag/0.1">llm-openrouter 0.1</a></strong> - 2023-08-21<br />LLM plugin for models hosted by OpenRouter</li>
<li>
<strong><a href="https://github.com/simonw/llm/releases/tag/0.8">llm 0.8</a></strong> - 2023-08-21<br />Access large language models from the command-line</li>
<li>
<strong><a href="https://github.com/simonw/sqlite-utils-fast-fks/releases/tag/0.1">sqlite-utils-fast-fks 0.1</a></strong> - 2023-08-18<br />Fast foreign key addition for sqlite-utils</li>
<li>
<strong><a href="https://github.com/simonw/datasette-edit-schema/releases/tag/0.5.3">datasette-edit-schema 0.5.3</a></strong> - 2023-08-18<br />Datasette plugin for modifying table schemas</li>
<li>
<strong><a href="https://github.com/simonw/sqlite-utils/releases/tag/3.35">sqlite-utils 3.35</a></strong> - 2023-08-18<br />Python CLI utility and library for manipulating SQLite databases</li>
</ul>
<h4>TIL this week</h4>
<ul>
<li>
<a href="https://til.simonwillison.net/json/streaming-indented-json-array">Streaming output of an indented JSON array</a> - 2023-08-30</li>
<li>
<a href="https://til.simonwillison.net/macos/downloading-partial-youtube-videos">Downloading partial YouTube videos with ffmpeg</a> - 2023-08-26</li>
<li>
<a href="https://til.simonwillison.net/sqlite/sqlite-version-macos-python">Compile and run a new SQLite version with the existing sqlite3 Python library on macOS</a> - 2023-08-22</li>
<li>
<a href="https://til.simonwillison.net/fly/django-sql-dashboard">Configuring Django SQL Dashboard for Fly PostgreSQL</a> - 2023-08-22</li>
<li>
<a href="https://til.simonwillison.net/sqlite/database-file-size">Calculating the size of a SQLite database file using SQL</a> - 2023-08-21</li>
<li>
<a href="https://til.simonwillison.net/readthedocs/stable-docs">Updating stable docs in ReadTheDocs without pushing a release</a> - 2023-08-21</li>
<li>
<a href="https://til.simonwillison.net/bash/go-script">A shell script for running Go one-liners</a> - 2023-08-20</li>
<li>
<a href="https://til.simonwillison.net/sqlite/python-sqlite-environment">A one-liner to output details of the current Python's SQLite</a> - 2023-08-19</li>
<li>
<a href="https://til.simonwillison.net/python/inlining-binary-data">A simple pattern for inlining binary content in a Python script</a> - 2023-08-19</li>
<li>
<a href="https://til.simonwillison.net/bash/multiple-servers">Running multiple servers in a single Bash script</a> - 2023-08-17</li>
</ul>
Datasette 0.64, with a warning about SpatiaLite2023-01-09T21:22:51+00:002023-01-09T21:22:51+00:00https://simonwillison.net/2023/Jan/9/datasette-064/#atom-series
<p>I released Datasette 0.64 this morning. This release is mainly a response to the realization that it's not safe to run Datasette with the SpatiaLite extension loaded if that Datasette instance is configured to enable arbitrary SQL queries from untrusted users.</p>
<p>Here are <a href="https://docs.datasette.io/en/stable/changelog.html#v0-64">the release notes</a> quoted in full:</p>
<blockquote>
<ul>
<li>Datasette now <strong>strongly recommends against allowing arbitrary SQL queries if you are using SpatiaLite</strong>. SpatiaLite includes SQL functions that could cause the Datasette server to crash. See <a href="https://docs.datasette.io/en/stable/spatialite.html#spatialite">SpatiaLite</a> for more details.</li>
<li>New <a href="https://docs.datasette.io/en/stable/settings.html#setting-default-allow-sql">default_allow_sql</a> setting, providing an easier way to disable all arbitrary SQL execution by end users: <code>datasette --setting default_allow_sql off</code>. See also <a href="https://docs.datasette.io/en/stable/authentication.html#authentication-permissions-execute-sql">Controlling the ability to execute arbitrary SQL</a>. (<a href="https://github.com/simonw/datasette/issues/1409">#1409</a>)</li>
<li>
<a href="https://datasette.io/tutorials/spatialite">Building a location to time zone API with SpatiaLite</a> is a new Datasette tutorial showing how to safely use SpatiaLite to create a location to time zone API.</li>
<li>New documentation about <a href="https://docs.datasette.io/en/stable/installation.html#installation-extensions">how to debug problems loading SQLite extensions</a>. The error message shown when an extension cannot be loaded has also been improved. (<a href="https://github.com/simonw/datasette/issues/1979">#1979</a>)</li>
<li>Fixed an accessibility issue: the <code><select></code> elements in the table filter form now show an outline when they are currently focused. (<a href="https://github.com/simonw/datasette/issues/1771">#1771</a>)</li>
</ul>
</blockquote>
<h4>The problem with SpatiaLite</h4>
<p>Datasette allows arbitrary SQL execution <a href="https://docs.datasette.io/en/stable/sql_queries.html">as a core feature</a>. It takes a bunch of steps to provide this safely: database connections are opened in read-only mode, it imposes a strict time limit on SQL queries and Datasette is designed to be run in containers for a further layer of protection.</p>
<p>SQLite itself is an excellent platform for this feature: it has a set of default functionality that supports this well, protected by a <a href="https://www.sqlite.org/testing.html">legendarily thorough test suite</a>.</p>
<p><a href="https://www.gaia-gis.it/fossil/libspatialite/index">SpatiaLite</a> is a long-running third-party extension for SQLite that adds a <a href="https://www.gaia-gis.it/gaia-sins/spatialite-sql-5.0.1.html">bewildering array</a> of additional functionality to SQLite - much of it around GIS, but with a whole host of extras as well. It includes debugging routines, XML parsers and even its own implementation of stored procedures!</p>
<p>Unfortunately, not all of this functionality is safe to expose to untrusted queries - even for databases that have been opened in read-only mode.</p>
<p>After identifying functions which could crash the Datasette instance, I decided that Datasette should make a strong recommendation not to expose SpatiaLite in an unprotected manner.</p>
<p>In addition to the new documentation, I also added a feature I've been planning for a while: a <a href="https://docs.datasette.io/en/stable/settings.html#setting-default-allow-sql">simple setting</a> for disabling arbitrary SQL queries entirely:</p>
<pre><code>datasette --setting default_allow_sql off
</code></pre>
<p>Prior to 0.64 you could achieve the same thing by adding the following line to your <a href="https://docs.datasette.io/en/stable/metadata.html">metadata.json</a> file:</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"allow_sql"</span>: <span class="pl-c1">false</span>
}</pre></div>
<p>Or in <code>metadata.yml</code>:</p>
<div class="highlight highlight-source-yaml"><pre><span class="pl-ent">allow_sql</span>: <span class="pl-c1">false</span></pre></div>
<p>The new setting achieves the same thing, but is more obvious and can be easily applied even for Datasette instances that don't use metadata.</p>
<h4>A new SpatiaLite tutorial</h4>
<p>The documentation now recommends running SpatiaLite instances with pre-approved SQL implemented using Datasette's canned queries feature.</p>
<p>To help clarify how this works, I decided to publish a new entry in the official series of <a href="https://datasette.io/tutorials">Datasette tutorials</a>:</p>
<p><strong><a href="https://datasette.io/tutorials/spatialite">Building a location to time zone API with SpatiaLite</a></strong></p>
<p>This is an updated version of <a href="https://simonwillison.net/2017/Dec/12/location-time-zone-api/">a tutorial</a> I first wrote back in 2017.</p>
<p>The new tutorial now includes material on Chris Amico's <a href="https://datasette.io/plugins/datasette-geojson-map">datasette-geojson-map</a> plugin, SpatiaLite point-in-polygon queries, polygon intersection queries, spatial indexes and how to use the <code>simplify()</code> function to reduce huge polygons down to a size that is more practical to display on a map.</p>
<p>I'm really happy with this new tutorial. Not only does it show a safe way to run SpatiaLite, but it also illustrates a powerful pattern for using Datasette to create and deploy custom APIs.</p>
<p>The resulting API can be accessed here:</p>
<p><a href="https://timezones.datasette.io/timezones">https://timezones.datasette.io/timezones</a></p>
<p>It's hosted on <a href="https://fly.io/">Fly</a>, using their $1.94/month instance size with 256MB of RAM - easily powerful enough to host this class of application.</p>
<p>I also updated the <a href="https://datasette.io/plugins/datasette-publish-fly">datasette-publish-fly</a> plugin to make it easier to deploy instances with SQL execution disabled, see the <a href="https://github.com/simonw/datasette-publish-fly/releases/tag/1.3">1.3 release notes</a>.</p>
Datasette 1.0a2: Upserts and finely grained permissions2022-12-15T17:58:19+00:002022-12-15T17:58:19+00:00https://simonwillison.net/2022/Dec/15/datasette-1a2/#atom-series
<p>I've released the third alpha of Datasette 1.0. The <a href="https://docs.datasette.io/en/latest/changelog.html#a2-2022-12-14">1.0a2 release</a> introduces upsert support to the new JSON API and makes some major improvements to the Datasette permissions system.</p>
<p>Here are the annotated release notes (<a href="https://simonwillison.net/series/datasette-release-notes/">see previous</a>).</p>
<p>You can install and try out the alpha using:</p>
<pre><code>pip install datasette==1.0a2
</code></pre>
<h4>Upserts for the JSON API</h4>
<blockquote>
<p>New <code>/db/table/-/upsert</code> API, <a href="https://docs.datasette.io/en/latest/json_api.html#tableupsertview">documented here</a>. Upsert is an update-or-insert: existing rows will have specified keys updated, but if no row matches the incoming primary key a brand new row will be inserted instead. (<a href="https://github.com/simonw/datasette/issues/1878">#1878</a>)</p>
</blockquote>
<p>I <a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/">wrote about the new JSON Write API</a> when I released the first alpha a couple of weeks ago.</p>
<p>The API can be used to create and drop tables, and to insert, update and delete rows in those tables.</p>
<p>The new <code>/db/table/-/upsert</code> API adds <a href="https://docs.datasette.io/en/latest/json_api.html#tableupsertview">upsert support</a> to Datasette.</p>
<p>An <em>upsert</em> is an update-or-insert. Consider the following:</p>
<pre><code>POST /books/authors/-/upsert
Authorization: Bearer $TOKEN
Content-Type: application/json
</code></pre>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"rows"</span>: [
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">1</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Ursula K. Le Guin<span class="pl-pds">"</span></span>,
<span class="pl-ent">"born"</span>: <span class="pl-s"><span class="pl-pds">"</span>1929-10-21<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">2</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Terry Pratchett<span class="pl-pds">"</span></span>,
<span class="pl-ent">"born"</span>: <span class="pl-s"><span class="pl-pds">"</span>1948-04-28<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">3</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Neil Gaiman<span class="pl-pds">"</span></span>,
<span class="pl-ent">"born"</span>: <span class="pl-s"><span class="pl-pds">"</span>1960-11-10<span class="pl-pds">"</span></span>
}
]
}</pre></div>
<p>This table has a primary key of <code>id</code>. The above API call will create three records if the table is empty. But if the table already has records matching any of those primary keys, their <code>name</code> and <code>born</code> columns will be updated to match the incoming data.</p>
<p>Upserts can be a really convenient way of synchronizing data with an external data source. I had a couple of enquiries about them when I published the first alpha, so I decided to make them a key feature for this release.</p>
<h4>Ignore and replace for the create table API</h4>
<blockquote>
<ul>
<li>The <code>/db/-/create</code> API for <a href="https://docs.datasette.io/en/latest/json_api.html#tablecreateview">creating a table</a> now accepts <code>"ignore": true</code> and <code>"replace": true</code> options when called with the <code>"rows"</code> property that creates a new table based on an example set of rows. This means the API can be called multiple times with different rows, setting rules for what should happen if a primary key collides with an existing row. (<a href="https://github.com/simonw/datasette/issues/1927">#1927</a>)</li>
<li>
<code>/db/-/create</code> API now requires actor to have <code>insert-row</code> permission in order to use the <code>"row"</code> or <code>"rows"</code> properties. (<a href="https://github.com/simonw/datasette/issues/1937">#1937</a>)</li>
</ul>
</blockquote>
<p>This feature is a little less obvious, but I think it's going to be really useful.</p>
<p>The <code>/db/-/create</code> API can be used to create a new table. You can feed it <a href="https://docs.datasette.io/en/latest/json_api.html#creating-a-table">an explicit list of columns</a>, but you can also give it one or more rows and have it <a href="https://docs.datasette.io/en/latest/json_api.html#creating-a-table-from-example-data">infer the correct schema</a> based on those examples.</p>
<p>Datasette inherits this feature <a href="https://sqlite-utils.datasette.io/en/stable/cli.html#inserting-json-data">from sqlite-utils</a> - I've been finding this an incredibly productive way to work with SQLite databases for a few years now.</p>
<p>The real magic of this feature is that you can pipe data into Datasette without even needing to first check that the appropriate table has been created. It's a really fast way of getting from data to a populated database and a working API.</p>
<p>Prior to 1.0a2 you could call <code>/db/-/create</code> with <code>"rows"</code> more than once and it would <em>probably</em> work... unless you attempted to insert rows with primary keys that were already in use - in which case you would get an error. This limited the utility of the feature.</p>
<p>Now you can pass <code>"ignore": true</code> or <code>"replace": true</code> to the API call, to tell Datasette what to do if it encounters a primary key that already exists in the table.</p>
<p>Here's an example using the author data from above:</p>
<pre><code>POST /books/-/create
Authorization: Bearer $TOKEN
Content-Type: application/json
</code></pre>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"table"</span>: <span class="pl-s"><span class="pl-pds">"</span>authors<span class="pl-pds">"</span></span>,
<span class="pl-ent">"pk"</span>: <span class="pl-s"><span class="pl-pds">"</span>id<span class="pl-pds">"</span></span>,
<span class="pl-ent">"replace"</span>: <span class="pl-c1">true</span>,
<span class="pl-ent">"rows"</span>: [
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">1</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Ursula K. Le Guin<span class="pl-pds">"</span></span>,
<span class="pl-ent">"born"</span>: <span class="pl-s"><span class="pl-pds">"</span>1929-10-21<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">2</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Terry Pratchett<span class="pl-pds">"</span></span>,
<span class="pl-ent">"born"</span>: <span class="pl-s"><span class="pl-pds">"</span>1948-04-28<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">3</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Neil Gaiman<span class="pl-pds">"</span></span>,
<span class="pl-ent">"born"</span>: <span class="pl-s"><span class="pl-pds">"</span>1960-11-10<span class="pl-pds">"</span></span>
}
]
}</pre></div>
<p>This will create the <code>authors</code> table if it does not exist and ensure that those three rows exist in it, in their exact state. If a row already exists it will be replaced.</p>
<p>Note that this is subtly different from an upsert. An upsert will only update the columns that were provided in the incoming data, leaving any other columns unchanged. A replace will replace the entire row.</p>
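<p>The distinction is easy to demonstrate with plain SQLite, which is what backs both APIs. In this sketch the upsert only touches the supplied <code>name</code> column, while the replace rewrites the whole row and resets <code>born</code> to <code>NULL</code>:</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, born TEXT)")
db.execute("INSERT INTO authors VALUES (1, 'Ursula K. Le Guin', '1929-10-21')")

# Upsert: only the supplied column is updated, born survives untouched
db.execute(
    "INSERT INTO authors (id, name) VALUES (1, 'U. K. Le Guin') "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name"
)
after_upsert = db.execute("SELECT name, born FROM authors WHERE id = 1").fetchone()
# -> ('U. K. Le Guin', '1929-10-21')

# Replace: the entire row is replaced, unspecified columns reset to NULL
db.execute("INSERT OR REPLACE INTO authors (id, name) VALUES (1, 'Le Guin')")
after_replace = db.execute("SELECT name, born FROM authors WHERE id = 1").fetchone()
# -> ('Le Guin', None)
```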
<h4 id="finely-grained-permissions">Finely grained permissions</h4>
<p>This is the most significant area of improvement in this release.</p>
<blockquote>
<ul>
<li>New <a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-register-permissions">register_permissions(datasette)</a> plugin hook. Plugins can now register named permissions, which will then be listed in various interfaces that show available permissions. (<a href="https://github.com/simonw/datasette/issues/1940">#1940</a>)</li>
</ul>
</blockquote>
<p>Prior to this, permissions were just strings - things like <code>"view-instance"</code> or <code>"view-table"</code> or <code>"insert-row"</code>.</p>
<p>Plugins can introduce their own permissions - many do already, like <a href="https://datasette.io/plugins/datasette-edit-schema">datasette-edit-schema</a> which adds an <code>"edit-schema"</code> permission.</p>
<p>In order to start building UIs for managing permissions, I needed Datasette to know what they were!</p>
<p>The <a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-register-permissions">register_permissions() hook</a> lets them do exactly that, and Datasette core uses it to register its own default set of permissions too.</p>
<p>Permissions are registered using the following named tuple:</p>
<pre><span class="pl-v">Permission</span> <span class="pl-c1">=</span> <span class="pl-s1">collections</span>.<span class="pl-en">namedtuple</span>(
<span class="pl-s">"Permission"</span>, (
<span class="pl-s">"name"</span>, <span class="pl-s">"abbr"</span>, <span class="pl-s">"description"</span>,
<span class="pl-s">"takes_database"</span>, <span class="pl-s">"takes_resource"</span>, <span class="pl-s">"default"</span>
)
)</pre>
<p>The <code>abbr</code> is an abbreviation - e.g. <code>insert-row</code> can be abbreviated to <code>ir</code>. This is useful for creating things like signed API tokens where space is at a premium.</p>
<p><code>takes_database</code> and <code>takes_resource</code> are booleans that indicate whether the permission can optionally be applied to a specific database (e.g. <code>execute-sql</code>) or to a "resource", which is the name I'm now using for something that could be a SQL table, a SQL view or a <a href="https://docs.datasette.io/en/stable/sql_queries.html#canned-queries">canned query</a>.</p>
<p>The <code>insert-row</code> permission for example can be granted to the whole of Datasette, or to all tables in a specific database, or to specific tables.</p>
<p>Finally, the <code>default</code> value is a boolean that indicates whether the permission should be default-allow (<code>view-instance</code> for example) or default-deny (<code>create-table</code> and suchlike).</p>
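<p>Putting those fields together, here's how the built-in <code>insert-row</code> permission might be registered under this scheme - the description string is my own illustration, the other values follow what's described above:</p>

```python
import collections

Permission = collections.namedtuple(
    "Permission", (
        "name", "abbr", "description",
        "takes_database", "takes_resource", "default"
    )
)

# Illustrative registration of insert-row; the description is invented,
# the other values follow the post's description of the fields.
insert_row = Permission(
    name="insert-row",
    abbr="ir",             # short form used in compact signed tokens
    description="Insert rows into a table",
    takes_database=True,   # can be granted for all tables in one database...
    takes_resource=True,   # ...or for one specific table
    default=False,         # write operations are default-deny
)
```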
<p>This next feature explains why I needed those permission names to be known to Datasette:</p>
<blockquote>
<ul>
<li>The <code>/-/create-token</code> page can now be used to create API tokens which are restricted to just a subset of actions, including against specific databases or resources. See <a href="https://docs.datasette.io/en/latest/authentication.html#createtokenview">API Tokens</a> for details. (<a href="https://github.com/simonw/datasette/issues/1947">#1947</a>)</li>
</ul>
</blockquote>
<p>Datasette now has finely grained permissions for API tokens!</p>
<p>This is the feature I always want when I'm working with other APIs: the ability to create a token that can only perform a restricted subset of actions.</p>
<p>When I'm working with the GitHub API for example I frequently find myself wanting to create a "personal access token" that only has the ability to read issues from a specific repository. It's infuriating how many APIs leave this ability out.</p>
<p>The <code>/-/create-token</code> interface (which you can try out on <a href="https://latest.datasette.io/">latest.datasette.io</a> by first <a href="https://latest.datasette.io/login-as-root">signing in as root</a> and then <a href="https://latest.datasette.io/-/create-token">visiting this page</a>) lets you create an API token that can act on your behalf... and then optionally specify a subset of actions that the token is allowed to perform.</p>
<p>Thanks to the new permissions registration system, the UI on that page knows which permissions can be applied to which entities within Datasette itself.</p>
<p>Here's a partial screenshot of the UI:</p>
<p><img src="https://static.simonwillison.net/static/2022/create-api-token.jpg" alt="Create an API token. This token will allow API access with the same abilities as your current user, root. Form lets you select Expires after X hours, 5 - and there's an expanded menu item for Restrict actions that can be performed using this token. Below that are lists of checkboxes - the first is for All databases and tables, with a list of every permission known to Datasette. Next is All tables in fixtures, which lists just permissions that make sense for a specific database. Finally is Specific tables, which lists fixtures: primary_key_multiple_columns with a much shorter list of permissions that can apply just to tables." style="max-width: 100%;" /></p>
<p>Select a subset of permissions, hit "Create token" and the result will be an access token you can copy and paste into another application, or use to call Datasette with <code>curl</code>.</p>
<p>Here's an example token I created that grants <code>view-instance</code> permission against all of Datasette, and <code>view-database</code> and <code>execute-sql</code> permission against the <code>ephemeral</code> database, which in that demo is hidden from anonymous Datasette users.</p>
<p><code>dstok_eyJhIjoicm9vdCIsInQiOjE2NzEwODUzMDIsIl9yIjp7ImEiOlsidmkiXSwiZCI6eyJlcGhlbWVyYWwiOlsidmQiLCJlcyJdfX19.1uw0xyx8UND_Y_vTVg5kEF6m3GU</code></p>
<p>Here's a screenshot of the screen I saw when I created it:</p>
<p><img src="https://static.simonwillison.net/static/2022/token-created.jpg" alt="Create an API token. Your API token: a copy-to-clipboard box with a long token in it. The token details area is expanded to show JSON that represents the permissions granted with that token." style="max-width: 100%;" /></p>
<p>I've expanded the "token details" section to show the JSON that is bundled inside the signed token. The <code>"_r"</code> block records the specific permissions granted by the token:</p>
<ul>
<li>
<code>"a": ["vi"]</code> indicates that the <code>view-instance</code> permission is granted against all of Datasette.</li>
<li>
<code>"d": {"ephemeral": ["vd", "es"]}</code> indicates that the <code>view-database</code> and <code>execute-sql</code> permissions are granted against the <code>ephemeral</code> database.</li>
</ul>
<p>The token also contains the ID of the user who created it (<code>"a": "root"</code>) and the time that the token was created (<code>"t": 1671085302</code>). If the token was set to expire that expiry duration would be baked in here as well.</p>
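<p>To make the structure of that <code>"_r"</code> block concrete, here's a sketch of how a restrictions check over it could work - this is illustrative logic written against the JSON shown above, not Datasette's actual implementation:</p>

```python
# The "_r" block from the example token: instance-wide abbreviations
# under "a", per-database abbreviations under "d".
restrictions = {"a": ["vi"], "d": {"ephemeral": ["vd", "es"]}}

def token_allows(r, abbr, database=None):
    """Hypothetical check: does this restrictions block grant a permission?"""
    if abbr in r.get("a", []):
        return True  # granted instance-wide
    if database and abbr in r.get("d", {}).get(database, []):
        return True  # granted for this specific database
    return False

token_allows(restrictions, "vi")                        # view-instance: allowed
token_allows(restrictions, "es", database="ephemeral")  # execute-sql here: allowed
token_allows(restrictions, "es", database="fixtures")   # other database: denied
```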
<p>You can see the effect this has on the command-line using <code>curl</code> like so:</p>
<p><code>curl 'https://latest.datasette.io/ephemeral.json?sql=select+3+*+5&_shape=array'</code></p>
<p>This will return a forbidden error. But if you add the signed token:</p>
<p><code>curl 'https://latest.datasette.io/ephemeral.json?sql=select+3+*+5&_shape=array' -H 'Authorization: Bearer dstok_eyJhIjoicm9vdCIsInQiOjE2NzEwODUzMDIsIl9yIjp7ImEiOlsidmkiXSwiZCI6eyJlcGhlbWVyYWwiOlsidmQiLCJlcyJdfX19.1uw0xyx8UND_Y_vTVg5kEF6m3GU'</code></p>
<p>You'll get back a JSON response:</p>
<div class="highlight highlight-source-json"><pre>[{<span class="pl-ent">"3 * 5"</span>: <span class="pl-c1">15</span>}]</pre></div>
<h4>The datasette create-token CLI tool</h4>
<blockquote>
<ul>
<li>Likewise, the <code>datasette create-token</code> CLI command can now create tokens with <a href="https://docs.datasette.io/en/latest/authentication.html#authentication-cli-create-token-restrict">a subset of permissions</a>. (<a href="https://github.com/simonw/datasette/issues/1855">#1855</a>)</li>
<li>New <a href="https://docs.datasette.io/en/latest/internals.html#create-token-actor-id-expires-after-none-restrict-all-none-restrict-database-none-restrict-resource-none">datasette.create_token()</a> API method for programmatically creating signed API tokens. (<a href="https://github.com/simonw/datasette/issues/1951">#1951</a>)</li>
</ul>
</blockquote>
<p>The other way you can create Datasette tokens is on the command-line, using <a href="https://docs.datasette.io/en/latest/authentication.html#datasette-create-token">the datasette create-token command</a>.</p>
<p>That command has been upgraded to support finely-grained permissions too.</p>
<p>Here's how you'd create a token for the same set of permissions as my ephemeral example above:</p>
<pre><code>datasette create-token root \
--all view-instance \
--database ephemeral view-database \
--database ephemeral execute-sql \
--secret MY_DATASETTE_SECRET
</code></pre>
<p>In order to sign the token you need to pass in the <code>--secret</code> used by the server - although it will pick that up from the <code>DATASETTE_SECRET</code> environment variable if it's available.</p>
<p>This has the interesting side-effect that you can use that command to create valid tokens for other Datasette instances, provided you know the secret they're using. I think this ability will be really useful for people like myself who run lots of different Datasette instances on stateless hosting platforms such as Vercel and Google Cloud Run.</p>
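<p>Datasette signs tokens with <a href="https://itsdangerous.palletsprojects.com/">itsdangerous</a> using the server's secret - the exact wire format is an implementation detail, but the core idea is that anyone holding the same secret can derive the same signature, and nobody else can. A simplified illustration of that idea (not Datasette's actual scheme):</p>

```python
import base64
import hashlib
import hmac
import json


def sign(payload: dict, secret: str) -> str:
    # Encode the payload as URL-safe base64 JSON, then append an HMAC signature
    body = (
        base64.urlsafe_b64encode(json.dumps(payload, separators=(",", ":")).encode())
        .rstrip(b"=")
        .decode()
    )
    sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{body}.{sig}"


def verify(token: str, secret: str) -> bool:
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()[:16]
    return hmac.compare_digest(sig, expected)


token = sign({"a": "root"}, "MY_DATASETTE_SECRET")
assert verify(token, "MY_DATASETTE_SECRET")
assert not verify(token, "a-different-secret")
```

Because verification only needs the shared secret, any process that knows it can mint valid tokens - which is exactly what <code>datasette create-token --secret</code> does.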
<h4>Configuring permissions in metadata.json/yaml</h4>
<blockquote>
<ul>
<li>Arbitrary permissions can now be configured at the instance, database and resource (table, SQL view or canned query) level in Datasette's <a href="https://docs.datasette.io/en/latest/metadata.html#metadata">Metadata</a> JSON and YAML files. The new <code>"permissions"</code> key can be used to specify which actors should have which permissions. See <a href="https://docs.datasette.io/en/latest/authentication.html#authentication-permissions-other">Other permissions in metadata</a> for details. (<a href="https://github.com/simonw/datasette/issues/1636">#1636</a>)</li>
</ul>
</blockquote>
<p>Datasette has long had the ability to set permissions for viewing databases and tables using blocks of configuration in the increasingly poorly named <a href="https://docs.datasette.io/en/stable/metadata.html">metadata.json/yaml files</a>.</p>
<p>As I've built new plugins that introduce new permissions, I've found myself wishing for an easier way to say "user X is allowed to perform action Y" for arbitrary other permissions.</p>
<p>The new <code>"permissions"</code> key in metadata.json/yaml files allows you to do that.</p>
<p>Here's how to specify that the user with <code>"id": "simon"</code> is allowed to use the API to create tables and insert data into the <code>docs</code> database:</p>
<div class="highlight highlight-source-yaml"><pre><span class="pl-ent">databases</span>:
<span class="pl-ent">docs</span>:
<span class="pl-ent">permissions</span>:
<span class="pl-ent">create-table</span>:
<span class="pl-ent">id</span>: <span class="pl-s">simon</span>
<span class="pl-ent">insert-row</span>:
<span class="pl-ent">id</span>: <span class="pl-s">simon</span></pre></div>
<p>Here's a demo you can run on your own machine. Save the above to <code>permissions.yaml</code> and run the following in one terminal window:</p>
<pre><code>datasette docs.db --create --secret sekrit -m permissions.yaml
</code></pre>
<p>This will create the <code>docs.db</code> database if it doesn't already exist, and start Datasette with the <code>permissions.yaml</code> metadata file.</p>
<p>It sets <code>--secret</code> to a known value (you should always use a random secure secret in production) so we can easily use it with <code>create-token</code> in the next step.</p>
<p>Then in another terminal window run:</p>
<pre><code>export TOKEN=$(
datasette create-token simon \
--secret sekrit
)
curl -XPOST http://localhost:8001/docs/-/create \
-H "Authorization: Bearer $TOKEN" \
-d '{
"table": "demo",
"row": {"id": 1, "name": "Simon"},
"pk": "id"
}'
</code></pre>
<p>The first command creates a token that can act on behalf of the <code>simon</code> actor. The <code>curl</code> command then uses that token to create a table using the <code>/-/create</code> endpoint.</p>
<p>Run this, then visit <a href="http://localhost:8001/docs/demo">http://localhost:8001/docs/demo</a> to see the newly created table.</p>
<h4>What's next?</h4>
<p>With the 1.0a2 release I'm reasonably confident that Datasette 1.0 is new-feature-complete. There's still a <strong>lot</strong> of work to do before the final release, but the remaining work is far more intimidating: I need to make clean backwards-incompatible breakages to a whole host of existing features in order to ship a 1.0 that I can keep stable for as long as possible.</p>
<p>First up: I'm going to <a href="https://github.com/simonw/datasette/issues/1914">redesign Datasette's default API output</a>.</p>
<p>The current default JSON output for a simple table looks like this:</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"database"</span>: <span class="pl-s"><span class="pl-pds">"</span>fixtures<span class="pl-pds">"</span></span>,
<span class="pl-ent">"table"</span>: <span class="pl-s"><span class="pl-pds">"</span>facet_cities<span class="pl-pds">"</span></span>,
<span class="pl-ent">"is_view"</span>: <span class="pl-c1">false</span>,
<span class="pl-ent">"human_description_en"</span>: <span class="pl-s"><span class="pl-pds">"</span>sorted by name<span class="pl-pds">"</span></span>,
<span class="pl-ent">"rows"</span>: [
[
<span class="pl-c1">3</span>,
<span class="pl-s"><span class="pl-pds">"</span>Detroit<span class="pl-pds">"</span></span>
],
[
<span class="pl-c1">2</span>,
<span class="pl-s"><span class="pl-pds">"</span>Los Angeles<span class="pl-pds">"</span></span>
],
[
<span class="pl-c1">4</span>,
<span class="pl-s"><span class="pl-pds">"</span>Memnonia<span class="pl-pds">"</span></span>
],
[
<span class="pl-c1">1</span>,
<span class="pl-s"><span class="pl-pds">"</span>San Francisco<span class="pl-pds">"</span></span>
]
],
<span class="pl-ent">"truncated"</span>: <span class="pl-c1">false</span>,
<span class="pl-ent">"filtered_table_rows_count"</span>: <span class="pl-c1">4</span>,
<span class="pl-ent">"expanded_columns"</span>: [],
<span class="pl-ent">"expandable_columns"</span>: [],
<span class="pl-ent">"columns"</span>: [
<span class="pl-s"><span class="pl-pds">"</span>id<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>name<span class="pl-pds">"</span></span>
],
<span class="pl-ent">"primary_keys"</span>: [
<span class="pl-s"><span class="pl-pds">"</span>id<span class="pl-pds">"</span></span>
],
<span class="pl-ent">"units"</span>: {},
<span class="pl-ent">"query"</span>: {
<span class="pl-ent">"sql"</span>: <span class="pl-s"><span class="pl-pds">"</span>select id, name from facet_cities order by name limit 101<span class="pl-pds">"</span></span>,
<span class="pl-ent">"params"</span>: {}
},
<span class="pl-ent">"facet_results"</span>: {},
<span class="pl-ent">"suggested_facets"</span>: [],
<span class="pl-ent">"next"</span>: <span class="pl-c1">null</span>,
<span class="pl-ent">"next_url"</span>: <span class="pl-c1">null</span>,
<span class="pl-ent">"private"</span>: <span class="pl-c1">false</span>,
<span class="pl-ent">"allow_execute_sql"</span>: <span class="pl-c1">true</span>,
<span class="pl-ent">"query_ms"</span>: <span class="pl-c1">6.718471999647591</span>,
<span class="pl-ent">"source"</span>: <span class="pl-s"><span class="pl-pds">"</span>tests/fixtures.py<span class="pl-pds">"</span></span>,
<span class="pl-ent">"source_url"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://github.com/simonw/datasette/blob/main/tests/fixtures.py<span class="pl-pds">"</span></span>,
<span class="pl-ent">"license"</span>: <span class="pl-s"><span class="pl-pds">"</span>Apache License 2.0<span class="pl-pds">"</span></span>,
<span class="pl-ent">"license_url"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://github.com/simonw/datasette/blob/main/LICENSE<span class="pl-pds">"</span></span>
}</pre></div>
<p>In addition to being <em>really</em> verbose, you'll note that the rows themselves are represented like this:</p>
<div class="highlight highlight-source-json"><pre>[
[
<span class="pl-c1">3</span>,
<span class="pl-s"><span class="pl-pds">"</span>Detroit<span class="pl-pds">"</span></span>
],
[
<span class="pl-c1">2</span>,
<span class="pl-s"><span class="pl-pds">"</span>Los Angeles<span class="pl-pds">"</span></span>
],
[
<span class="pl-c1">4</span>,
<span class="pl-s"><span class="pl-pds">"</span>Memnonia<span class="pl-pds">"</span></span>
],
[
<span class="pl-c1">1</span>,
<span class="pl-s"><span class="pl-pds">"</span>San Francisco<span class="pl-pds">"</span></span>
]
]</pre></div>
<p>I originally designed it this way because I thought saving on repeating the column names for every row would be more efficient.</p>
<p>In practice, every single time I've used Datasette's API I've found myself using the <code>?_shape=array</code> parameter, which outputs this format instead:</p>
<div class="highlight highlight-source-json"><pre>[
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">3</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Detroit<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">2</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Los Angeles<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">4</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Memnonia<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">1</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>San Francisco<span class="pl-pds">"</span></span>
}
]</pre></div>
<p>It's just so much more convenient to work with!</p>
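<p>The two shapes carry exactly the same information: the array format is just each row's values in <code>columns</code> order, so converting to the object format is a one-line <code>zip</code>. A quick sketch:</p>

```python
columns = ["id", "name"]
rows = [[3, "Detroit"], [2, "Los Angeles"], [4, "Memnonia"], [1, "San Francisco"]]

# Pair each row's values with the column names to get the ?_shape=array format
objects = [dict(zip(columns, row)) for row in rows]
print(objects[0])  # {'id': 3, 'name': 'Detroit'}
```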
<p>So the new default format will look like this:</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"rows"</span>: [
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">3</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Detroit<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">2</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Los Angeles<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">4</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>Memnonia<span class="pl-pds">"</span></span>
},
{
<span class="pl-ent">"id"</span>: <span class="pl-c1">1</span>,
<span class="pl-ent">"name"</span>: <span class="pl-s"><span class="pl-pds">"</span>San Francisco<span class="pl-pds">"</span></span>
}
]
}</pre></div>
<p>The <code>rows</code> key is there so I can add extra keys to the output, based on additional <code>?_extra=</code> request parameters. You'll be able to get back everything you can get in the current full-fat table API, but you'll have to ask for it.</p>
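<p>One way to picture that design: the response always starts with a <code>rows</code> key, and each requested extra adds one more key. A hypothetical sketch of that assembly - the registry and names here are illustrative, not Datasette's internals:</p>

```python
# Hypothetical registry of extras; real Datasette resolves these differently
EXTRAS = {
    "columns": lambda data: list(data["rows"][0].keys()) if data["rows"] else [],
    "count": lambda data: len(data["rows"]),
}


def render_response(rows, extras=()):
    """Build a response dict: always 'rows', plus one key per requested extra."""
    data = {"rows": rows}
    for name in extras:
        data[name] = EXTRAS[name](data)
    return data


resp = render_response([{"id": 3, "name": "Detroit"}], extras=["count"])
print(resp)  # {'rows': [{'id': 3, 'name': 'Detroit'}], 'count': 1}
```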
<p>There are a ton of other changes I want to make to Datasette as a whole - things like renaming <code>metadata.yaml</code> to <code>config.yaml</code> to reflect that it's gone way beyond its origins as a way of attaching metadata to a database.</p>
<p>The <a href="https://github.com/simonw/datasette/milestone/7">1.0 milestone</a> is a dumping ground for many of these ideas. It's not a canonical reference though: I'd be very surprised if everything currently in that milestone makes it into the final 1.0 release.</p>
<p>As I get closer to 1.0 though I'll be refining that milestone so it should get more accurate over time.</p>
<p>Once again: <strong>now is the time</strong> to be providing feedback on this stuff! The <a href="https://datasette.io/discord">Datasette Discord</a> is a particularly valuable way for me to get feedback on the work so far, and my plans for the future.</p>
<h4><a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/">Datasette's new JSON write API: The first alpha of Datasette 1.0</a> (2022-12-02)</h4>
<p>This week I published <a href="https://docs.datasette.io/en/latest/changelog.html#a0-2022-11-29">the first alpha release of Datasette 1.0</a>, with a significant new feature: Datasette core now includes <a href="https://docs.datasette.io/en/latest/json_api.html#the-json-write-api">a JSON API</a> for creating and dropping tables and inserting, updating and deleting data.</p>
<p><img src="https://static.simonwillison.net/static/2022/datasette.svg" alt="The Datasette logo" style="max-width: 100%;" /></p>
<p>Combined with Datasette's existing APIs for reading and filtering table data and executing SELECT queries this effectively turns Datasette into a SQLite-backed JSON data layer for any application.</p>
<p>If you squint at it the right way, you could even describe it as offering a NoSQL interface to a SQL database!</p>
<p>My initial motivation for this work was to provide an API for loading data into my <a href="https://datasette.cloud/">Datasette Cloud</a> SaaS product - but now that I've got it working I'm realizing that it can be applied to a whole host of interesting things.</p>
<p>I shipped <a href="https://docs.datasette.io/en/latest/changelog.html#a0-2022-11-29">the 1.0a0 alpha</a> on Wednesday, then spent the last two days ironing out some bugs (released in <a href="https://docs.datasette.io/en/latest/changelog.html#a1-2022-12-01">1.0a1</a>) and building some illustrative demos.</p>
<h4>Scraping Hacker News to build an atom feed</h4>
<p>My first demo reuses my <a href="https://github.com/simonw/scrape-hacker-news-by-domain">scrape-hacker-news-by-domain</a> project from earlier this year.</p>
<p><a href="https://news.ycombinator.com/from?site=simonwillison.net">https://news.ycombinator.com/from?site=simonwillison.net</a> is the page on Hacker News that shows submissions from my blog. I like to keep an eye on that page to see if anyone has linked to my work.</p>
<p><img src="https://static.simonwillison.net/static/2022/hacker-news-from.jpg" alt="The page lists posts from my blog - the top one has 222 points and 39 comments, but most of the others have 2 or 3 points and no discussion at all." style="max-width: 100%;" /></p>
<p>Data from that page is not currently available through the <a href="https://github.com/HackerNews/API">official Hacker News API</a>... but it's in an HTML format that's pretty easy to scrape.</p>
<p>My <a href="https://shot-scraper.datasette.io/">shot-scraper</a> command-line browser automation tool has the ability to execute JavaScript against a web page and return scraped data as JSON.</p>
<p>I wrote about that in <a href="https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/">Scraping web pages from the command line with shot-scraper</a>, including a recipe for scraping that Hacker News page that looks like this:</p>
<div class="highlight highlight-source-shell"><pre>shot-scraper javascript \
<span class="pl-s"><span class="pl-pds">"</span>https://news.ycombinator.com/from?site=simonwillison.net<span class="pl-pds">"</span></span> \
-i scrape.js -o simonwillison-net.json</pre></div>
<p>Here's that <a href="https://github.com/simonw/scrape-hacker-news-by-domain/blob/main/scrape.js">scrape.js</a> script.</p>
<p>I've been running a <a href="https://simonwillison.net/2020/Oct/9/git-scraping/">Git scraper</a> that executes that scraping script using GitHub Actions for several months now, out of my <a href="https://github.com/simonw/scrape-hacker-news-by-domain">simonw/scrape-hacker-news-by-domain</a> repository.</p>
<p>Today I modified that script to also publish the data it has scraped to my personal Datasette Cloud account using the new API - and then used the <a href="https://datasette.io/plugins/datasette-atom">datasette-atom</a> plugin to generate an Atom feed from that data.</p>
<p>Here's <a href="https://simon.datasette.cloud/data/hacker_news_posts?_sort_desc=dt">the new table</a> in Datasette Cloud.</p>
<p>This is the <code>bash</code> script that runs in GitHub Actions and pushes the data to Datasette:</p>
<div class="highlight highlight-source-shell"><pre><span class="pl-k">export</span> SIMONWILLISON_ROWS=<span class="pl-s"><span class="pl-pds">$(</span></span>
<span class="pl-s"> jq -n --argjson rows <span class="pl-s"><span class="pl-pds">"</span><span class="pl-s"><span class="pl-pds">$(</span>cat simonwillison-net.json<span class="pl-pds">)</span></span><span class="pl-pds">"</span></span> \</span>
<span class="pl-s"> <span class="pl-s"><span class="pl-pds">'</span>{ "rows": $rows, "replace": true }<span class="pl-pds">'</span></span></span>
<span class="pl-s"><span class="pl-pds">)</span></span>
curl -X POST \
https://simon.datasette.cloud/data/hacker_news_posts/-/insert \
-H <span class="pl-s"><span class="pl-pds">"</span>Content-Type: application/json<span class="pl-pds">"</span></span> \
-H <span class="pl-s"><span class="pl-pds">"</span>Authorization: Bearer <span class="pl-smi">$DS_TOKEN</span><span class="pl-pds">"</span></span> \
-d <span class="pl-s"><span class="pl-pds">"</span><span class="pl-smi">$SIMONWILLISON_ROWS</span><span class="pl-pds">"</span></span></pre></div>
<p><code>$DS_TOKEN</code> is an environment variable containing a signed API token, see the <a href="https://docs.datasette.io/en/latest/authentication.html#api-tokens">API token documentation</a> for details.</p>
<p>I'm using <code>jq</code> here (with a recipe <a href="https://til.simonwillison.net/gpt3/jq">generated using GPT-3</a>) to convert the scraped data into the JSON format needed by the Datasette API. The result looks like this:</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"rows"</span>: [
{
<span class="pl-ent">"id"</span>: <span class="pl-s"><span class="pl-pds">"</span>33762438<span class="pl-pds">"</span></span>,
<span class="pl-ent">"title"</span>: <span class="pl-s"><span class="pl-pds">"</span>Coping strategies for the serial project hoarder<span class="pl-pds">"</span></span>,
<span class="pl-ent">"url"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://simonwillison.net/2022/Nov/26/productivity/<span class="pl-pds">"</span></span>,
<span class="pl-ent">"dt"</span>: <span class="pl-s"><span class="pl-pds">"</span>2022-11-27T12:12:56<span class="pl-pds">"</span></span>,
<span class="pl-ent">"points"</span>: <span class="pl-c1">222</span>,
<span class="pl-ent">"submitter"</span>: <span class="pl-s"><span class="pl-pds">"</span>usrme<span class="pl-pds">"</span></span>,
<span class="pl-ent">"commentsUrl"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://news.ycombinator.com/item?id=33762438<span class="pl-pds">"</span></span>,
<span class="pl-ent">"numComments"</span>: <span class="pl-c1">38</span>
}
],
<span class="pl-ent">"replace"</span>: <span class="pl-c1">true</span>
}</pre></div>
<p>This is then POSTed up to the <code>https://simon.datasette.cloud/data/hacker_news_posts/-/insert</code> API endpoint.</p>
<p>The <code>"rows"</code> key is a list of rows to be inserted.</p>
<p><code>"replace": true</code> tells Datasette to replace any existing rows with the same primary key. Without that, the API would return an error if any rows already existed.</p>
<p>The API also accepts <code>"ignore": true</code> which will cause it to ignore any rows that already exist.</p>
<p>Full insert API documentation <a href="https://docs.datasette.io/en/latest/json_api.html#inserting-rows">is here</a>.</p>
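<p>If you'd rather skip <code>jq</code>, the same wrapping is a couple of lines of Python. A sketch - the helper name here is mine, not part of any API:</p>

```python
import json


def build_insert_payload(rows, replace=True):
    """Wrap scraped rows in the envelope the /-/insert endpoint expects."""
    return json.dumps({"rows": rows, "replace": replace})


payload = build_insert_payload([{"id": "33762438", "points": 222}])
print(payload)  # {"rows": [{"id": "33762438", "points": 222}], "replace": true}
```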
<h4>Initially creating the table</h4>
<p>Before I could insert any rows I needed to create the table.</p>
<p>I did that from the command-line too, using this recipe:</p>
<div class="highlight highlight-source-shell"><pre><span class="pl-k">export</span> ROWS=<span class="pl-s"><span class="pl-pds">$(</span></span>
<span class="pl-s"> jq -n --argjson rows <span class="pl-s"><span class="pl-pds">"</span><span class="pl-s"><span class="pl-pds">$(</span>cat simonwillison-net.json<span class="pl-pds">)</span></span><span class="pl-pds">"</span></span> \</span>
<span class="pl-s"> <span class="pl-s"><span class="pl-pds">'</span>{ "table": "hacker_news_posts", "rows": $rows, "pk": "id" }<span class="pl-pds">'</span></span></span>
<span class="pl-s"><span class="pl-pds">)</span></span>
<span class="pl-c"><span class="pl-c">#</span> Use curl to POST some JSON to a URL</span>
curl -X POST \
https://simon.datasette.cloud/data/-/create \
-H <span class="pl-s"><span class="pl-pds">"</span>Content-Type: application/json<span class="pl-pds">"</span></span> \
-H <span class="pl-s"><span class="pl-pds">"</span>Authorization: Bearer <span class="pl-smi">$DS_TOKEN</span><span class="pl-pds">"</span></span> \
-d <span class="pl-smi">$ROWS</span></pre></div>
<p>This uses the same trick as above, but hits a different API endpoint: <code>/data/-/create</code> which is the endpoint for <a href="https://docs.datasette.io/en/latest/json_api.html#creating-a-table">creating a table</a> in the <code>data.db</code> database.</p>
<p>The JSON submitted to that endpoint looks like this:</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-ent">"table"</span>: <span class="pl-s"><span class="pl-pds">"</span>hacker_news_posts<span class="pl-pds">"</span></span>,
<span class="pl-ent">"pk"</span>: <span class="pl-s"><span class="pl-pds">"</span>id<span class="pl-pds">"</span></span>,
<span class="pl-ent">"rows"</span>: [
{
<span class="pl-ent">"id"</span>: <span class="pl-s"><span class="pl-pds">"</span>33762438<span class="pl-pds">"</span></span>,
<span class="pl-ent">"title"</span>: <span class="pl-s"><span class="pl-pds">"</span>Coping strategies for the serial project hoarder<span class="pl-pds">"</span></span>,
<span class="pl-ent">"url"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://simonwillison.net/2022/Nov/26/productivity/<span class="pl-pds">"</span></span>,
<span class="pl-ent">"dt"</span>: <span class="pl-s"><span class="pl-pds">"</span>2022-11-27T12:12:56<span class="pl-pds">"</span></span>,
<span class="pl-ent">"points"</span>: <span class="pl-c1">222</span>,
<span class="pl-ent">"submitter"</span>: <span class="pl-s"><span class="pl-pds">"</span>usrme<span class="pl-pds">"</span></span>,
<span class="pl-ent">"commentsUrl"</span>: <span class="pl-s"><span class="pl-pds">"</span>https://news.ycombinator.com/item?id=33762438<span class="pl-pds">"</span></span>,
<span class="pl-ent">"numComments"</span>: <span class="pl-c1">38</span>
}
]
}</pre></div>
<p>It's almost the same shape as the <code>/-/insert</code> call above. That's because it's using a feature of the Datasette API inherited from <a href="https://sqlite-utils.datasette.io/">sqlite-utils</a> - it can create a table from a list of rows, automatically determining the correct schema.</p>
<p>If you already know your schema you can pass a <code>"columns": [...]</code> key instead, but I've found that this kind of automatic schema generation works really well in practice.</p>
<p>Datasette will let you call the create API like that multiple times, and if the table already exists it will insert new rows directly into the existing table. I expect this to be a really convenient way to write automation scripts where you don't want to bother checking if the table exists already.</p>
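<p>The automatic schema detection works along the same lines as sqlite-utils: look at the Python types of the values in the incoming rows and map them to SQLite column types. A simplified sketch of the idea (not the actual sqlite-utils logic, which also handles mixed types and other edge cases):</p>

```python
SQLITE_TYPES = {int: "INTEGER", float: "FLOAT", str: "TEXT", bytes: "BLOB"}


def infer_schema(rows):
    """Map each column to a SQLite type based on the first non-null value seen."""
    schema = {}
    for row in rows:
        for key, value in row.items():
            if key not in schema and value is not None:
                schema[key] = SQLITE_TYPES.get(type(value), "TEXT")
    return schema


rows = [{"id": "33762438", "title": "Coping strategies...", "points": 222}]
print(infer_schema(rows))
# {'id': 'TEXT', 'title': 'TEXT', 'points': 'INTEGER'}
```

Note that <code>"id"</code> here is a string in the scraped JSON, so it becomes a <code>TEXT</code> primary key rather than an <code>INTEGER</code> one.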
<h4>Building an Atom feed</h4>
<p>My end goal with this demo was to build an Atom feed I could subscribe to in my NetNewsWire feed reader.</p>
<p>I have a plugin for that already: <a href="https://datasette.io/plugins/datasette-atom">datasette-atom</a>, which lets you generate an Atom feed for any data in Datasette, defined using a SQL query.</p>
<p>I created a SQL view for this (using the <a href="https://datasette.io/plugins/datasette-write">datasette-write</a> plugin, which is installed on Datasette Cloud):</p>
<div class="highlight highlight-source-sql"><pre><span class="pl-k">CREATE</span> <span class="pl-k">VIEW</span> <span class="pl-en">hacker_news_posts_atom</span> <span class="pl-k">as</span> <span class="pl-k">select</span>
id <span class="pl-k">as</span> atom_id,
title <span class="pl-k">as</span> atom_title,
url,
commentsUrl <span class="pl-k">as</span> atom_link,
dt <span class="pl-k">||</span> <span class="pl-s"><span class="pl-pds">'</span>Z<span class="pl-pds">'</span></span> <span class="pl-k">as</span> atom_updated,
<span class="pl-s"><span class="pl-pds">'</span>Submitter: <span class="pl-pds">'</span></span> <span class="pl-k">||</span> submitter <span class="pl-k">||</span> <span class="pl-s"><span class="pl-pds">'</span> - <span class="pl-pds">'</span></span> <span class="pl-k">||</span> points <span class="pl-k">||</span> <span class="pl-s"><span class="pl-pds">'</span> points, <span class="pl-pds">'</span></span> <span class="pl-k">||</span> numComments <span class="pl-k">||</span> <span class="pl-s"><span class="pl-pds">'</span> comments<span class="pl-pds">'</span></span> <span class="pl-k">as</span> atom_content
<span class="pl-k">from</span>
hacker_news_posts
<span class="pl-k">order by</span>
dt <span class="pl-k">desc</span>
<span class="pl-k">limit</span>
<span class="pl-c1">100</span>;</pre></div>
<p><code>datasette-atom</code> requires a table, view or SQL query that returns <code>atom_id</code>, <code>atom_title</code> and <code>atom_updated</code> columns - and will make use of <code>atom_link</code> and <code>atom_content</code> as well if they are present.</p>
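<p>Since this is all standard SQLite, you can try out the view logic locally with Python's built-in <code>sqlite3</code> module. A slightly simplified sketch (it drops the <code>url</code> column) with one sample row:</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE hacker_news_posts "
    "(id TEXT PRIMARY KEY, title TEXT, url TEXT, dt TEXT, "
    "points INTEGER, submitter TEXT, commentsUrl TEXT, numComments INTEGER)"
)
db.execute(
    "INSERT INTO hacker_news_posts VALUES "
    "('33762438', 'Coping strategies for the serial project hoarder', "
    "'https://simonwillison.net/2022/Nov/26/productivity/', "
    "'2022-11-27T12:12:56', 222, 'usrme', "
    "'https://news.ycombinator.com/item?id=33762438', 38)"
)
db.execute("""
    CREATE VIEW hacker_news_posts_atom AS SELECT
      id AS atom_id,
      title AS atom_title,
      commentsUrl AS atom_link,
      dt || 'Z' AS atom_updated,
      'Submitter: ' || submitter || ' - ' || points || ' points, '
        || numComments || ' comments' AS atom_content
    FROM hacker_news_posts ORDER BY dt DESC LIMIT 100
""")
row = db.execute(
    "SELECT atom_updated, atom_content FROM hacker_news_posts_atom"
).fetchone()
print(row)
# ('2022-11-27T12:12:56Z', 'Submitter: usrme - 222 points, 38 comments')
```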
<p>Datasette Cloud defaults to keeping all tables and views private - but a while ago I created the <a href="https://datasette.io/plugins/datasette-public">datasette-public</a> plugin to provide a UI for making a table public.</p>
<p>It turned out this didn't work for SQL views yet, so <a href="https://github.com/simonw/datasette-public/issues/5">I fixed that</a> - then used that option to make my view public. You can visit it at:</p>
<p><a href="https://simon.datasette.cloud/data/hacker_news_posts_atom">https://simon.datasette.cloud/data/hacker_news_posts_atom</a></p>
<p>And to get an Atom feed, just add <code>.atom</code> to the end of the URL:</p>
<p><a href="https://simon.datasette.cloud/data/hacker_news_posts_atom.atom">https://simon.datasette.cloud/data/hacker_news_posts_atom.atom</a></p>
<p>Here's what it looks like in NetNewsWire:</p>
<p><img src="https://static.simonwillison.net/static/2022/netnewswire-hacker-news.jpg" alt="A screenshot of a feed reading interface, showing posts from Hacker News with the submitter, number of points and number of comments" style="max-width: 100%;" /></p>
<p>I'm pretty excited about being able to combine these tools in this way: it makes getting from scraped data to a Datasette table to an Atom feed a very repeatable process.</p>
<h4>Building a TODO list application</h4>
<p>My second demo explores what it looks like to develop custom applications against the new API.</p>
<p><a href="https://todomvc.com">TodoMVC</a> is a project that provides the same TODO list interface built using dozens of different JavaScript frameworks, as a comparison tool.</p>
<p>I decided to use it to build my own TODO list application, using Datasette as the backend.</p>
<p>You can try it out at <a href="https://todomvc.datasette.io/">https://todomvc.datasette.io/</a> - but be warned that the demo resets every 15 minutes so don't use it for real task tracking!</p>
<p><img src="https://static.simonwillison.net/static/2022/todomvc.gif" alt="Animated GIF showing a TODO list interface - I add two items to it, then check one of them off as done, then remove the other one" style="max-width: 100%;" /></p>
<p>The source code for this demo lives in <a href="https://github.com/simonw/todomvc-datasette">simonw/todomvc-datasette</a> - which also serves the demo itself using GitHub Pages.</p>
<p>The code is based on the TodoMVC <a href="https://github.com/tastejs/todomvc/tree/gh-pages/examples/vanillajs">Vanilla JavaScript example</a>. I used that unmodified, except for one file - <a href="https://github.com/simonw/todomvc-datasette/blob/main/js/store.js">store.js</a>, which I modified to use the Datasette API instead of <code>localStorage</code>.</p>
<p>The demo currently uses a hard-coded authentication token, which is signed to allow actions to be performed against the <a href="https://latest.datasette.io/">https://latest.datasette.io/</a> demo instance as a user called <code>todomvc</code>.</p>
<p>That user is granted permissions <a href="https://github.com/simonw/datasette/blob/cab5b60e09e94aca820dbec5308446a88c99ea3d/tests/plugins/my_plugin.py#L223-L230">in a custom plugin</a> at the moment, but I plan to provide a more user-friendly way to do this in the future.</p>
<p>A couple of illustrative snippets of code. First, on page load this constructor uses the Datasette API to create the table used by the application:</p>
<div class="highlight highlight-source-js"><pre><span class="pl-k">function</span> <span class="pl-v">Store</span><span class="pl-kos">(</span><span class="pl-s1">name</span><span class="pl-kos">,</span> <span class="pl-s1">callback</span><span class="pl-kos">)</span> <span class="pl-kos">{</span>
<span class="pl-s1">callback</span> <span class="pl-c1">=</span> <span class="pl-s1">callback</span> <span class="pl-c1">||</span> <span class="pl-k">function</span> <span class="pl-kos">(</span><span class="pl-kos">)</span> <span class="pl-kos">{</span><span class="pl-kos">}</span><span class="pl-kos">;</span>
<span class="pl-c">// Ensure a table exists with this name</span>
<span class="pl-k">let</span> <span class="pl-s1">self</span> <span class="pl-c1">=</span> <span class="pl-smi">this</span><span class="pl-kos">;</span>
<span class="pl-s1">self</span><span class="pl-kos">.</span><span class="pl-c1">_dbName</span> <span class="pl-c1">=</span> <span class="pl-s">`todo_<span class="pl-s1"><span class="pl-kos">${</span><span class="pl-s1">name</span><span class="pl-kos">}</span></span>`</span><span class="pl-kos">;</span>
<span class="pl-en">fetch</span><span class="pl-kos">(</span><span class="pl-s">"https://latest.datasette.io/ephemeral/-/create"</span><span class="pl-kos">,</span> <span class="pl-kos">{</span>
<span class="pl-c1">method</span>: <span class="pl-s">"POST"</span><span class="pl-kos">,</span>
<span class="pl-c1">mode</span>: <span class="pl-s">"cors"</span><span class="pl-kos">,</span>
<span class="pl-c1">headers</span>: <span class="pl-kos">{</span>
<span class="pl-c1">Authorization</span>: <span class="pl-s">`Bearer <span class="pl-s1"><span class="pl-kos">${</span><span class="pl-c1">TOKEN</span><span class="pl-kos">}</span></span>`</span><span class="pl-kos">,</span>
<span class="pl-s">"Content-Type"</span>: <span class="pl-s">"application/json"</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-c1">body</span>: <span class="pl-c1">JSON</span><span class="pl-kos">.</span><span class="pl-en">stringify</span><span class="pl-kos">(</span><span class="pl-kos">{</span>
<span class="pl-c1">table</span>: <span class="pl-s1">self</span><span class="pl-kos">.</span><span class="pl-c1">_dbName</span><span class="pl-kos">,</span>
<span class="pl-c1">columns</span>: <span class="pl-kos">[</span>
<span class="pl-kos">{</span><span class="pl-c1">name</span>: <span class="pl-s">"id"</span><span class="pl-kos">,</span> <span class="pl-c1">type</span>: <span class="pl-s">"integer"</span><span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-kos">{</span><span class="pl-c1">name</span>: <span class="pl-s">"title"</span><span class="pl-kos">,</span> <span class="pl-c1">type</span>: <span class="pl-s">"text"</span><span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-kos">{</span><span class="pl-c1">name</span>: <span class="pl-s">"completed"</span><span class="pl-kos">,</span> <span class="pl-c1">type</span>: <span class="pl-s">"integer"</span><span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-kos">]</span><span class="pl-kos">,</span>
<span class="pl-c1">pk</span>: <span class="pl-s">"id"</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span><span class="pl-kos">)</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span><span class="pl-kos">)</span><span class="pl-kos">.</span><span class="pl-en">then</span><span class="pl-kos">(</span><span class="pl-k">function</span> <span class="pl-kos">(</span><span class="pl-s1">r</span><span class="pl-kos">)</span> <span class="pl-kos">{</span>
<span class="pl-s1">callback</span><span class="pl-kos">.</span><span class="pl-en">call</span><span class="pl-kos">(</span><span class="pl-s1">self</span><span class="pl-kos">,</span> <span class="pl-kos">[</span><span class="pl-kos">]</span><span class="pl-kos">)</span><span class="pl-kos">;</span>
<span class="pl-kos">}</span><span class="pl-kos">)</span><span class="pl-kos">;</span>
<span class="pl-kos">}</span></pre></div>
<p>Most applications would run against a table that has already been created, but this felt like a good opportunity to show what table creation looks like.</p>
<p>Note that the table is being created using <code>/ephemeral/-/create</code> - an endpoint that lets you create tables in the ephemeral database, a temporary database that drops every table after 15 minutes. I built the <a href="https://datasette.io/plugins/datasette-ephemeral-tables">datasette-ephemeral-tables</a> plugin to make this possible.</p>
<p>Here's the code which is called when a new TODO list item is created or updated:</p>
<div class="highlight highlight-source-js"><pre><span class="pl-v">Store</span><span class="pl-kos">.</span><span class="pl-c1">prototype</span><span class="pl-kos">.</span><span class="pl-en">save</span> <span class="pl-c1">=</span> <span class="pl-k">function</span> <span class="pl-kos">(</span><span class="pl-s1">updateData</span><span class="pl-kos">,</span> <span class="pl-s1">callback</span><span class="pl-kos">,</span> <span class="pl-s1">id</span><span class="pl-kos">)</span> <span class="pl-kos">{</span>
<span class="pl-c">// {title, completed}</span>
<span class="pl-s1">callback</span> <span class="pl-c1">=</span> <span class="pl-s1">callback</span> <span class="pl-c1">||</span> <span class="pl-k">function</span> <span class="pl-kos">(</span><span class="pl-kos">)</span> <span class="pl-kos">{</span><span class="pl-kos">}</span><span class="pl-kos">;</span>
<span class="pl-k">var</span> <span class="pl-s1">self</span> <span class="pl-c1">=</span> <span class="pl-smi">this</span><span class="pl-kos">;</span>
<span class="pl-k">var</span> <span class="pl-s1">table</span> <span class="pl-c1">=</span> <span class="pl-smi">this</span><span class="pl-kos">.</span><span class="pl-c1">_dbName</span><span class="pl-kos">;</span>
<span class="pl-c">// If an ID was actually given, find the item and update each property</span>
<span class="pl-k">if</span> <span class="pl-kos">(</span><span class="pl-s1">id</span><span class="pl-kos">)</span> <span class="pl-kos">{</span>
<span class="pl-en">fetch</span><span class="pl-kos">(</span>
<span class="pl-s">`https://latest.datasette.io/ephemeral/<span class="pl-s1"><span class="pl-kos">${</span><span class="pl-s1">table</span><span class="pl-kos">}</span></span>/<span class="pl-s1"><span class="pl-kos">${</span><span class="pl-s1">id</span><span class="pl-kos">}</span></span>/-/update`</span><span class="pl-kos">,</span>
<span class="pl-kos">{</span>
<span class="pl-c1">method</span>: <span class="pl-s">"POST"</span><span class="pl-kos">,</span>
<span class="pl-c1">mode</span>: <span class="pl-s">"cors"</span><span class="pl-kos">,</span>
<span class="pl-c1">headers</span>: <span class="pl-kos">{</span>
<span class="pl-c1">Authorization</span>: <span class="pl-s">`Bearer <span class="pl-s1"><span class="pl-kos">${</span><span class="pl-c1">TOKEN</span><span class="pl-kos">}</span></span>`</span><span class="pl-kos">,</span>
<span class="pl-s">"Content-Type"</span>: <span class="pl-s">"application/json"</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-c1">body</span>: <span class="pl-c1">JSON</span><span class="pl-kos">.</span><span class="pl-en">stringify</span><span class="pl-kos">(</span><span class="pl-kos">{</span><span class="pl-c1">update</span>: <span class="pl-s1">updateData</span><span class="pl-kos">}</span><span class="pl-kos">)</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span>
<span class="pl-kos">)</span>
<span class="pl-kos">.</span><span class="pl-en">then</span><span class="pl-kos">(</span><span class="pl-kos">(</span><span class="pl-s1">r</span><span class="pl-kos">)</span> <span class="pl-c1">=></span> <span class="pl-s1">r</span><span class="pl-kos">.</span><span class="pl-en">json</span><span class="pl-kos">(</span><span class="pl-kos">)</span><span class="pl-kos">)</span>
<span class="pl-kos">.</span><span class="pl-en">then</span><span class="pl-kos">(</span><span class="pl-kos">(</span><span class="pl-s1">data</span><span class="pl-kos">)</span> <span class="pl-c1">=></span> <span class="pl-kos">{</span>
<span class="pl-s1">callback</span><span class="pl-kos">.</span><span class="pl-en">call</span><span class="pl-kos">(</span><span class="pl-s1">self</span><span class="pl-kos">,</span> <span class="pl-s1">data</span><span class="pl-kos">)</span><span class="pl-kos">;</span>
<span class="pl-kos">}</span><span class="pl-kos">)</span><span class="pl-kos">;</span>
<span class="pl-kos">}</span> <span class="pl-k">else</span> <span class="pl-kos">{</span>
<span class="pl-c">// Save it and store ID</span>
<span class="pl-en">fetch</span><span class="pl-kos">(</span><span class="pl-s">`https://latest.datasette.io/ephemeral/<span class="pl-s1"><span class="pl-kos">${</span><span class="pl-s1">table</span><span class="pl-kos">}</span></span>/-/insert`</span><span class="pl-kos">,</span> <span class="pl-kos">{</span>
<span class="pl-c1">method</span>: <span class="pl-s">"POST"</span><span class="pl-kos">,</span>
<span class="pl-c1">mode</span>: <span class="pl-s">"cors"</span><span class="pl-kos">,</span>
<span class="pl-c1">headers</span>: <span class="pl-kos">{</span>
<span class="pl-c1">Authorization</span>: <span class="pl-s">`Bearer <span class="pl-s1"><span class="pl-kos">${</span><span class="pl-c1">TOKEN</span><span class="pl-kos">}</span></span>`</span><span class="pl-kos">,</span>
<span class="pl-s">"Content-Type"</span>: <span class="pl-s">"application/json"</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-c1">body</span>: <span class="pl-c1">JSON</span><span class="pl-kos">.</span><span class="pl-en">stringify</span><span class="pl-kos">(</span><span class="pl-kos">{</span>
<span class="pl-c1">row</span>: <span class="pl-s1">updateData</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span><span class="pl-kos">)</span><span class="pl-kos">,</span>
<span class="pl-kos">}</span><span class="pl-kos">)</span>
<span class="pl-kos">.</span><span class="pl-en">then</span><span class="pl-kos">(</span><span class="pl-kos">(</span><span class="pl-s1">r</span><span class="pl-kos">)</span> <span class="pl-c1">=></span> <span class="pl-s1">r</span><span class="pl-kos">.</span><span class="pl-en">json</span><span class="pl-kos">(</span><span class="pl-kos">)</span><span class="pl-kos">)</span>
<span class="pl-kos">.</span><span class="pl-en">then</span><span class="pl-kos">(</span><span class="pl-kos">(</span><span class="pl-s1">data</span><span class="pl-kos">)</span> <span class="pl-c1">=></span> <span class="pl-kos">{</span>
<span class="pl-k">let</span> <span class="pl-s1">row</span> <span class="pl-c1">=</span> <span class="pl-s1">data</span><span class="pl-kos">.</span><span class="pl-c1">rows</span><span class="pl-kos">[</span><span class="pl-c1">0</span><span class="pl-kos">]</span><span class="pl-kos">;</span>
<span class="pl-s1">callback</span><span class="pl-kos">.</span><span class="pl-en">call</span><span class="pl-kos">(</span><span class="pl-s1">self</span><span class="pl-kos">,</span> <span class="pl-s1">row</span><span class="pl-kos">)</span><span class="pl-kos">;</span>
<span class="pl-kos">}</span><span class="pl-kos">)</span><span class="pl-kos">;</span>
<span class="pl-kos">}</span>
<span class="pl-kos">}</span><span class="pl-kos">;</span></pre></div>
<p>TodoMVC passes an <code>id</code> if a record is being updated - which this code uses as a sign that the <code>...table/row-id/-/update</code> API should be called (see <a href="https://docs.datasette.io/en/latest/json_api.html#updating-a-row">update API documentation</a>).</p>
<p>If the row doesn't have an ID it is inserted using <code>table/-/insert</code> - this time using the <code>"row":</code> key, because we are only inserting a single row.</p>
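<p>To make the contrast concrete, here is a minimal sketch (with hypothetical column values) of the JSON bodies the two endpoints expect:</p>

```python
import json

# Body for POST .../table/-/insert - a "row" key, since we insert one row
insert_body = {"row": {"title": "Walk the dog", "completed": 0}}

# Body for POST .../table/ROW_ID/-/update - an "update" key containing
# only the columns that should change
update_body = {"update": {"completed": 1}}

print(json.dumps(insert_body))
print(json.dumps(update_body))
```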
<p>The hardest part of getting this to work was ensuring Datasette's <a href="https://docs.datasette.io/en/latest/json_api.html#json-api">CORS mode</a> worked correctly for writes. I had to add a new <code>Access-Control-Allow-Methods</code> header, which I shipped in <a href="https://docs.datasette.io/en/latest/changelog.html#a1-2022-12-01">Datasette 1.0a1</a> (see <a href="https://github.com/simonw/datasette/issues/1922">issue #1922</a>).</p>
<h4>Try the ephemeral hosted API</h4>
<p>I built the <a href="https://datasette.io/plugins/datasette-ephemeral-tables">datasette-ephemeral-tables</a> plugin because I wanted to provide a demo instance of the write API that anyone could try out without needing to install Datasette themselves - but that wouldn't leave me responsible for taking care of their data or cleaning up any of their mess.</p>
<p>You're welcome to experiment with the API using the <a href="https://latest.datasette.io/">https://latest.datasette.io/</a> demo instance.</p>
<p>First, you'll need to sign in as a root user. You can do that (no password required) using the button <a href="https://latest.datasette.io/login-as-root">on this page</a>.</p>
<p>Once signed in you can view the ephemeral database (which isn't visible to anonymous users) here:</p>
<p><a href="https://latest.datasette.io/ephemeral">https://latest.datasette.io/ephemeral</a></p>
<p>You can use the API explorer to try out the different write APIs against it here:</p>
<p><a href="https://latest.datasette.io/-/api">https://latest.datasette.io/-/api</a></p>
<p>And you can create your own signed token for accessing the API on this page:</p>
<p><a href="https://latest.datasette.io/-/create-token">https://latest.datasette.io/-/create-token</a></p>
<p><img src="https://static.simonwillison.net/static/2022/create-token.jpg" alt="The Create an API token page lets you create a token that expires after a set number of hours - you can then copy that token to your clipboard" style="max-width: 100%;" /></p>
<p>The TodoMVC application described above also uses the <code>ephemeral</code> database, so you may see a <code>todo_todos-vanillajs</code> table appear there if anyone is playing with that demo.</p>
<h4 id="your-machine">Or run this on your own machine</h4>
<p>You can install the latest Datasette alpha like this:</p>
<pre><code>pip install datasette==1.0a1
</code></pre>
<p>Then create a database and sign in as the <code>root</code> user in order to gain access to the API:</p>
<pre><code>datasette demo.db --create --root
</code></pre>
<p>Click on the link it outputs to sign in as the root user, then visit the API explorer to start trying out the API:</p>
<p><a href="http://127.0.0.1:8001/-/api">http://127.0.0.1:8001/-/api</a></p>
<p><img src="https://static.simonwillison.net/static/2022/api-explorer.jpg" alt="The API explorer interface has tools for sending GET and POST requests, plus a list of API endpoints" style="max-width: 100%;" /></p>
<p>The API explorer works without a token at all, using your existing browser cookies.</p>
<p>If you want to try the API using <code>curl</code> or similar you can use this page to create a new signed API token for the <code>root</code> user:</p>
<p><a href="http://127.0.0.1:8001/-/create-token">http://127.0.0.1:8001/-/create-token</a></p>
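<p>As a sketch of what an authenticated write request looks like outside the browser (the database, table name and token here are placeholders), using Python's standard library:</p>

```python
import json
import urllib.request

TOKEN = "dstok_..."  # paste a token created at /-/create-token

body = json.dumps({"row": {"title": "Try the write API"}}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8001/demo/todos/-/insert",
    data=body,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# With the server from the previous step running, this would perform the insert:
# response = urllib.request.urlopen(req)
```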
<p>This token will become invalid if you restart the server, unless you set the <code>DATASETTE_SECRET</code> environment variable to a stable string before you start the server:</p>
<pre><code>export DATASETTE_SECRET=$(
python3 -c 'print(__import__("secrets").token_hex(16))'
)
</code></pre>
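<p>The reason a restart invalidates tokens is that they are signed using that secret. Conceptually (this is a simplified sketch - Datasette uses its own signing machinery, not this exact code) it works like this:</p>

```python
import hashlib
import hmac

def sign(value: str, secret: str) -> str:
    # A token only verifies if its signature can be recomputed with the
    # same secret the server held when the token was issued
    return hmac.new(secret.encode(), value.encode(), hashlib.sha256).hexdigest()

sig = sign("dstok:root", "secret-a")
assert sign("dstok:root", "secret-a") == sig  # same secret: token still verifies
assert sign("dstok:root", "secret-b") != sig  # new secret: old tokens rejected
```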
<p>Check the <a href="https://docs.datasette.io/en/latest/json_api.html#the-json-write-api">Write API documentation</a> for more details.</p>
<h4>What's next?</h4>
<p>If you have feedback on these APIs, <em>now is the time</em> to share it! I'm hoping to ship Datasette 1.0 at the start of 2023, after which these APIs will be considered stable for hopefully a long time to come.</p>
<p>If you have thoughts or feedback (or questions) join us on the <a href="https://datasette.io/discord">Datasette Discord</a>. You can also file issues against <a href="https://github.com/simonw/datasette/issues">Datasette</a> itself.</p>
<p>My priority for the next 1.0 alpha is to bake in a small number of backwards incompatible changes to other aspects of Datasette's JSON API that I've been hoping to include in 1.0 for a while.</p>
<p>I'm also going to be rolling out API support to my <a href="https://datasette.cloud/">Datasette Cloud</a> preview users. If you're interested in trying that out you can <a href="https://www.datasette.cloud/preview/">request access here</a>.</p>
Datasette 0.63: The annotated release notes2022-10-27T22:13:41+00:002022-10-27T22:13:41+00:00https://simonwillison.net/2022/Oct/27/datasette-0-63/#atom-series
<p>I released <a href="https://docs.datasette.io/en/stable/changelog.html#v0-63">Datasette 0.63</a> today. These are the <a href="https://simonwillison.net/series/datasette-release-notes/">annotated release notes</a>.</p>
<h4>Features</h4>
<blockquote>
<ul>
<li>Now tested against Python 3.11. Docker containers used by <code>datasette publish</code> and <code>datasette package</code> both now use that version of Python. (<a href="https://github.com/simonw/datasette/issues/1853">#1853</a>)</li>
</ul>
</blockquote>
<p><a href="https://docs.python.org/3.11/whatsnew/3.11.html">Python 3.11</a> came out a few days ago, and one of the highlights of that release was a significant amount of work invested in performance.</p>
<p>I ran a very basic load test comparing Datasette running on Python 3.10 and 3.11... and got 533.89 requests/second up from 413.56 requests/second - a 29% improvement!</p>
<p>Datasette works on (and is tested against) Python 3.7 and higher, but it has a couple of mechanisms that can bake a new Docker container for you. This performance increase was significant enough that I took the time to upgrade those containers to Python 3.11, using the <code>3.11.0-slim-bullseye</code> base image.</p>
<blockquote>
<ul>
<li>
<code>--load-extension</code> option now supports entrypoints. Thanks, Alex Garcia. (<a href="https://github.com/simonw/datasette/pull/1789">#1789</a>)</li>
</ul>
</blockquote>
<p>Alex continues to do <a href="https://observablehq.com/@asg017/a-new-sqlite-blog-series?collection=@asg017/sqlite-blog">amazing work</a> documenting the process for building SQLite extensions. It turns out a single extension can offer multiple variants - a trick he uses in some of his own. He contributed this fix to Datasette, and I ported the same feature <a href="https://github.com/simonw/sqlite-utils/issues/470">to sqlite-utils as well</a>.</p>
<blockquote>
<ul>
<li>Facet size can now be set per-table with the new <code>facet_size</code> table metadata option. (<a href="https://github.com/simonw/datasette/issues/1804">#1804</a>)</li>
</ul>
</blockquote>
<p>Suggested by Charles Nepote on the <a href="https://datasette.io/discord">Datasette Discord</a>. Useful if you're dealing with larger databases.</p>
<blockquote>
<ul>
<li>The <a href="https://docs.datasette.io/en/stable/settings.html#setting-truncate-cells-html">truncate_cells_html</a> setting now also affects long URLs in columns. (<a href="https://github.com/simonw/datasette/issues/1805">#1805</a>)</li>
</ul>
</blockquote>
<p>Also suggested by Charles. This helps keep cells with giant linked URLs in them readable.</p>
<blockquote>
<ul>
<li>The non-JavaScript SQL editor textarea now increases height to fit the SQL query. (<a href="https://github.com/simonw/datasette/issues/1786">#1786</a>)</li>
</ul>
</blockquote>
<p>I implemented this partly because I still care how well Datasette works if the JavaScript doesn't load, but mainly for <a href="https://lite.datasette.io/">Datasette Lite</a>, which still doesn't have a solution for loading JavaScript widgets like the CodeMirror editor used in regular Datasette.</p>
<p><a href="https://lite.datasette.io/#/fixtures?sql=select%0A++pk1%2C%0A++pk2%2C%0A++content%2C%0A++sortable%2C%0A++sortable_with_nulls%2C%0A++sortable_with_nulls_2%2C%0A++text%0Afrom%0A++sortable%0Aorder+by%0A++pk1%2C%0A++pk2%0Alimit%0A++101">Here's a demo in Datasette Lite</a>.</p>
<blockquote>
<ul>
<li>Facets are now displayed with better line-breaks in long values. Thanks, Daniel Rech. (<a href="https://github.com/simonw/datasette/pull/1794">#1794</a>)</li>
</ul>
</blockquote>
<p>This uses the <code>word-break: break-all;</code> CSS property.</p>
<blockquote>
<ul>
<li>The <code>settings.json</code> file used in <a href="https://docs.datasette.io/en/stable/settings.html#config-dir">Configuration directory mode</a> is now validated on startup. (<a href="https://github.com/simonw/datasette/issues/1816">#1816</a>)</li>
</ul>
</blockquote>
<p>I added this in response to <a href="https://github.com/simonw/datasette/issues/1814">a bug report</a> caused by someone misunderstanding the settings and adding an invalid one that got silently ignored. Now Datasette will raise an error if you try to use a setting that doesn't exist, which is a much better user experience.</p>
<blockquote>
<ul>
<li>SQL queries can now include leading SQL comments, using <code>/* ... */</code> or <code>-- ...</code> syntax. Thanks, Charles Nepote. (<a href="https://github.com/simonw/datasette/issues/1860">#1860</a>)</li>
</ul>
</blockquote>
<p>Charles pointed out that it's useful to be able to include a descriptive comment at the start of a SQL query, but Datasette was refusing queries that started with a comment.</p>
<p>Fixing this took some regular expression wizardry. Here's the final expression plus <a href="https://www.debuggex.com/r/Rbw-UWD9PdOU2GyO">a visualization</a>:</p>
<pre><code>^\s*((?:\-\-.*?\n\s*)|(?:\/\*((?!\*\/)[\s\S])*\*\/)\s*)*\s*select\b
</code></pre>
<p><img src="https://static.simonwillison.net/static/2022/sql-comment-regex.png" alt="Screenshot of the debuggex interface showing a diagram that illustrates the regular expression" style="max-width: 100%;" /></p>
<p>The trickiest part was that I wanted to match comments like this one:</p>
<pre><code>/* This is a comment
that spans multiple lines */
</code></pre>
<p>But... I didn't want to get confused by <code>*/</code> appearing inside a string literal. I wanted to match <code>/*</code> and then any sequence of characters that was NOT the end comment marker (including newlines) - and then <code>*/</code>.</p>
<p>That's what this does:</p>
<pre><code>\/\*((?!\*\/)[\s\S])*\*\/
</code></pre>
<p><code>\/\*</code> and <code>\*\/</code> are the escaped sequences for the literal <code>/*</code> and <code>*/</code>.</p>
<p><code>((?!\*\/)[\s\S])</code> means "match any single character that is not the start of the end comment marker" - the <code>*</code> that follows repeats it. <code>[\s\S]</code> matches both whitespace AND non-whitespace characters - it's a trick for working around the fact that <code>.</code> doesn't match newline characters.</p>
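<p>You can check the behaviour of the full expression with a few lines of Python (the <code>re.IGNORECASE</code> flag here is an assumption, to also accept upper-case <code>SELECT</code>):</p>

```python
import re

# The leading-comment pattern from the release notes
allowed_sql_re = re.compile(
    r"^\s*((?:\-\-.*?\n\s*)|(?:\/\*((?!\*\/)[\s\S])*\*\/)\s*)*\s*select\b",
    re.IGNORECASE,
)

assert allowed_sql_re.match("select 1")
assert allowed_sql_re.match("-- a comment\nselect 1")
assert allowed_sql_re.match("/* spans\nmultiple lines */ select 1")
assert not allowed_sql_re.match("update t set x = 1")
```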
<blockquote>
<ul>
<li>SQL query is now re-displayed when terminated with a time limit error. (<a href="https://github.com/simonw/datasette/issues/1819">#1819</a>)</li>
</ul>
</blockquote>
<p>Previously the error message wouldn't include the SQL you entered, so you would have to hit "back" to recover your query.</p>
<blockquote>
<ul>
<li>The <a href="https://docs.datasette.io/en/stable/performance.html#performance-inspect">inspect data</a> mechanism is now used to speed up server startup - thanks, Forest Gregg. (<a href="https://github.com/simonw/datasette/issues/1834">#1834</a>)</li>
</ul>
</blockquote>
<p><code>datasette inspect</code> is a pretty obscure Datasette feature. Run it against a SQLite file and it dumps out JSON representing some key statistics - most importantly the number of rows in each table.</p>
<p>If the database is immutable you can then start Datasette with a reference to that JSON file (using the <code>--inspect-file</code> option) and Datasette will use that data to save itself from having to count rows when displaying information about the database - a potentially expensive operation.</p>
<p>Most people will never use this feature directly, but various <code>datasette publish</code> commands (including <code>datasette publish cloudrun</code>) use it as part of the process of building a Docker image, so it's actually quite impactful.</p>
<p>Forest Gregg spotted a way this could further speed up server startup, by skipping the step where Datasette calculates a hash of the database file, since that's also included in the <code>inspect-data</code> file.</p>
<blockquote>
<ul>
<li>In <a href="https://docs.datasette.io/en/stable/settings.html#config-dir">Configuration directory mode</a> databases with filenames ending in <code>.sqlite</code> or <code>.sqlite3</code> are now automatically added to the Datasette instance. (<a href="https://github.com/simonw/datasette/issues/1646">#1646</a>)</li>
</ul>
</blockquote>
<p>If you start Datasette by pointing it at a directory, <code>datasette path/to/dir</code>, it will now load any <code>*.sqlite</code> or <code>*.sqlite3</code> files in that directory in addition to any <code>*.db</code> files.</p>
<blockquote>
<ul>
<li>Breadcrumb navigation display now respects the current user's permissions. (<a href="https://github.com/simonw/datasette/issues/1831">#1831</a>)</li>
</ul>
</blockquote>
<p>This improvement came out of work I've been doing on <a href="https://www.datasette.cloud/">Datasette Cloud</a>, see <a href="https://simonwillison.net/2022/Oct/12/publishing-data/">previous weeknotes</a>.</p>
<h4>Plugin hooks and internals</h4>
<blockquote>
<ul>
<li>The <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-prepare-jinja2-environment">prepare_jinja2_environment(env, datasette)</a> plugin hook now accepts an optional <code>datasette</code> argument. Hook implementations can also now return an <code>async</code> function which will be awaited automatically. (<a href="https://github.com/simonw/datasette/issues/1809">#1809</a>)</li>
</ul>
</blockquote>
<p>When I built the <a href="https://github.com/simonw/datasette-edit-templates">datasette-edit-templates</a> plugin I found myself wanting to access the <code>datasette</code> object inside a <code>prepare_jinja2_environment()</code> plugin hook implementation.</p>
<p>Since <code>datasette</code> wasn't passed to that hook, I solved it with this horrific piece of stack inspection:</p>
<pre><span class="pl-k">import</span> <span class="pl-s1">inspect</span>
<span class="pl-k">from</span> <span class="pl-s1">datasette</span> <span class="pl-k">import</span> <span class="pl-s1">hookimpl</span>

<span class="pl-en">@<span class="pl-s1">hookimpl</span></span>
<span class="pl-k">def</span> <span class="pl-en">prepare_jinja2_environment</span>(<span class="pl-s1">env</span>):
<span class="pl-c"># TODO: This should ideally take datasette, but that's not an argument yet</span>
<span class="pl-s1">datasette</span> <span class="pl-c1">=</span> <span class="pl-s1">inspect</span>.<span class="pl-en">currentframe</span>().<span class="pl-s1">f_back</span>.<span class="pl-s1">f_back</span>.<span class="pl-s1">f_back</span>.<span class="pl-s1">f_back</span>.<span class="pl-s1">f_locals</span>[<span class="pl-s">"self"</span>]</pre>
<p>Making <code>datasette</code> available as a parameter there is much nicer!</p>
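<p>A minimal sketch of the new-style hook (in a real plugin the function would be decorated with <code>@hookimpl</code> from the <code>datasette</code> package; the <code>instance_title</code> global is a hypothetical example):</p>

```python
import asyncio
from types import SimpleNamespace

def prepare_jinja2_environment(env, datasette):
    # Returning an async function means Datasette awaits it, so the hook
    # can perform async work (such as a database query) before it resolves
    async def inner():
        env.globals["instance_title"] = getattr(datasette, "title", "Datasette")
    return inner

# Simulate what Datasette does with the hook's return value:
env = SimpleNamespace(globals={})
asyncio.run(prepare_jinja2_environment(env, datasette=None)())
print(env.globals)
```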
<blockquote>
<ul>
<li>
<code>Database(is_mutable=)</code> now defaults to <code>True</code>. (<a href="https://github.com/simonw/datasette/issues/1808">#1808</a>)</li>
</ul>
</blockquote>
<p>I hit <a href="https://github.com/simonw/datasette-upload-dbs/issues/6">a bug in datasette-upload-dbs</a> because I'd added extra databases to the Datasette interface in immutable mode by mistake. This API change makes that mistake a lot harder to make.</p>
<blockquote>
<ul>
<li>The <a href="https://docs.datasette.io/en/stable/internals.html#datasette-check-visibility">datasette.check_visibility()</a> method now accepts an optional <code>permissions=</code> list, allowing it to take multiple permissions into account at once when deciding if something should be shown as public or private. This has been used to correctly display padlock icons in more places in the Datasette interface. (<a href="https://github.com/simonw/datasette/issues/1829">#1829</a>)</li>
</ul>
</blockquote>
<p>Another improvement inspired by Datasette Cloud: I noticed that padlock icons weren't appearing correctly throughout the Datasette interface, where they are meant to indicate that a database, table, view or query is private.</p>
<p>That was because there are actually multiple permission checks involved in deciding if something is private: a user may have access granted at the instance, database or table level.</p>
<p>The code that granted permission was checking all of those, but the code that then checked to see if an anonymous user would be able to see it wasn't. I fixed that, while maintaining compatibility for older uses of the <code>check_visibility()</code> method.</p>
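<p>Based on the linked documentation, a call that takes several permissions into account looks roughly like this (<code>is_table_private</code> is a hypothetical helper; <code>check_visibility()</code> returns a <code>(visible, private)</code> tuple):</p>

```python
# Sketch: "datasette" is a Datasette instance and "actor" the current actor
# dictionary - both are assumed to exist in the surrounding plugin code.
async def is_table_private(datasette, actor, database, table):
    visible, private = await datasette.check_visibility(
        actor,
        permissions=[
            ("view-table", (database, table)),
            ("view-database", database),
            "view-instance",
        ],
    )
    # visible: can this actor see it; private: would an anonymous user be denied
    return visible and private
```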
<blockquote>
<ul>
<li>Datasette no longer enforces upper bounds on its dependencies. (<a href="https://github.com/simonw/datasette/issues/1800">#1800</a>)</li>
</ul>
</blockquote>
<p>Henry Schreiner <a href="https://iscinumpy.dev/post/bound-version-constraints/">convinced me</a>. See also <a href="https://hynek.me/articles/semver-will-not-save-you/">Semantic Versioning Will Not Save You</a> by Hynek Schlawack.</p>
<p>I'm still worrying about this though.</p>
<p>On the one hand, Datasette can be used as a library as part of a larger application (see <a href="https://til.simonwillison.net/django/datasette-django">this recent Django TIL</a> for an example). As such, pinning requirements is rude: it makes it harder for downstream users to upgrade their dependencies.</p>
<p>But most of the time Datasette is installed as a standalone application - <code>pipx install datasette</code> or <code>brew install datasette</code> for example. In those cases, I'd like to be able to guarantee that the user will get known-to-work versions of every dependency.</p>
<p>I'm still thinking through my options here. I may even break out the core part of Datasette that works as a library into a separate package, so <code>datasette</code> has pinned dependencies but library users can install an unpinned version using a separate package name.</p>
<h4>Documentation</h4>
<p>This section stands without explanation I think. I like using the release notes to highlight this kind of thing, including articles that might not be hosted in the core Datasette documentation itself.</p>
<blockquote>
<ul>
<li>New tutorial: <a href="https://datasette.io/tutorials/clean-data">Cleaning data with sqlite-utils and Datasette</a>.</li>
<li>Screenshots in the documentation are now maintained using <a href="https://shot-scraper.datasette.io/">shot-scraper</a>, as described in <a href="https://simonwillison.net/2022/Oct/14/automating-screenshots/">Automating screenshots for the Datasette documentation using shot-scraper</a>. (<a href="https://github.com/simonw/datasette/issues/1844">#1844</a>)</li>
<li>More detailed command descriptions on the <a href="https://docs.datasette.io/en/stable/cli-reference.html#cli-reference">CLI reference</a> page. (<a href="https://github.com/simonw/datasette/issues/1787">#1787</a>)</li>
<li>New documentation on <a href="https://docs.datasette.io/en/stable/deploying.html#deploying-openrc">Running Datasette using OpenRC</a> - thanks, Adam Simpson. (<a href="https://github.com/simonw/datasette/pull/1825">#1825</a>).</li>
</ul>
</blockquote>
Datasette 0.61: The annotated release notes2022-03-24T01:53:15+00:002022-03-24T01:53:15+00:00https://simonwillison.net/2022/Mar/24/datasette-061/#atom-series
<p>I released <a href="https://docs.datasette.io/en/stable/changelog.html#v0-61">Datasette 0.61</a> this morning - closely followed by <a href="https://docs.datasette.io/en/stable/changelog.html#v0-61-1">0.61.1</a> to fix a minor bug. Here are the annotated release notes.</p>
<blockquote>
<p>In preparation for Datasette 1.0, this release includes two potentially backwards-incompatible changes. Hashed URL mode has been moved to a separate plugin, and the way Datasette generates URLs to databases and tables with special characters in their name such as <code>/</code> and <code>.</code> has changed.</p>
<ul>
<li>URLs within Datasette now use a different encoding scheme for tables or databases that include "special" characters outside of the range of <code>a-zA-Z0-9_-</code>. This scheme is explained here: <a href="https://docs.datasette.io/en/stable/internals.html#internals-tilde-encoding">Tilde encoding</a>. (<a href="https://github.com/simonw/datasette/issues/1657">#1657</a>)</li>
<li>Removed hashed URL mode from Datasette. The new <code>datasette-hashed-urls</code> plugin can be used to achieve the same result, see <a href="https://docs.datasette.io/en/stable/performance.html#performance-hashed-urls">datasette-hashed-urls</a> for details. (<a href="https://github.com/simonw/datasette/issues/1661">#1661</a>)</li>
</ul>
</blockquote>
<p>I talked about these changes <a href="https://simonwillison.net/2022/Mar/19/weeknotes/">in my weeknotes</a>. These are two major steps forward towards Datasette 1.0, in that implementing them removed a lot of code and complexity from Datasette core.</p>
<blockquote>
<p>Datasette also now requires Python 3.7 or higher.</p>
</blockquote>
<p>0.60 was the final release to work with Python 3.6, which ended security support <a href="https://pythonspeed.com/articles/stop-using-python-3.6/">last December</a>.</p>
<blockquote>
<ul>
<li>Databases can now have a custom path within the Datasette instance that is independent of the database name, using the <code>db.route</code> property. (<a href="https://github.com/simonw/datasette/issues/1668">#1668</a>)</li>
</ul>
</blockquote>
<p>This was the last change I made before the 0.61 release, and turned out to have <a href="https://github.com/simonw/datasette/issues/1682">a severe bug</a> which prompted me to release 0.61.1 shortly afterwards.</p>
<p>I needed the feature in order to implement the <a href="https://docs.datasette.io/en/stable/performance.html#performance-hashed-urls">datasette-hashed-urls</a> plugin.</p>
<p>That plugin works by modifying the URL relating to a database in order to incorporate the SHA-256 hash of that database's contents. This allows all JSON and HTML pages within that database to set a far-future cache header, providing a huge performance boost especially if run behind a caching proxy such as Cloudflare.</p>
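<p>The idea in miniature (hypothetical names and truncated digest; the real plugin hooks into Datasette's routing rather than building URLs like this):</p>

```python
import hashlib

def hashed_route(db_name: str, db_bytes: bytes) -> str:
    # Putting a content hash in the URL means the URL changes whenever the
    # data does, so responses for a given URL can be cached essentially forever
    digest = hashlib.sha256(db_bytes).hexdigest()[:7]
    return f"/{db_name}-{digest}"

route = hashed_route("fixtures", b"...database file contents...")
# Responses served from this route can then carry a far-future cache header:
cache_header = "Cache-Control: max-age=31536000, immutable"
```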
<p>Initially it worked by changing the display name of the database too, but <a href="https://github.com/simonw/datasette-hashed-urls/issues/8">Forest Gregg pointed out</a> that exposing all of those hashes in the UI was a pretty poor user experience.</p>
<p>So I decided to split the "route" (the URL to the database - "path" was already in use as the path to the file on disk) from the unique name used to refer to the database internally.</p>
<p>I caught almost all of the places in the code that needed to be updated, but shortly after shipping 0.61 I noticed that Forest had <a href="https://github.com/simonw/datasette-hashed-urls/issues/13">filed issues</a> relating to the one place I had missed!</p>
<p>Those fixes are out now, and <a href="https://github.com/simonw/datasette-hashed-urls/releases/tag/0.3">datasette-hashed-urls 0.3</a> should work as advertised.</p>
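<p>The core idea - derive a route from a content hash while the display name stays stable - can be sketched like this (a hypothetical helper, not the plugin's actual implementation):</p>

```python
import hashlib

def hashed_route(db_name, db_bytes):
    """Return a URL route incorporating the SHA-256 of the database contents."""
    digest = hashlib.sha256(db_bytes).hexdigest()[:12]  # truncated for readability
    return f"{db_name}-{digest}"

# The internal name and display name stay "fixtures"; only the route changes,
# so a new database file automatically gets a new URL and a fresh cache.
print(hashed_route("fixtures", b"...database file contents..."))
```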
<blockquote>
<ul>
<li>Datasette is now covered by a <a href="https://github.com/simonw/datasette/blob/main/CODE_OF_CONDUCT.md">Code of Conduct</a>. (<a href="https://github.com/simonw/datasette/issues/1654">#1654</a>)</li>
</ul>
</blockquote>
<p>It's good open source governance to have one of these. I decided to adopt the <a href="https://www.contributor-covenant.org/version/1/4/code-of-conduct/">Contributor Covenant</a> because it reflected my own values for the project and is used by a <a href="https://www.contributor-covenant.org/adopters/">large number of projects</a> that I trust.</p>
<blockquote>
<ul>
<li>Python 3.6 is no longer supported. (<a href="https://github.com/simonw/datasette/issues/1577">#1577</a>)</li>
<li>Tests now run against Python 3.11-dev. (<a href="https://github.com/simonw/datasette/issues/1621">#1621</a>)</li>
</ul>
</blockquote>
<p>I didn't test Datasette against Python 3.10 before it came out and was <a href="https://simonwillison.net/2021/Oct/9/finding-and-reporting-a-bug/">distressed to find an asyncio bug</a> that caused errors with the project after that version of Python was released!</p>
<p>I'm not going to let that happen again, so Datasette's test suite now runs against the 3.11 developer preview. Here's <a href="https://til.simonwillison.net/github-actions/python-3-11">my TIL</a> about how I set that up.</p>
<blockquote>
<ul>
<li>New <a href="https://docs.datasette.io/en/stable/internals.html#datasette-ensure-permissions">datasette.ensure_permissions(actor, permissions)</a> internal method for checking multiple permissions at once. (<a href="https://github.com/simonw/datasette/issues/1675">#1675</a>)</li>
<li>New <a href="https://docs.datasette.io/en/stable/internals.html#datasette-check-visibilty">datasette.check_visibility(actor, action, resource=None)</a> internal method for checking if a user can see a resource that would otherwise be invisible to unauthenticated users. (<a href="https://github.com/simonw/datasette/issues/1678">#1678</a>)</li>
</ul>
</blockquote>
<p>These two new permissions APIs came out of <a href="https://github.com/simonw/datasette/issues/1660">a larger effort</a> to refactor and simplify Datasette's core views.</p>
<p>Datasette's <a href="https://github.com/simonw/datasette/blob/0.60.2/datasette/views/base.py#L56-L101">BaseView class included permission logic</a>. I'm trying to shrink that superclass down to the point where I can remove it entirely, and I also wanted to make that permission logic available to plugins as well. Moving those methods into the documented <code>Datasette</code> class felt like a good way to achieve that.</p>
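<p>The pattern these methods encapsulate - run a list of permission checks and fail if any one of them is denied - can be modeled in plain Python (a simplified sketch, not Datasette's actual implementation):</p>

```python
class Forbidden(Exception):
    pass

def ensure_permissions(actor, permissions, check):
    """Raise Forbidden unless every permission check passes.

    Each permission is an action string, or an (action, resource) tuple.
    `check(actor, action, resource)` returns True, False or None (no opinion).
    """
    for permission in permissions:
        if isinstance(permission, tuple):
            action, resource = permission
        else:
            action, resource = permission, None
        if check(actor, action, resource) is False:
            raise Forbidden(f"Permission denied: {action}")

# Example check: only authenticated actors are allowed
def check(actor, action, resource):
    return actor is not None

ensure_permissions({"id": "root"}, ["view-instance", ("view-table", "fixtures")], check)
```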
<blockquote>
<ul>
<li>Table and row HTML pages now include a <code><link rel="alternate" type="application/json+datasette" href="..."></code> element and return a <code>Link: URL; rel="alternate"; type="application/json+datasette"</code> HTTP header pointing to the JSON version of those pages. (<a href="https://github.com/simonw/datasette/issues/1533">#1533</a>)</li>
<li>
<code>Access-Control-Expose-Headers: Link</code> is now added to the CORS headers, allowing remote JavaScript to access that header.</li>
</ul>
</blockquote>
<p>The idea behind this change originated with my experimental work on the <a href="https://github.com/simonw/datasette-notebook">datasette-notebook</a> plugin, which aims to implement a combination wiki-dashboard system for Datasette. Development of that plugin is stalled for the moment.</p>
<p>I wanted to build a feature where you could paste in a URL to a Datasette query or filtered table and the plugin would then embed and display the results of that query on a page.</p>
<p>To support this, I built an experimental Web Component, <a href="https://www.npmjs.com/package/datasette-table">datasette-table</a>, and published it to npm (<a href="https://til.simonwillison.net/npm/publish-web-component">TIL</a>).</p>
<p>The problem I needed to solve was this: given a URL, how can I tell that it corresponds to a Datasette table or query? Especially if that URL is hosted on an entirely separate website (why not support embedding Datasette tables from other instances?).</p>
<p>My solution was an HTTP header. You can now make a <code>HEAD</code> request against a Datasette page and, if it corresponds to a table or view, you'll get back a <code>Link: ... rel="alternate"</code> header pointing to the JSON version of that page.</p>
<p>Here's an example using <code>curl</code>:</p>
<pre><code>~ % curl -I https://latest.datasette.io/fixtures/facetable
HTTP/2 200
link: https://latest.datasette.io/fixtures/facetable.json; rel="alternate"; type="application/json+datasette"
cache-control: max-age=5
referrer-policy: no-referrer
access-control-allow-origin: *
access-control-allow-headers: Authorization
access-control-expose-headers: Link
content-type: text/html; charset=utf-8
</code></pre>
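<p>Here's one way a client could parse that header to discover the JSON URL (a sketch - <code>parse_alternate</code> is a hypothetical helper, not part of Datasette):</p>

```python
import re

def parse_alternate(link_header):
    """Extract the alternate JSON URL from a Link header, if present."""
    match = re.search(
        r'<?([^<>;,\s]+)>?;\s*rel="alternate";\s*type="application/json\+datasette"',
        link_header,
    )
    return match.group(1) if match else None

header = (
    'https://latest.datasette.io/fixtures/facetable.json; '
    'rel="alternate"; type="application/json+datasette"'
)
print(parse_alternate(header))  # https://latest.datasette.io/fixtures/facetable.json
```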
<p>I'm looking forward to building interesting features against this in the future.</p>
<blockquote>
<ul>
<li>Canned queries are now shown at the top of the database page, directly below the SQL editor. Previously they were shown at the bottom, below the list of tables. (<a href="https://github.com/simonw/datasette/issues/1612">#1612</a>)</li>
</ul>
</blockquote>
<p>Jacob Fenton <a href="https://github.com/simonw/datasette/issues/1612">suggested this</a>. Canned queries were previously listed at the very bottom of <a href="https://latest.datasette.io/fixtures">the database page</a>, below the list of tables. They're now shown at the top. I think this is a big improvement!</p>
<blockquote>
<ul>
<li>Datasette now has a default favicon. (<a href="https://github.com/simonw/datasette/issues/1603">#1603</a>)</li>
</ul>
</blockquote>
<p>I originally created this in Figma, and then <a href="https://til.simonwillison.net/pixelmator/pixel-editing-favicon">hand-edited it in Pixelmator</a>.</p>
<blockquote>
<ul>
<li>
<code>sqlite_stat</code> tables are now hidden by default. (<a href="https://github.com/simonw/datasette/issues/1587">#1587</a>)</li>
<li>SpatiaLite tables <code>data_licenses</code>, <code>KNN</code> and <code>KNN2</code> are now hidden by default. (<a href="https://github.com/simonw/datasette/issues/1601">#1601</a>)</li>
</ul>
</blockquote>
<p>SQLite and SpatiaLite occasionally use automatically created tables to power some of their functionality. These aren't very interesting to regular users, so Datasette omits them from view by default on the homepage.</p>
<p>I have an open issue for <a href="https://github.com/simonw/datasette/issues/1403">Labels explaining what hidden tables are for</a> since the current UI leads to legitimate questions from users who click on the "show hidden tables" link!</p>
<blockquote>
<ul>
<li>SQL query tracing mechanism now works for queries executed in <code>asyncio</code> sub-tasks, such as those created by <code>asyncio.gather()</code>. (<a href="https://github.com/simonw/datasette/issues/1576">#1576</a>)</li>
<li>
<a href="https://docs.datasette.io/en/stable/internals.html#internals-tracer">datasette.tracer</a> mechanism is now documented.</li>
</ul>
</blockquote>
<p>This was the impetus for dropping support for Python 3.6.</p>
<p>The tracer mechanism powers the debugging feature that shows all of the SQL queries that were executed to produce a page (<a href="https://latest-with-plugins.datasette.io/?_trace=1">demo here</a>).</p>
<p>I've been experimenting with ways to run some of these queries in parallel, taking advantage of <code>asyncio</code>. But the tracer mechanism wasn't correctly tracking these, because queries executed in additional <code>asyncio</code> tasks were not being correctly bundled together.</p>
<p>The Python standard library <a href="https://docs.python.org/3/library/contextvars.html">contextvars</a> module provides a neat way to solve this, but it was introduced in Python 3.7. So I finally bit the bullet and dropped 3.6.</p>
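<p>The <code>contextvars</code> behaviour that makes this work: a value set in a parent task is visible in sub-tasks spawned via <code>asyncio.gather()</code>, so traces recorded there can be grouped under the same request. A minimal, standalone demonstration:</p>

```python
import asyncio
import contextvars

trace_id = contextvars.ContextVar("trace_id", default=None)

async def run_query(results):
    # Tasks created by asyncio.gather() inherit a copy of the parent's
    # context, so they see the trace ID set before gather() was called.
    results.append(trace_id.get())

async def handle_request():
    trace_id.set("request-123")
    results = []
    await asyncio.gather(run_query(results), run_query(results))
    return results

print(asyncio.run(handle_request()))  # ['request-123', 'request-123']
```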
<blockquote>
<ul>
<li>Common Datasette symbols can now be imported directly from the top-level <code>datasette</code> package, see <a href="https://docs.datasette.io/en/stable/internals.html#internals-shortcuts">Import shortcuts</a>. Those symbols are <code>Response</code>, <code>Forbidden</code>, <code>NotFound</code>, <code>hookimpl</code>, <code>actor_matches_allow</code>. (<a href="https://github.com/simonw/datasette/issues/957">#957</a>)</li>
</ul>
</blockquote>
<p>This means plugins can now do <code>from datasette import Response</code>, where previously they had to use <code>from datasette.utils.asgi import Response</code>.</p>
<p>I've long been frustrated that Django makes me remember where to import things from - so now Datasette lets you import the most commonly used symbols (identified by running grep against my own plugins) directly from the root of the package.</p>
<blockquote>
<ul>
<li>
<code>/-/versions</code> page now returns additional details for libraries used by SpatiaLite. (<a href="https://github.com/simonw/datasette/issues/1607">#1607</a>)</li>
</ul>
</blockquote>
<p>You can see <a href="https://calands.datasettes.com/-/versions">a demo of that here</a>.</p>
<blockquote>
<ul>
<li>Documentation now links to the <a href="https://datasette.io/tutorials">Datasette Tutorials</a>.</li>
</ul>
</blockquote>
<p>I wrote about these new tutorials <a href="https://simonwillison.net/2022/Feb/27/datasette-tutorials/">a few weeks ago</a>.</p>
<p>And the rest:</p>
<blockquote>
<ul>
<li>Datasette will now also look for SpatiaLite in <code>/opt/homebrew</code> - thanks, Dan Peterson. (<a href="https://github.com/simonw/datasette/pull/1649">#1649</a>)</li>
<li>Fixed bug where <a href="https://docs.datasette.io/en/stable/custom_templates.html#custom-pages">custom pages</a> did not work on Windows. Thanks, Robert Christie. (<a href="https://github.com/simonw/datasette/issues/1545">#1545</a>)</li>
<li>Fixed error caused when a table had a column named <code>n</code>. (<a href="https://github.com/simonw/datasette/issues/1228">#1228</a>)</li>
</ul>
</blockquote>
Datasette 0.60: The annotated release notes2022-01-14T02:30:18+00:002022-01-14T02:30:18+00:00https://simonwillison.net/2022/Jan/14/datasette-060/#atom-series
<p>I released <a href="https://docs.datasette.io/en/stable/changelog.html#v0-60">Datasette 0.60</a> today. It's a big release, incorporating 61 commits and 18 issues. Here are the annotated release notes.</p>
<h4>filters_from_request plugin hook</h4>
<blockquote>
<ul>
<li>New plugin hook: <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-filters-from-request">filters_from_request(request, database, table, datasette)</a>, which runs on the table page and can be used to support new custom query string parameters that modify the SQL query. (<a href="https://github.com/simonw/datasette/issues/473">#473</a>)</li>
</ul>
</blockquote>
<p>The inspiration for this hook was my ongoing quest to simplify and refactor Datasette's <code>TableView</code>, the most complex page in the project, which provides <a href="https://global-power-plants.datasettes.com/global-power-plants/global-power-plants">an interface</a> for filtering and paginating through a table of data.</p>
<p>The main job of that page is to convert a query string - with things like <code>?country_long=China</code> and <code>&capacity_mw__gt=200</code> in it - into a SQL query.</p>
<p>So I extracted part of that logic out into a new plugin hook. I've already started using it in <a href="https://datasette.io/plugins/datasette-leaflet-freedraw">datasette-leaflet-freedraw</a> to help support filtering a table by drawing on a map, <a href="https://calands.datasettes.com/calands/CPAD_2020a_Units?_sort=id&_freedraw=%7B%22type%22%3A%22MultiPolygon%22%2C%22coordinates%22%3A%5B%5B%5B%5B-119.17969%2C36.59789%5D%2C%5B-119.53125%2C35.46067%5D%2C%5B-120.05859%2C34.958%5D%2C%5B-120.41016%2C34.88593%5D%2C%5B-120.76172%2C35.17381%5D%2C%5B-120.67383%2C36.31513%5D%2C%5B-120.32227%2C36.66842%5D%2C%5B-119.17969%2C36.59789%5D%5D%5D%5D%7D">demo here</a>.</p>
<p>I also used the new hook to refactor Datasette itself. The <a href="https://github.com/simonw/datasette/blob/0.60/datasette/filters.py">filters.py module</a> now registers <code>where_filters()</code>, <code>search_filters()</code> and <code>through_filters()</code> implementations against that hook, to support various core pieces of Datasette functionality.</p>
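<p>The kind of translation the hook performs - turning query string pairs like <code>capacity_mw__gt=200</code> into SQL fragments with bound parameters - can be sketched like this (a simplified model, not Datasette's actual filter code):</p>

```python
OPERATORS = {"gt": ">", "lt": "<", "gte": ">=", "lte": "<=", "exact": "="}

def filters_from_args(args):
    """Turn (key, value) query string pairs into WHERE clauses plus parameters."""
    clauses, params = [], {}
    for i, (key, value) in enumerate(args):
        if "__" in key:
            column, op = key.rsplit("__", 1)
            operator = OPERATORS.get(op, "=")
        else:
            column, operator = key, "="
        param = f"p{i}"
        clauses.append(f'"{column}" {operator} :{param}')
        params[param] = value
    return clauses, params

clauses, params = filters_from_args(
    [("country_long", "China"), ("capacity_mw__gt", "200")]
)
print(clauses)  # ['"country_long" = :p0', '"capacity_mw" > :p1']
```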
<h4>Tracing, write API improvements and performance</h4>
<blockquote>
<ul>
<li>The tracing feature now traces write queries, not just read queries. (<a href="https://github.com/simonw/datasette/issues/1568">#1568</a>)</li>
<li>Added two additional methods for writing to the database: <a href="https://docs.datasette.io/en/stable/internals.html#database-execute-write-script">await db.execute_write_script(sql, block=True)</a> and <a href="https://docs.datasette.io/en/stable/internals.html#database-execute-write-many">await db.execute_write_many(sql, params_seq, block=True)</a>. (<a href="https://github.com/simonw/datasette/issues/1570">#1570</a>)</li>
<li>Made several performance improvements to the database schema introspection code that runs when Datasette first starts up. (<a href="https://github.com/simonw/datasette/issues/1555">#1555</a>)</li>
</ul>
</blockquote>
<p>I built a new plugin called <a href="https://datasette.io/plugins/datasette-pretty-traces">datasette-pretty-traces</a> to help with my refactoring. It takes Datasette's existing <a href="https://docs.datasette.io/en/stable/settings.html?highlight=trace#trace-debug">?_trace=1 feature</a>, which dumps out a big blob of JSON at the bottom of the page, and turns it into something that's a bit easier to understand.</p>
<p>The plugin quickly started highlighting all sorts of interesting potential improvements!</p>
<p>After I added tracing to write queries it became apparent that Datasette's schema introspection code - which runs once when the server starts, and then re-runs any time it notices a change to a database schema - was painfully inefficient.</p>
<p>It writes information about the schema into an in-memory database, which I hope to use in the future to power features like search of all attached tables.</p>
<p>I ended up adding two new documented internal methods for speeding up those writes: <code>db.execute_write_script()</code> and <code>db.execute_write_many()</code>. These are now available for plugins to use as well.</p>
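<p>The two methods correspond to SQLite's own <code>executescript()</code> and <code>executemany()</code> - the former runs several statements in one call, the latter reuses a single prepared statement for many parameter rows. A standalone <code>sqlite3</code> illustration of the underlying operations:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Several statements in one call - the pattern behind db.execute_write_script()
conn.executescript("""
    create table tables (name text, num_columns integer);
    create table columns (table_name text, column_name text);
""")

# One prepared statement, many rows - the pattern behind db.execute_write_many()
rows = [("tables", 2), ("columns", 2), ("indexes", 3)]
conn.executemany("insert into tables (name, num_columns) values (?, ?)", rows)
conn.commit()

print(conn.execute("select count(*) from tables").fetchone()[0])  # 3
```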
<blockquote>
<ul>
<li>The <a href="https://docs.datasette.io/en/stable/internals.html#database-execute-write">db.execute_write()</a> internal method now defaults to blocking until the write operation has completed. Previously it defaulted to queuing the write and then continuing to run code while the write was in the queue. (<a href="https://github.com/simonw/datasette/issues/1579">#1579</a>)</li>
</ul>
</blockquote>
<p>Spending time with code that wrote to the database highlighted a design flaw in Datasette's original write method. I realized that every line of code I had written that used it looked like this:</p>
<pre><span class="pl-s1">db</span>.<span class="pl-en">execute_write</span>(<span class="pl-s">"insert into ..."</span>, <span class="pl-s1">block</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)</pre>
<p>The <code>block=True</code> parameter means "block until the write has completed". Without it, the write goes into a queue and code continues executing whether or not the write has been made.</p>
<p>This was clearly the wrong default. I used GitHub code search to check if changing it would be disruptive - it would not - and made the change. I'm glad I caught this before Datasette 1.0!</p>
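<p>The difference between the two modes can be modeled with a background write thread and an optional event to wait on (a toy model of the queue, not Datasette's implementation):</p>

```python
import queue
import threading

write_queue = queue.Queue()
applied = []

def writer():
    while True:
        sql, done = write_queue.get()
        applied.append(sql)  # stand-in for actually executing the write
        if done is not None:
            done.set()

threading.Thread(target=writer, daemon=True).start()

def execute_write(sql, block=True):
    # block=True (the new default): wait until the write has been applied.
    # block=False: queue the write and return immediately.
    done = threading.Event() if block else None
    write_queue.put((sql, done))
    if done is not None:
        done.wait()

execute_write("insert into docs values (1)")
print(applied)  # the write has definitely happened by this point
```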
<blockquote>
<ul>
<li>Database write connections now execute the <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-prepare-connection">prepare_connection(conn, database, datasette)</a> plugin hook. (<a href="https://github.com/simonw/datasette/issues/1564">#1564</a>)</li>
</ul>
</blockquote>
<p>I noticed that writes to a database with SpatiaLite were failing with an error, because the SpatiaLite module was not being correctly loaded. This fixes that.</p>
<h4>Faceting</h4>
<p>A bunch of different fixes for Datasette's <a href="https://docs.datasette.io/en/stable/facets.html">Faceting</a> made it into this release:</p>
<blockquote>
<ul>
<li>The number of unique values in a facet is now always displayed. Previously it was only displayed if the user specified <code>?_facet_size=max</code>. (<a href="https://github.com/simonw/datasette/issues/1556">#1556</a>)</li>
<li>Facets of type <code>date</code> or <code>array</code> can now be configured in <code>metadata.json</code>, see <a href="https://docs.datasette.io/en/stable/facets.html#facets-metadata">Facets in metadata.json</a>. Thanks, David Larlet. (<a href="https://github.com/simonw/datasette/issues/1552">#1552</a>)</li>
<li>New <code>?_nosuggest=1</code> parameter for table views, which disables facet suggestion. (<a href="https://github.com/simonw/datasette/issues/1557">#1557</a>)</li>
<li>Fixed bug where <code>?_facet_array=tags&_facet=tags</code> would only display one of the two selected facets. (<a href="https://github.com/simonw/datasette/issues/625">#625</a>)</li>
</ul>
</blockquote>
<h4>Other, smaller changes</h4>
<blockquote>
<ul>
<li>The <code>Datasette()</code> constructor no longer requires the <code>files=</code> argument, and is now documented at <a href="https://docs.datasette.io/en/stable/internals.html#internals-datasette">Datasette class</a>. (<a href="https://github.com/simonw/datasette/issues/1563">#1563</a>)</li>
</ul>
</blockquote>
<p>A tiny usability improvement, mainly for tests. It means you can write a test that looks like this:</p>
<pre><span class="pl-k">import</span> <span class="pl-s1">pytest</span>
<span class="pl-k">from</span> <span class="pl-s1">datasette</span>.<span class="pl-s1">app</span> <span class="pl-k">import</span> <span class="pl-v">Datasette</span>
<span class="pl-en">@<span class="pl-s1">pytest</span>.<span class="pl-s1">mark</span>.<span class="pl-s1">asyncio</span></span>
<span class="pl-k">async</span> <span class="pl-k">def</span> <span class="pl-en">test_datasette_homepage</span>():
<span class="pl-s1">ds</span> <span class="pl-c1">=</span> <span class="pl-v">Datasette</span>()
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-k">await</span> <span class="pl-s1">ds</span>.<span class="pl-s1">client</span>.<span class="pl-en">get</span>(<span class="pl-s">"/"</span>)
<span class="pl-k">assert</span> <span class="pl-s">"<title>Datasette"</span> <span class="pl-c1">in</span> <span class="pl-s1">response</span>.<span class="pl-s1">text</span></pre>
<p>Previously the <code>files=</code> argument was required, so you would have to use <code>Datasette(files=[])</code>.</p>
<blockquote>
<ul>
<li>The query string variables exposed by <code>request.args</code> will now include blank strings for arguments such as <code>foo</code> in <code>?foo=&bar=1</code> rather than ignoring those parameters entirely. (<a href="https://github.com/simonw/datasette/issues/1551">#1551</a>)</li>
</ul>
</blockquote>
<p>This came out of the refactor - <a href="https://github.com/simonw/datasette/commit/a6ff123de5464806441f6a6f95145c9a83b7f20b">this commit</a> tells the story.</p>
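<p>The standard library makes the same distinction: <code>urllib.parse.parse_qs</code> drops blank values unless you ask for them, which mirrors the old and new behaviour:</p>

```python
from urllib.parse import parse_qs

qs = "foo=&bar=1"
print(parse_qs(qs))                          # {'bar': ['1']} - foo is dropped
print(parse_qs(qs, keep_blank_values=True))  # {'foo': [''], 'bar': ['1']}
```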
<blockquote>
<ul>
<li>Upgraded Pluggy dependency to 1.0. (<a href="https://github.com/simonw/datasette/issues/1575">#1575</a>)</li>
</ul>
</blockquote>
<p>I needed this because Pluggy 1.0 allows multiple implementations of the same hook to be defined within the same file, like this:</p>
<pre><span class="pl-en">@<span class="pl-en">hookimpl</span>(<span class="pl-s1">specname</span><span class="pl-c1">=</span><span class="pl-s">"filters_from_request"</span>)</span>
<span class="pl-k">def</span> <span class="pl-en">where_filters</span>(<span class="pl-s1">request</span>, <span class="pl-s1">database</span>, <span class="pl-s1">datasette</span>):
<span class="pl-c"># ...</span>
<span class="pl-en">@<span class="pl-en">hookimpl</span>(<span class="pl-s1">specname</span><span class="pl-c1">=</span><span class="pl-s">"filters_from_request"</span>)</span>
<span class="pl-k">def</span> <span class="pl-en">search_filters</span>(<span class="pl-s1">request</span>, <span class="pl-s1">database</span>, <span class="pl-s1">table</span>, <span class="pl-s1">datasette</span>):
<span class="pl-c"># ...</span></pre>
<blockquote>
<ul>
<li>Now using <a href="https://plausible.io/">Plausible analytics</a> for the Datasette documentation.</li>
</ul>
</blockquote>
<p>I <em>really</em> like Plausible as an analytics product. It does a great job of respecting user privacy while still producing useful numbers. It's cookie-free, which means it doesn't trigger a need for GDPR banners in Europe. I'm increasingly using it on all of my projects.</p>
<blockquote>
<ul>
<li>New <a href="https://docs.datasette.io/en/stable/cli-reference.html#cli-reference">CLI reference</a> page showing the output of <code>--help</code> for each of the <code>datasette</code> sub-commands. This led to several small improvements to the help copy. (<a href="https://github.com/simonw/datasette/issues/1594">#1594</a>)</li>
</ul>
</blockquote>
<p>I first built this <a href="https://sqlite-utils.datasette.io/en/stable/cli-reference.html">for sqlite-utils</a> and liked it so much I brought it to Datasette as well. It's generated by <a href="https://nedbatchelder.com/code/cog">cog</a>, using <a href="https://github.com/simonw/datasette/blame/4b23f01f3e668c8f2a2f1a294be49f49b4073969/docs/cli-reference.rst#L9-L35">this inline script</a> in the reStructuredText.</p>
<h4>And the rest</h4>
<blockquote>
<ul>
<li>Label columns detected for foreign keys are now case-insensitive, so <code>Name</code> or <code>TITLE</code> will be detected in the same way as <code>name</code> or <code>title</code>. (<a href="https://github.com/simonw/datasette/issues/1544">#1544</a>)</li>
<li>
<code>explain query plan</code> is now allowed with varying amounts of whitespace in the query. (<a href="https://github.com/simonw/datasette/issues/1588">#1588</a>)</li>
<li>Fixed bug where writable canned queries could not be used with custom templates. (<a href="https://github.com/simonw/datasette/issues/1547">#1547</a>)</li>
<li>Improved fix for a bug where columns with an underscore prefix could result in unnecessary hidden form fields. (<a href="https://github.com/simonw/datasette/issues/1527">#1527</a>)</li>
</ul>
</blockquote>
Datasette 0.59: The annotated release notes2021-10-19T04:59:09+00:002021-10-19T04:59:09+00:00https://simonwillison.net/2021/Oct/19/datasette-059/#atom-series
<p><a href="https://docs.datasette.io/en/stable/changelog.html#v0-59">Datasette 0.59</a> is out, with a miscellaneous grab-bag of improvements. Here are the <a href="https://simonwillison.net/series/datasette-release-notes/">annotated release notes</a>.</p>
<h4>Column metadata</h4>
<blockquote>
<p>Columns can now have associated metadata descriptions in <code>metadata.json</code>, see <a href="https://docs.datasette.io/en/stable/metadata.html#metadata-column-descriptions">Column descriptions</a>. (<a href="https://github.com/simonw/datasette/issues/942">#942</a>)</p>
</blockquote>
<p>I've been wanting this for ages. Tables consist of columns, and column names very rarely give you enough information to truly understand the associated data. You can now drop extra column definitions into your <code>metadata.json</code> like so:</p>
<div class="highlight highlight-source-json"><pre>{
<span class="pl-s"><span class="pl-pds">"</span>databases<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>la-times<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>tables<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>cdph-county-cases-deaths<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>columns<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>county<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The name of the county where the agency is based.<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>fips<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The FIPS code given to the county by the federal government. Can be used to merge with other data sources.<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>date<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The date when the data were retrieved in ISO 8601 format.<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>confirmed_cases<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The cumulative number of coronavirus cases that were confirmed as of that time. This is sometimes called the episode date by other sources.<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>reported_cases<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The cumulative number of coronavirus cases that were reported as of that time. This reflects when cases were first announced by the state.<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>probable_cases<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The cumulative number of probable coronavirus cases that were confirmed as of that time. This reflects the results of antigen tests, a rapid testing technique different from the standard test.<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>reported_and_probable_cases<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The cumulative number of reported and probable coronavirus cases as of that time.<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>reported_deaths<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>The cumulative number of deaths reported at that time.<span class="pl-pds">"</span></span>
}
}
}
}
}
}</pre></div>
<p>The LA Times publish <a href="https://github.com/datadesk/california-coronavirus-data/blob/master/README.md#data-dictionary">a meticulous data dictionary</a> for the 46 CSV and GeoJSON files that they maintain tracking the pandemic in their <a href="https://github.com/datadesk/california-coronavirus-data">california-coronavirus-data</a> GitHub repository.</p>
<p>To demonstrate the new column metadata feature, I wrote <a href="https://github.com/simonw/covid-19-datasette/blob/main/build_la_times_metadata.py">a script</a> that converts their Markdown data dictionary into Datasette's metadata format and publishes it along with imported CSV data from their repository.</p>
<p>You can explore the result at <a href="https://covid-19.datasettes.com/la-times">covid-19.datasettes.com/la-times</a> - here's their <a href="https://covid-19.datasettes.com/la-times/cdcr-prison-totals?_sort_desc=date">cdcr-prison-totals</a> table tracking Covid cases in prisons operated by the California Department of Corrections and Rehabilitation.</p>
<p><img src="https://static.simonwillison.net/static/2021/cdcr-prison-totals.png" style="max-width:100%;" /></p>
<h4>register_commands() plugin hook</h4>
<blockquote>
<p>New <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-register-commands">register_commands()</a> plugin hook allows plugins to register additional Datasette CLI commands, e.g. <code>datasette mycommand file.db</code>. (<a href="https://github.com/simonw/datasette/issues/1449">#1449</a>)</p>
</blockquote>
<p>I originally built this because I thought I would need it for <a href="https://datasette.io/desktop">Datasette Desktop</a>, then found it wasn't necessary for that project after all.</p>
<p>I held off on implementing this for quite a while on the basis that plugins which needed their own CLI interface could implement one entirely separately as a Click-based CLI app - <a href="https://datasette.io/tools/dogsheep-beta">dogsheep-beta</a> implements this pattern, offering both a <code>dogsheep-beta index</code> command and registering itself as a Datasette plugin.</p>
<p>The problem with plugins implementing their own separate CLI commands is that users then need to understand where they have been installed. Datasette is designed to work well with virtual environments, but if a plugin is installed into a virtual environment I can't guarantee that any CLI tools it includes will execute when the user types their name.</p>
<p>Now that plugins can register <code>datasette subcommand</code> subcommands this problem has a solution: provided users can run <code>datasette</code> they'll also be able to run CLI commands provided by plugins, without needing to understand how to modify their path.</p>
<p>I expect this to be particularly useful for Datasette installed through Homebrew, which invisibly sets up its own virtual environment into which plugins can be installed using <code>datasette install plugin-name</code>.</p>
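<p>The dispatch idea is straightforward: at startup the <code>datasette</code> CLI asks each installed plugin for extra commands and registers them on the main command group. A plain-Python sketch of that registry pattern (Datasette actually implements this with Click and Pluggy):</p>

```python
# Minimal model of a CLI that lets plugins register sub-commands.
COMMANDS = {}

def register_command(name):
    def decorator(fn):
        COMMANDS[name] = fn
        return fn
    return decorator

# A plugin registers its command at import time:
@register_command("mycommand")
def mycommand(args):
    return f"ran mycommand with {args}"

def dispatch(argv):
    # The CLI looks up the sub-command by name and invokes it.
    name, *rest = argv
    return COMMANDS[name](rest)

print(dispatch(["mycommand", "file.db"]))  # ran mycommand with ['file.db']
```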
<h4>Count unique facet values with ?_facet_size=max</h4>
<blockquote>
<p>Adding <code>?_facet_size=max</code> to a table page now shows the number of unique values in each facet. (<a href="https://github.com/simonw/datasette/issues/1423">#1423</a>)</p>
</blockquote>
<p>When I'm using facets to explore data, I'm often interested in how many values are available in the facet - particularly if I'm faceting by a column such as country or state.</p>
<p>I added a <code>...</code> link for showing the maximum number of facet values in <a href="https://docs.datasette.io/en/stable/changelog.html#v0-57">Datasette 0.57</a>. Clicking that link adds <code>?_facet_size=max</code> to the URL, which now also displays a count of the distinct values for each facet.</p>
<p>Here's <a href="https://covid-19.datasettes.com/la-times/cdcr-prison-totals?_sort_desc=date&_facet=name&_facet_size=max#facet-name">an example</a> using that Californian prison data.</p>
<p><img src="https://static.simonwillison.net/static/2021/facet-max-size-count.png" alt="Facet by name shows 35 values" style="max-width:100%;" /></p>
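<p>The number displayed is simply a count of the distinct values in the faceted column - in SQL terms, <code>count(distinct column)</code>. A quick <code>sqlite3</code> sketch of that query:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table plants (name text, country text)")
conn.executemany(
    "insert into plants values (?, ?)",
    [("A", "China"), ("B", "China"), ("C", "USA"), ("D", "India")],
)

# The count shown next to a facet is the number of distinct values:
count = conn.execute("select count(distinct country) from plants").fetchone()[0]
print(count)  # 3
```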
<h4>Upgrading httpx</h4>
<blockquote>
<p>Upgraded dependency <a href="https://github.com/encode/httpx/releases/tag/0.20.0">httpx 0.20</a> - the undocumented <code>allow_redirects=</code> parameter to <a href="https://docs.datasette.io/en/stable/internals.html#internals-datasette-client">datasette.client</a> is now <code>follow_redirects=</code>, and defaults to <code>False</code> where it previously defaulted to <code>True</code>. (<a href="https://github.com/simonw/datasette/issues/1488">#1488</a>)</p>
</blockquote>
<p>A while ago Tom Christie <a href="https://github.com/encode/httpx/discussions/1785">requested feedback</a> on how <code>httpx</code> should handle redirects.</p>
<p>The <code>requests</code> library that inspired it automatically follows 301 and 302 redirects unless you explicitly tell it not to.</p>
<p>I've been caught out by this many times in the past - it's not default behaviour that I want from my HTTP client library - so I chimed in as favouring a change in behaviour. I also suggested that <code>follow_redirects=True</code> would be a better term for it than <code>allow_redirects=True</code>.</p>
<p>Tom made that change for <code>httpx 1.0</code>, and then back-ported it for <a href="https://github.com/encode/httpx/releases/tag/0.20.0">version 0.20</a> - after all, pre-1.0 you're allowed to make breaking changes like this.</p>
<p>... and <a href="https://github.com/simonw/datasette/issues/1488">Datasette broke, hard</a>! Datasette embeds <code>httpx</code> pretty deeply inside itself, and the breaking change caused all kinds of errors and test failures.</p>
<p>This was the final push I needed to get 0.59 released.</p>
<p>(I <a href="https://github.com/simonw/datasette/issues/1493">just found out</a> this also broke the Homebrew tests for Datasette, as those relied on <code>datasette --get '/:memory:.json?sql=select+3*5'</code> automatically following the redirect to <code>/_memory.json?sql=select+3*5</code> instead.)</p>
<blockquote>
<p>Code that figures out which named parameters a SQL query takes in order to display form fields for them is no longer confused by strings that contain colon characters. (<a href="https://github.com/simonw/datasette/issues/1421">#1421</a>)</p>
</blockquote>
<p>One of my favourite obscure features of Datasette is the way it can take a SQL query such as the following:</p>
<div class="highlight highlight-source-sql"><pre><span class="pl-k">select</span> <span class="pl-k">*</span> <span class="pl-k">from</span> [cdcr<span class="pl-k">-</span>prison<span class="pl-k">-</span>totals]
<span class="pl-k">where</span> <span class="pl-s"><span class="pl-pds">"</span>zipcode<span class="pl-pds">"</span></span> <span class="pl-k">=</span> :zip</pre></div>
<p>And extract out that <code>:zip</code> parameter and turn it into an HTML form field, <a href="https://covid-19.datasettes.com/la-times?sql=select+*+from+%5Bcdcr-prison-totals%5D+where+%22zipcode%22+%3D+%3Azip&zip=93212">as seen here</a>:</p>
<p><img src="https://static.simonwillison.net/static/2021/zip-form-sql.png" alt="A form showing the SQL query with a separate field for entering the zip" style="max-width:100%;" /></p>
<p>This used to use a regular expression, which meant that it could be confused by additional colons - the following SQL query for example:</p>
<div class="highlight highlight-source-sql"><pre><span class="pl-k">select</span> <span class="pl-k">*</span> <span class="pl-k">from</span> content
<span class="pl-k">where</span> created_time <span class="pl-k">=</span> <span class="pl-s"><span class="pl-pds">'</span>07:00<span class="pl-pds">'</span></span>
<span class="pl-k">and</span> author <span class="pl-k">=</span> :author</pre></div>
<p>I thought that solving this properly would require embedding a full SQLite-compatible SQL parser in Datasette.</p>
<p>Then I realized SQLite's <code>explain</code> output included exactly the data I needed, <a href="https://latest.datasette.io/fixtures?sql=explain+select+id%2C+name+from+facet_cities+where+%22name%22+%3D+%3Aname&name=Los+Angeles">for example</a>:</p>
<table>
<thead>
<tr>
<th>addr</th>
<th>opcode</th>
<th>p1</th>
<th>p2</th>
<th>p3</th>
<th>p4</th>
<th>p5</th>
<th>comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Init</td>
<td>0</td>
<td>10</td>
<td>0</td>
<td></td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>OpenRead</td>
<td>0</td>
<td>42</td>
<td>0</td>
<td>2</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>Rewind</td>
<td>0</td>
<td>9</td>
<td>0</td>
<td></td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>Column</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td></td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>Ne</td>
<td>2</td>
<td>8</td>
<td>1</td>
<td>BINARY-8</td>
<td>82</td>
<td></td>
</tr>
<tr>
<td>5</td>
<td>Rowid</td>
<td>0</td>
<td>3</td>
<td>0</td>
<td></td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>Column</td>
<td>0</td>
<td>1</td>
<td>4</td>
<td></td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>7</td>
<td>ResultRow</td>
<td>3</td>
<td>2</td>
<td>0</td>
<td></td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>Next</td>
<td>0</td>
<td>3</td>
<td>0</td>
<td></td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>9</td>
<td>Halt</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>10</td>
<td>Transaction</td>
<td>0</td>
<td>0</td>
<td>35</td>
<td>0</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>11</td>
<td>Variable</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>:name</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>12</td>
<td>Goto</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td></td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>
<p>The trick then is to run an explain, then find any rows with an <code>opcode</code> of <code>Variable</code> and read the <code>p4</code> register to find out the name of those variables.</p>
<p>This is risky, since SQLite makes no promises about the stability of the explain output - but it's such a useful trick that I'm now contemplating building an automated test suite around it such that if a future SQLite release breaks things I will at least know about it promptly.</p>
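<p>Here's a minimal sketch of that trick using Python's <code>sqlite3</code> module directly - the function name and the regex pre-pass for candidate parameters are my own illustration, not Datasette's exact implementation:</p>

```python
import re
import sqlite3

def derive_named_parameters(conn, sql):
    # Guess candidate parameters with a regex (this over-matches - it also
    # picks up ":00" inside the string '07:00'), then use EXPLAIN to keep
    # only the names SQLite actually compiled into Variable opcodes.
    candidates = re.findall(r":([a-zA-Z0-9_]+)", sql)
    rows = conn.execute(
        "explain " + sql, {name: None for name in candidates}
    ).fetchall()
    # EXPLAIN columns are addr, opcode, p1, p2, p3, p4, p5, comment -
    # the p4 register of each Variable opcode holds the parameter name
    return [row[5].lstrip(":") for row in rows if row[1] == "Variable" and row[5]]

conn = sqlite3.connect(":memory:")
conn.execute("create table content (author text, created_time text)")
print(derive_named_parameters(
    conn,
    "select * from content where created_time = '07:00' and author = :author",
))
```

<p>The regex alone would be fooled by the <code>:00</code> in <code>'07:00'</code>, but the explain step filters it out because SQLite never treats it as a variable.</p>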
<h4>Everything else</h4>
<blockquote>
<p>The <code>--cors</code> option now causes Datasette to return the <code>Access-Control-Allow-Headers: Authorization</code> header, in addition to <code>Access-Control-Allow-Origin: *</code>. (<a href="https://github.com/simonw/datasette/pull/1467">#1467</a>)</p>
</blockquote>
<p>This was a feature request from users of the <a href="https://datasette.io/plugins/datasette-auth-tokens">datasette-auth-tokens</a> plugin.</p>
<blockquote>
<p>Renamed <code>--help-config</code> option to <code>--help-settings</code>. (<a href="https://github.com/simonw/datasette/issues/1431">#1431</a>)</p>
</blockquote>
<p>Part of my continuing goal to free up the term "config" to mean plugin configuration (which is currently mixed up with Datasette's metadata concept) rather than meaning the options that can be passed to the Datasette CLI tool (now called <a href="https://docs.datasette.io/en/stable/settings.html">settings</a>).</p>
<blockquote>
<p><code>datasette.databases</code> property is now a documented API. (<a href="https://github.com/simonw/datasette/issues/1443">#1443</a>)</p>
</blockquote>
<p>I've got into the habit of documenting any Datasette internals that I use in a plugin. In this case I needed it <a href="https://github.com/simonw/datasette-block-robots/commit/f947bcc7e56de3eedd7c402d10888ff59babb905">for datasette-block-robots</a>.</p>
<blockquote>
<p>The <code>base.html</code> template now wraps everything other than the <code><footer></code> in a <code><div class="not-footer"></code> element, to help with advanced CSS customization. (<a href="https://github.com/simonw/datasette/issues/1446">#1446</a>)</p>
</blockquote>
<p>I made this change so that Datasette Desktop could more easily implement <a href="https://github.com/simonw/datasette-app-support/blob/0.11.5/datasette_app_support/static/sticky-footer.css">a sticky footer</a> that stuck to the bottom of the application window no matter how short or tall it was.</p>
<blockquote>
<p>The <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-render-cell">render_cell()</a> plugin hook can now return an awaitable function. This means the hook can execute SQL queries. (<a href="https://github.com/simonw/datasette/issues/1425">#1425</a>)</p>
</blockquote>
<p>Another example of the <a href="https://simonwillison.net/2020/Sep/2/await-me-maybe/">await me maybe</a> pattern in action.</p>
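<p>The pattern itself is tiny. Here's an illustrative standalone sketch using just the standard library - the helper lets a hook return a plain value, a callable, or an awaitable, and normalizes all three:</p>

```python
import asyncio
import inspect

async def await_me_maybe(value):
    # A hook may return a plain value, a callable, or an awaitable:
    # call it if callable, await it if awaitable, otherwise use as-is.
    if callable(value):
        value = value()
    if inspect.isawaitable(value):
        value = await value
    return value

async def demo():
    async def render_from_sql():
        # stand-in for a hook implementation that needs to run a SQL query
        return "rendered"
    plain = await await_me_maybe("rendered")
    fancy = await await_me_maybe(render_from_sql)
    return plain, fancy

print(asyncio.run(demo()))
```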
<blockquote>
<p><a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-register-routes">register_routes(datasette)</a> plugin hook now accepts an optional <code>datasette</code> argument. (<a href="https://github.com/simonw/datasette/issues/1404">#1404</a>)</p>
</blockquote>
<p>This means plugins can conditionally register routes based on plugin configuration.</p>
<blockquote>
<p>New <code>hide_sql</code> canned query option for defaulting to hiding the SQL query used by a canned query, see <a href="https://docs.datasette.io/en/stable/sql_queries.html#canned-queries-options">Additional canned query options</a>. (<a href="https://github.com/simonw/datasette/issues/1422">#1422</a>)</p>
</blockquote>
<p>The goal of canned queries is to provide an interface for people who don't know SQL to execute queries written by other people, potentially providing their own inputs using the <code>:parameter</code> mechanism described above.</p>
<p>If it's being used for that, there's not much to be gained from making them scroll past the SQL query first!</p>
<p>Adding <code>"hide_sql": true</code> to the canned query configuration now defaults to hiding the query for them - though they can still click "Show SQL" to see it.</p>
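<p>In the canned query configuration that looks something like this - the database and query names here are invented for illustration:</p>

```yaml
databases:
  mydatabase:
    queries:
      recent_articles:
        sql: select * from articles order by created_time desc limit 10
        hide_sql: true
```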
<blockquote>
<p>New <code>--cpu</code> option for <a href="https://docs.datasette.io/en/stable/publish.html#publish-cloud-run">datasette publish cloudrun</a>. (<a href="https://github.com/simonw/datasette/issues/1420">#1420</a>)</p>
</blockquote>
<p>Google Cloud Run recently <a href="https://cloud.google.com/run/docs/configuring/cpu">added the ability</a> to specify 1, 2 or 4 vCPUs when running a deploy.</p>
<blockquote>
<p>If <a href="https://github.com/willmcgugan/rich">Rich</a> is installed in the same virtual environment as Datasette, it will be used to provide enhanced display of error tracebacks on the console. (<a href="https://github.com/simonw/datasette/issues/1416">#1416</a>)</p>
</blockquote>
<p><a href="https://github.com/willmcgugan/rich">Rich</a> is Will McGugan's phenomenal Python library for building beautiful console interfaces. One of the many tricks up its sleeve is improved display of exceptions, including more detailed tracebacks and local variables. Datasette now takes advantage of this if Rich is installed in the same virtual environment.</p>
<blockquote>
<p><code>datasette.utils</code> <a href="https://docs.datasette.io/en/stable/internals.html#internals-utils-parse-metadata">parse_metadata(content)</a> function, used by the new <a href="https://datasette.io/plugins/datasette-remote-metadata">datasette-remote-metadata plugin</a>, is now a documented API. (<a href="https://github.com/simonw/datasette/issues/1405">#1405</a>)</p>
</blockquote>
<p>Another API that became documented after I used it in a plugin.</p>
<p>The <a href="https://datasette.io/plugins/datasette-remote-metadata">datasette-remote-metadata</a> plugin is pretty neat.</p>
<p>I sometimes find myself working on projects where I'm deploying a large database file - 1 or 2 GB - to Google Cloud Run. Each deploy can take several minutes.</p>
<p>If I want to tweak a canned query or a few lines of text in the metadata for that deployment, it can be frustrating to have to push an entirely new deploy just to make those changes.</p>
<p>The remote metadata plugin allows me to host the metadata at a separate URL, which I can then update without needing a full deploy of the underlying database files.</p>
<p>The last two were simple bug fixes:</p>
<blockquote>
<ul>
<li>Fixed bug where <code>?_next=x&_sort=rowid</code> could throw an error. (<a href="https://github.com/simonw/datasette/issues/1470">#1470</a>)</li>
<li>Column cog menu no longer shows the option to facet by a column that is already selected by the default facets in metadata. (<a href="https://github.com/simonw/datasette/issues/1469">#1469</a>)</li>
</ul>
</blockquote>
<h4>Releases this week</h4>
<ul>
<li>
<strong><a href="https://github.com/simonw/datasette-publish-vercel">datasette-publish-vercel</a></strong>: <a href="https://github.com/simonw/datasette-publish-vercel/releases/tag/0.11">0.11</a> - (<a href="https://github.com/simonw/datasette-publish-vercel/releases">17 releases total</a>) - 2021-10-18
<br />Datasette plugin for publishing data using Vercel</li>
<li>
<strong><a href="https://github.com/simonw/datasette-statistics">datasette-statistics</a></strong>: <a href="https://github.com/simonw/datasette-statistics/releases/tag/0.2">0.2</a> - (<a href="https://github.com/simonw/datasette-statistics/releases">3 releases total</a>) - 2021-10-15
<br />SQL statistics functions for Datasette</li>
<li>
<strong><a href="https://github.com/simonw/datasette-auth-tokens">datasette-auth-tokens</a></strong>: <a href="https://github.com/simonw/datasette-auth-tokens/releases/tag/0.3">0.3</a> - (<a href="https://github.com/simonw/datasette-auth-tokens/releases">7 releases total</a>) - 2021-10-15
<br />Datasette plugin for authenticating access using API tokens</li>
<li>
<strong><a href="https://github.com/simonw/datasette">datasette</a></strong>: <a href="https://github.com/simonw/datasette/releases/tag/0.59">0.59</a> - (<a href="https://github.com/simonw/datasette/releases">98 releases total</a>) - 2021-10-14
<br />An open source multi-tool for exploring and publishing data</li>
</ul>
<h4>TIL this week</h4>
<ul>
<li><a href="https://til.simonwillison.net/python/fabric-ssh-key">Using Fabric with an SSH public key</a></li>
<li><a href="https://til.simonwillison.net/python/sqlite-in-pyodide">Using the sqlite3 Python module in Pyodide - Python WebAssembly</a></li>
</ul>
Datasette Desktop 0.2.0: The annotated release notes2021-09-13T23:30:24+00:002021-09-13T23:30:24+00:00https://simonwillison.net/2021/Sep/13/datasette-desktop-2/#atom-series
<p><a href="https://datasette.io/desktop">Datasette Desktop</a> is a new macOS desktop application version of <a href="https://datasette.io/">Datasette</a>, an "open source multi-tool for exploring and publishing data" built on top of SQLite. I released the first version <a href="https://simonwillison.net/2021/Sep/8/datasette-desktop/">last week</a> - I've just released version 0.2.0 (and a 0.2.1 bug fix) with a whole bunch of critical improvements.</p>
<p>You can see the <a href="https://github.com/simonw/datasette-app/releases/tag/0.2.0">release notes for 0.2.0 here</a>, but as I've done <a href="https://simonwillison.net/series/datasette-release-notes/">with Datasette in the past</a> I've decided to present an annotated version of those release notes providing further background on each of the new features.</p>
<h4>The plugin directory</h4>
<blockquote>
<p>A new <strong>plugin directory</strong> for installing new plugins and upgrading or uninstalling existing ones. Open it using the "Plugins -> Install and Manage Plugins..." menu item. <a href="https://github.com/simonw/datasette-app/issues/74">#74</a></p>
</blockquote>
<p><img src="https://static.simonwillison.net/static/2021/plugin-directory.gif" alt="Demo showing installing and upgrading a plugin" style="max-width:100%;" /></p>
<p>This was the main focus for the release. <a href="https://datasette.io/plugins">Plugins</a> are a key component of both Datasette and Datasette Desktop: my goal is for Datasette to provide a robust core for exploring databases, with a wide array of plugins that support any additional kind of visualization, exploration or data manipulation capability that a user might want.</p>
<p>Datasette Desktop goes as far as bundling <a href="https://simonwillison.net/2021/Sep/8/datasette-desktop/#how-the-app-works">an entire standalone Python installation</a> just to ensure that plugins will work correctly, and invisibly sets up a dedicated Python virtual environment for plugins to install into when you first run the application.</p>
<p>The first version of the app allowed users to install plugins by pasting their name into a text input field. Version 0.2.0 is a whole lot more sophisticated: the single input field has been replaced by a full plugin directory interface that shows installed vs. available plugins and provides "Install", "Upgrade" and "Uninstall" buttons depending on the state of the plugin.</p>
<p>When I set out to build this I knew I wanted to hit <a href="https://datasette.io/content/plugins.json?_shape=array">this JSON API</a> on <a href="https://datasette.io/">datasette.io</a> to fetch the list of plugins, and I knew I wanted a simple searchable index page. Then I realized I also wanted faceted search, so I could filter for installed vs. not-yet-installed plugins.</p>
<p>Datasette's built-in table interface already implements faceted search! So I decided to use that, with <a href="https://github.com/simonw/datasette-app-support/tree/0.11.5/datasette_app_support/templates">some custom templates</a> to add the install buttons and display the plugins in a more suitable format.</p>
<p>The first challenge was getting the latest list of plugins into my Datasette instance. I built this into the <code>datasette-app-support</code> plugin using the <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#startup-datasette">startup() plugin hook</a> - every time the server starts up it hits that API and <a href="https://github.com/simonw/datasette-app-support/blob/ee8a05ba1dd55734d2b99c9bd774ebb8e9790d7c/datasette_app_support/__init__.py#L17-L62">populates an in-memory table</a> with the returned data.</p>
<p>The data from the API is then extended with four extra columns:</p>
<ul>
<li>
<code>"installed"</code> is set to "installed" or "not installed" depending on whether the plugin has already been installed by the user</li>
<li>
<code>"installed_version"</code> is the currently installed version of the plugin</li>
<li>
<code>"upgrade"</code> is the string "upgrade available" or None - allowing the user to filter for just plugins that can be upgraded</li>
<li>
<code>"default"</code> is set to 1 if the plugin is a default plugin that came with Datasette</li>
</ul>
<p>The data needed to build the plugin table is gathered by these three lines of code:</p>
<pre><span class="pl-s1">plugins</span> <span class="pl-c1">=</span> <span class="pl-s1">httpx</span>.<span class="pl-en">get</span>(
<span class="pl-s">"https://datasette.io/content/plugins.json?_shape=array"</span>
).<span class="pl-en">json</span>()
<span class="pl-c"># Annotate with list of installed plugins</span>
<span class="pl-s1">installed_plugins</span> <span class="pl-c1">=</span> {
<span class="pl-s1">plugin</span>[<span class="pl-s">"name"</span>]: <span class="pl-s1">plugin</span>[<span class="pl-s">"version"</span>]
<span class="pl-k">for</span> <span class="pl-s1">plugin</span> <span class="pl-c1">in</span> (<span class="pl-k">await</span> <span class="pl-s1">datasette</span>.<span class="pl-s1">client</span>.<span class="pl-en">get</span>(<span class="pl-s">"/-/plugins.json"</span>)).<span class="pl-en">json</span>()
}
<span class="pl-s1">default_plugins</span> <span class="pl-c1">=</span> (<span class="pl-s1">os</span>.<span class="pl-s1">environ</span>.<span class="pl-en">get</span>(<span class="pl-s">"DATASETTE_DEFAULT_PLUGINS"</span>) <span class="pl-c1">or</span> <span class="pl-s">""</span>).<span class="pl-en">split</span>()</pre>
<p>The first line fetches the full list of known plugins from the Datasette <a href="https://datasette.io/plugins">plugin directory</a></p>
<p>The second makes an internal API call to the Datasette <code>/-/plugins.json</code> endpoint using the <a href="https://docs.datasette.io/en/stable/internals.html#datasette-client">datasette.client mechanism</a> to discover what plugins are currently installed and their versions.</p>
<p>The third line loads a space-separated list of default plugins from the <code>DATASETTE_DEFAULT_PLUGINS</code> environment variable.</p>
<p>That last one deserves further explanation. Datasette Desktop now ships with some default plugins, and the point of truth for what those are <a href="https://github.com/simonw/datasette-app/blob/0.2.0/main.js#L34-L42">lives in the Electron app codebase</a> - because that's where the code responsible for installing them is.</p>
<blockquote>
<p>Five plugins are now installed by default: <a href="https://datasette.io/plugins/datasette-vega">datasette-vega</a>, <a href="https://datasette.io/plugins/datasette-cluster-map">datasette-cluster-map</a>, <a href="https://datasette.io/plugins/datasette-pretty-json">datasette-pretty-json</a>, <a href="https://datasette.io/plugins/datasette-edit-schema">datasette-edit-schema</a> and <a href="https://datasette.io/plugins/datasette-configure-fts">datasette-configure-fts</a>. <a href="https://github.com/simonw/datasette-app/issues/81">#81</a></p>
</blockquote>
<p>The plugin directory needs to know what these defaults are so it can avoid showing the "uninstall" button for those plugins. Uninstalling them currently makes no sense because Datasette Desktop installs any missing dependencies when the app starts, which would instantly undo the user's uninstall action.</p>
<p>An environment variable felt like the most straightforward way to expose that list of default plugins to the underlying Datasette server!</p>
<p>I plan to make default plugins uninstallable in the future but doing so requires a mechanism for persisting user preference state, which I haven't built yet (see <a href="https://github.com/simonw/datasette-app/issues/101">issue #101</a>).</p>
<h4>A log on the loading screen</h4>
<blockquote>
<p>The application <strong>loading screen</strong> now shows a log of what is going on. <a href="https://github.com/simonw/datasette-app/issues/70">#70</a></p>
</blockquote>
<p>The first time you launch the Datasette Desktop application it creates a virtual environment and installs <a href="https://github.com/simonw/datasette">datasette</a>, <a href="https://github.com/simonw/datasette-app-support">datasette-app-support</a> and the five default plugins (plus their dependencies) into that environment.</p>
<p><img src="https://static.simonwillison.net/static/2021/datasette-launch-log.gif" alt="Animated demo of the Datasette Desktop launch screen showing the log scrolling past" style="max-width:100%;" /></p>
<p>This can take quite a few seconds, during which the original app would show an indeterminate loading indicator.</p>
<p>Personally I hate loading indicators which don't show the difference between something that's working and something that's eternally hung. Since I can't estimate how long it will take, I decided to pipe the log of what the <code>pip install</code> command is doing to the loading screen itself.</p>
<p>For most users this will be meaningless, but hopefully will help communicate "I'm installing extra stuff that I need". Advanced users may find this useful though, especially for bug reporting if something goes wrong.</p>
<p>Under the hood <a href="https://github.com/simonw/datasette-app/commit/e0c899e422e60b43866551fd776b86a954deb94d">I implemented this</a> using a Node.js <a href="https://nodejs.org/api/events.html#events_class_eventemitter">EventEmitter</a>. I use the same trick to forward server log output to the "Debug -> Show Server Log" interface.</p>
<h4>Example CSV files</h4>
<blockquote>
<p>The welcome screen now invites you to try out the application by opening interesting <strong>example CSV files</strong>, taking advantage of the new "File -> Open CSV from URL..." feature. <a href="https://github.com/simonw/datasette-app/issues/91">#91</a></p>
</blockquote>
<p>Previously Datasette Desktop wouldn't do anything at all until you opened up a CSV or SQLite database, and I have a hunch that unlike me most people don't have good examples of those to hand at all times!</p>
<p>The new welcome screen offers example CSV files that can be opened directly from the internet. I implemented this using a new API at <a href="https://datasette.io/content/example_csvs">datasette.io/content/example_csvs</a> (add <code>.json</code> for the JSON version) which is loaded by code running on that welcome page.</p>
<p>I have two examples at the moment, for <a href="https://www.thesquirrelcensus.com/">the Squirrel Census</a> and the <a href="https://data.london.gov.uk/dataset/animal-rescue-incidents-attended-by-lfb">London Fire Brigade's animal rescue data</a>. I'll be adding more in the future.</p>
<p>The API itself is a great example of the <a href="https://simonwillison.net/2021/Jul/28/baked-data/">Baked Data architectural pattern</a> in action: the data itself is stored in <a href="https://github.com/simonw/datasette.io/blob/main/example_csvs.yml">this hand-edited YAML file</a>, which is compiled to SQLite every time the site is deployed.</p>
<p>To get this feature working I added a new "Open CSV from URL" capability to the app, which is also available in the File menu. Under the hood this works by passing the provided URL to the new <code>/-/open-csv-from-url</code> API endpoint. The implementation of this <a href="https://github.com/simonw/datasette-app-support/blob/0.11.5/datasette_app_support/utils.py#L52-L88">was surprisingly fiddly</a> as I wanted to consume the CSV file using an asynchronous HTTP client - I ended up using an adaptation of <a href="https://github.com/mosquito/aiofile/blob/3.5.1/README.rst#async-csv-dict-reader">some example code</a> from the <a href="https://github.com/mosquito/aiofile">aiofile</a> README.</p>
<h4>Recently opened files and "Open with Datasette"</h4>
<blockquote>
<p><strong>Recently opened</strong> <code>.db</code> and <code>.csv</code> files can now be accessed from the new "File -> Open Recent" menu. Thanks, <a href="https://github.com/mnckapilan">Kapilan M</a>! <a href="https://github.com/simonw/datasette-app/issues/54">#54</a></p>
</blockquote>
<p>This was the project's first external contribution! Kapilan M <a href="https://github.com/simonw/datasette-app/pull/77">figured out a way</a> to hook into the macOS "recent files" mechanism from Electron, and I expanded that to cover SQLite databases in addition to CSV files.</p>
<p>When a recent file is selected, Electron fires <a href="https://www.electronjs.org/docs/api/app#event-open-file-macos">the "open-file" event</a>. This same event is fired when a file is opened using "Open With -> Datasette" or dragged onto the application's dock.</p>
<p>This meant I needed to tell the difference between a CSV or a SQLite database file, which I do by <a href="https://github.com/simonw/datasette-app/blob/0.2.0/main.js#L95-L101">checking if</a> the first 16 bytes of the file match the SQLite header of <code>SQLite format 3\0</code>.</p>
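<p>That check is straightforward to reproduce in Python - a sketch, with the file setup invented for the demo:</p>

```python
import os
import sqlite3
import tempfile

SQLITE_HEADER = b"SQLite format 3\x00"  # the 16-byte magic header

def is_sqlite_db(filepath):
    # Every SQLite database file starts with "SQLite format 3\0"
    with open(filepath, "rb") as f:
        return f.read(16) == SQLITE_HEADER

# demo: a freshly created database matches, a CSV file does not
tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("create table t (id integer)")
conn.commit()
conn.close()

csv_path = os.path.join(tmpdir, "demo.csv")
with open(csv_path, "w") as f:
    f.write("id,name\n1,Cleo\n")

print(is_sqlite_db(db_path), is_sqlite_db(csv_path))
```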
<blockquote>
<p><strong>.db and .csv files</strong> can now be opened in Datasette starting from the Finder using "Right Click -> Open With -> Datasette". <a href="https://github.com/simonw/datasette-app/issues/40">#40</a></p>
</blockquote>
<p>Registering Datasette as a file handler for <code>.csv</code> and <code>.db</code> was not at all obvious. It turned out to involve adding the following to the Electron app's <a href="https://github.com/simonw/datasette-app/blob/0.2.0/package.json#L13-L28">package.json file</a>:</p>
<div class="highlight highlight-source-json"><pre> <span class="pl-s"><span class="pl-pds">"</span>build<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>appId<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>io.datasette.app<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>mac<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>category<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>public.app-category.developer-tools<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>extendInfo<span class="pl-pds">"</span></span>: {
<span class="pl-s"><span class="pl-pds">"</span>CFBundleDocumentTypes<span class="pl-pds">"</span></span>: [
{
<span class="pl-s"><span class="pl-pds">"</span>CFBundleTypeExtensions<span class="pl-pds">"</span></span>: [
<span class="pl-s"><span class="pl-pds">"</span>csv<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>tsv<span class="pl-pds">"</span></span>,
<span class="pl-s"><span class="pl-pds">"</span>db<span class="pl-pds">"</span></span>
],
<span class="pl-s"><span class="pl-pds">"</span>LSHandlerRank<span class="pl-pds">"</span></span>: <span class="pl-s"><span class="pl-pds">"</span>Alternate<span class="pl-pds">"</span></span>
}
]
    }
  }
}</pre></div>
<h4>The Debug Menu</h4>
<blockquote>
<p>A new <strong>Debug menu</strong> can be enabled using "Datasette -> About Datasette -> Enable Debug Menu".</p>
</blockquote>
<p>The debug menu existed previously in development mode, but with 0.2.0 I decided to expose it to end users. I didn't want to show it to people who weren't ready to see it, so you have to first enable it using a button on the about menu.</p>
<p>The most interesting option there is "Run Server Manually".</p>
<p>Most of the time when you are using the app there's a <code>datasette</code> Python server running under the hood, but it's entirely managed by the Node.js <a href="https://nodejs.org/api/child_process.html">child_process</a> module.</p>
<p>When developing the application (or associated plugins) it can be useful to manually run that server rather than having it managed by the app, so you can see more detailed error messages or even add the <code>--pdb</code> option to drop into a debugger should something go wrong.</p>
<p>To run that server, you need the Electron app to kill its own version... and you then need to know things like what port it was running on and which environment variables it was using.</p>
<p>Here's what you see when you click the "Run Server Manually" debug option:</p>
<p><img src="https://static.simonwillison.net/static/2021/run-server-manually.png" alt="Run server manually? Clicking OK will terminate the Datasette server used by this app. Copy this command to a terminal to manually run a replacement" style="max-width:100%;" /></p>
<p>Here's that command in full:</p>
<p><code>DATASETTE_API_TOKEN="0ebb45444ba4cbcbacdbcbb989bb0cd3aa10773c0dfce73c0115868d0cee2afa" DATASETTE_SECRET="4a8ac89d0d269c31d99059933040b4511869c12dfa699a1429ea29ee3310a850" DATASETTE_DEFAULT_PLUGINS="datasette datasette-app-support datasette-vega datasette-cluster-map datasette-pretty-json datasette-edit-schema datasette-configure-fts datasette-leaflet" /Users/simon/.datasette-app/venv/bin/datasette --port 8002 --version-note xyz-for-datasette-app --setting sql_time_limit_ms 10000 --setting max_returned_rows 2000 --setting facet_time_limit_ms 3000 --setting max_csv_mb 0</code></p>
<p>This is a simulation of the command that the app itself used to launch the server. Pasting that into a terminal will produce an exact copy of the original process - and you can add <code>--pdb</code> or other options to further customize it.</p>
<h4>Bonus: Restoring the in-memory database on restart</h4>
<p>This didn't make it into the formal release notes, but it's a fun bug that I fixed in this release.</p>
<p>Datasette Desktop defaults to opening CSV files in an in-memory database. You can import them into an on-disk database too, but if you just want to start exploring CSV data in Datasette I decided an in-memory database would be a better starting point.</p>
<p>There's one problem with this: installing a plugin requires a Datasette server restart, and restarting the server clears the content of that in-memory database, causing any tables created from imported CSVs to disappear. This is confusing!</p>
<p>You can follow my progress on this in issue <a href="https://github.com/simonw/datasette-app/issues/42">#42: If you open a CSV and then install a plugin the CSV table vanishes</a>. I ended up solving it by adding code that dumps the "temporary" in-memory database to a file on disk before a server restart, restarts the server, then copies that disk backup into memory again.</p>
<p>This works using two custom API endpoints added to the <a href="https://github.com/simonw/datasette-app-support">datasette-app-support</a> plugin:</p>
<ul>
<li>
<code>POST /-/dump-temporary-to-file</code> with <code>{"path": "/path/to/backup.db"}</code> dumps the contents of that in-memory temporary database to the specified file.</li>
<li>
<code>POST /-/restore-temporary-from-file</code> with <code>{"path": "/path/to/backup.db"}</code> restores the content back again.</li>
</ul>
<p>These APIs are called from the <a href="https://github.com/simonw/datasette-app/blob/0.2.0/main.js#L189-L221">startOrRestart()</a> method any time the server restarts, using a file path generated by Electron using the following:</p>
<div class="highlight highlight-source-js"><pre><span class="pl-s1">backupPath</span> <span class="pl-c1">=</span> <span class="pl-s1">path</span><span class="pl-kos">.</span><span class="pl-en">join</span><span class="pl-kos">(</span>
<span class="pl-s1">app</span><span class="pl-kos">.</span><span class="pl-en">getPath</span><span class="pl-kos">(</span><span class="pl-s">"temp"</span><span class="pl-kos">)</span><span class="pl-kos">,</span>
<span class="pl-s">`backup-<span class="pl-s1"><span class="pl-kos">${</span><span class="pl-s1">crypto</span><span class="pl-kos">.</span><span class="pl-en">randomBytes</span><span class="pl-kos">(</span><span class="pl-c1">8</span><span class="pl-kos">)</span><span class="pl-kos">.</span><span class="pl-en">toString</span><span class="pl-kos">(</span><span class="pl-s">"hex"</span><span class="pl-kos">)</span><span class="pl-kos">}</span></span>.db`</span>
<span class="pl-kos">)</span><span class="pl-kos">;</span></pre></div>
<p>The file is deleted once it has been restored.</p>
<p>After much experimentation, I <a href="https://github.com/simonw/datasette-app-support/blob/0.11.5/datasette_app_support/__init__.py#L270-L288">ended up using</a> the <code>db.backup(other_connection)</code> method that was added to Python's <code>sqlite3</code> module in Python 3.7. Since Datasette Desktop bundles its own copy of Python 3.9 I don't have to worry about compatibility with older versions at all.</p>
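<p>The core of that dump-and-restore trick can be sketched with nothing but Python's <code>sqlite3</code> module. This is a simplified illustration rather than the actual plugin code - the table and file paths are invented:</p>

```python
import os
import sqlite3
import tempfile

# An in-memory database holding data that a restart would wipe out
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
mem.execute("INSERT INTO t (name) VALUES ('hello')")

# Dump it to a file on disk using Connection.backup() (Python 3.7+)
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
disk = sqlite3.connect(backup_path)
mem.backup(disk)
disk.close()
mem.close()  # simulates the server restart losing the in-memory data

# After the "restart", copy the backup into a fresh in-memory database
restored = sqlite3.connect(":memory:")
disk = sqlite3.connect(backup_path)
disk.backup(restored)
disk.close()
os.remove(backup_path)  # the backup file is deleted once restored

print(restored.execute("SELECT name FROM t").fetchone()[0])  # hello
```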
<h4>The rest is in the milestone</h4>
<p>If you want even more detailed notes on what went into the release, each new feature is included in the <a href="https://github.com/simonw/datasette-app/milestone/2?closed=1">0.2.0 milestone</a>, accompanied by a detailed issue with screenshots (and even a few videos) plus links to the underlying commits.</p>
Datasette 0.58: The annotated release notes2021-07-16T02:21:11+00:002021-07-16T02:21:11+00:00https://simonwillison.net/2021/Jul/16/datasette-058/#atom-series
<p>I released <a href="https://docs.datasette.io/en/stable/changelog.html#v0-58">Datasette 0.58</a> last night, with new plugin hooks, Unix domain socket support, a major faceting performance fix and a few other improvements. Here are the <a href="https://simonwillison.net/series/datasette-release-notes/">annotated release notes</a>.</p>
<h4>Faceting performance improvement</h4>
<p><a href="https://docs.datasette.io/en/stable/facets.html">Facets</a> remains my favourite feature in Datasette: it turns out a simple group by / count against a column is one of the most productive ways I know of to start understanding new data.</p>
<p>Yesterday I stumbled across <a href="https://github.com/simonw/datasette/commit/a6c8e7fa4cffdeff84e9e755dcff4788fd6154b8">a tiny tweak</a> (details in <a href="https://github.com/simonw/datasette/issues/1394">this issue</a>) that gave me a 10x performance boost on facet queries! Short version: given the following example query:</p>
<div class="highlight highlight-source-sql"><pre><span class="pl-k">select</span>
country_long,
<span class="pl-c1">count</span>(<span class="pl-k">*</span>)
<span class="pl-k">from</span> (
<span class="pl-k">select</span> <span class="pl-k">*</span> <span class="pl-k">from</span> [global<span class="pl-k">-</span>power<span class="pl-k">-</span>plants]
<span class="pl-k">order by</span> rowid
)
<span class="pl-k">where</span>
country_long <span class="pl-k">is not null</span>
<span class="pl-k">group by</span>
country_long
<span class="pl-k">order by</span>
<span class="pl-c1">count</span>(<span class="pl-k">*</span>) <span class="pl-k">desc</span></pre></div>
<p>Removing the unnecessary <code>order by rowid</code> from that inner query knocked the time down from 53ms to 7.2ms (and makes even more of a difference on larger tables).</p>
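<p>Here's a quick way to convince yourself the two forms are equivalent, using a toy table in place of <code>global-power-plants</code> (illustrative data only):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (country_long TEXT)")
conn.executemany(
    "INSERT INTO plants VALUES (?)",
    [("USA",), ("China",), ("USA",), (None,)],
)

with_order = """
    select country_long, count(*) from (
        select * from plants order by rowid
    ) where country_long is not null
    group by country_long order by count(*) desc
"""
# Identical query with the redundant inner "order by rowid" removed
without_order = with_order.replace("order by rowid", "")

# Both produce the same facet counts; only the query plan differs
assert conn.execute(with_order).fetchall() == conn.execute(without_order).fetchall()
print(conn.execute(without_order).fetchall())  # [('USA', 2), ('China', 1)]
```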
<p>I was surprised SQLite didn't perform that optimization automatically - so I <a href="https://sqlite.org/forum/forumpost/2d76f2bcf65d256a">started a thread</a> on the SQLite forum and SQLite author D. Richard Hipp <a href="https://sqlite.org/src/timeline?r=omit-subquery-order-by">figured out a patch</a>! It's not yet certain that it will land in a SQLite release but I'm excited to have found an issue interesting enough to be worth looking into. (UPDATE: it <a href="https://sqlite.org/forum/forumpost/878ca7a9be0862af?t=h">landed on trunk</a>).</p>
<h4>The get_metadata() plugin hook</h4>
<blockquote>
<p>New plugin hook: <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-get-metadata">get_metadata(datasette, key, database, table)</a>, for returning custom metadata for an instance, database or table. Thanks, Brandon Roberts! (<a href="https://github.com/simonw/datasette/issues/1384">#1384</a>)</p>
</blockquote>
<p>Brandon Roberts contributed this hook as part of work he's been doing with <a href="https://next.newsday.com/">Newsday nextLI</a> - always exciting to see Datasette used by another news organization. Brandon has <a href="https://datasette-live.bxroberts.org/">a live demo</a> of the plugins he has been building: <a href="https://github.com/next-LI/datasette-live-config">datasette-live-config</a>, <a href="https://github.com/next-LI/datasette-live-permissions">datasette-live-permissions</a>, <a href="https://github.com/next-LI/datasette-csv-importer">datasette-csv-importer</a> and <a href="https://github.com/next-LI/datasette-surveys">datasette-surveys</a>. He also has a <a href="https://drive.google.com/file/d/1SShy_C6-CSUlSaqyQSIlUDIa4WA9YzTr/view">6 minute demo video</a> explaining the project so far.</p>
<p>The new hook allows plugins to customize the <a href="https://docs.datasette.io/en/stable/metadata.html#metadata">metadata</a> displayed for different databases and tables within the Datasette interface.</p>
<p>There is one catch at the moment: the hook doesn't yet allow for async calls (including <code>await db.execute(sql)</code>) because Datasette's own internals currently treat access to metadata as a sync rather than async feature.</p>
<p>There are workarounds for this. Brandon's <code>datasette-live-config</code> plugin <a href="https://github.com/next-LI/datasette-live-config/blob/d7e39db50f33b78ec0ef3f404ba421c4a47a5844/datasette_live_config/__init__.py">opens an additional, synchronous connection</a> to the DB which is completely fine for fast queries. Another option would be to keep metadata in an in-memory Python dictionary which is updated by SQL queries that run in an async background task.</p>
<p>In the longer run though I'd like to redesign Datasette's internals to support asynchronous metadata access - ideally before Datasette 1.0.</p>
<h4 id="skip-csrf-plugin-hook">The skip_csrf() plugin hook</h4>
<blockquote>
<p>New plugin hook: <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-skip-csrf">skip_csrf(datasette, scope)</a>, for opting out of CSRF protection based on the incoming request. (<a href="https://github.com/simonw/datasette/issues/1377">#1377</a>)</p>
</blockquote>
<p>I wanted to write a plugin that supported an HTTP POST to a Datasette form that wasn't protected by Datasette's <a href="https://docs.datasette.io/en/stable/internals.html?highlight=csrf#csrf-protection">CSRF protection</a>. This proved surprisingly difficult! I ended up shipping <a href="https://github.com/simonw/asgi-csrf/releases/tag/0.9">asgi-csrf 0.9</a> with a new mechanism for custom opting-out of CSRF protection based on the ASGI scope, then exposing that mechanism in a new plugin hook in Datasette.</p>
<p>CSRF is such a frustrating security issue to write code against, because in modern browsers the SameSite cookie attribute more-or-less solves the problem for you... but that attribute only has <a href="https://caniuse.com/same-site-cookie-attribute">90% global usage according to caniuse.com</a> - not quite enough for me to forget about it entirely.</p>
<p>There also remains <a href="https://twitter.com/simonw/status/1413484080226717708">one obscure edge-case</a> in which SameSite won't help you: the definition of "same site" includes other subdomains of your domain (provided it's not on the <a href="https://github.com/publicsuffix/list">Public Suffix List</a>). This means that for SameSite CSRF protection to work you need to be confident that no subdomains of your domain will suffer an XSS - and in my experience it's common for subdomains to be pointed at third-party applications that may not have the same stringent XSS protection as your main code.</p>
<p>So I continue to care about CSRF protection in Datasette.</p>
<h4>Unix domain socket support</h4>
<blockquote>
<p>New <code>datasette --uds /tmp/datasette.sock</code> option for binding Datasette to a Unix domain socket, see <a href="https://docs.datasette.io/en/stable/deploying.html#deploying-proxy">proxy documentation</a>. (<a href="https://github.com/simonw/datasette/issues/1388">#1388</a>)</p>
</blockquote>
<p>I wrote about this <a href="https://simonwillison.net/2021/Jul/13/unix-domain-sockets/">in my weeknotes</a> - this is a great way to run Datasette if you have it behind a proxy such as Apache or nginx and don't want to have the Datasette server listening on a high port.</p>
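<p>For example, an nginx virtual host can proxy to that socket with a configuration fragment along these lines (a sketch assuming the socket path from the example - see the proxy documentation for the full recommended setup):</p>

```nginx
location / {
    proxy_set_header Host $host;
    # nginx's syntax for proxying to a Unix domain socket
    proxy_pass http://unix:/tmp/datasette.sock:/;
}
```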
<h4>"searchmode": "raw" in table metadata</h4>
<blockquote>
<p><code>"searchmode": "raw"</code> table metadata option for defaulting a table to executing SQLite full-text search syntax without first escaping it, see <a href="https://docs.datasette.io/en/stable/full_text_search.html#full-text-search-advanced-queries">Advanced SQLite search queries</a>. (<a href="https://github.com/simonw/datasette/issues/1389">#1389</a>)</p>
</blockquote>
<p>SQLite's built in full-text search feature includes support <a href="https://www.sqlite.org/fts5.html#full_text_query_syntax">for advanced operators</a>: you can use operators like AND, OR and NEAR and you can add column specifiers like <code>name:Simon</code> to restrict searches to individual columns.</p>
<p>This is something of a two-edged sword: I've found innocent looking queries that raise errors due to unexpected interactions with the query language.</p>
<p>In <a href="https://github.com/simonw/datasette/issues/651">issue 651</a> I switched to escaping all queries by default to prevent these errors from happening, with a <code>?_searchmode=raw</code> query string option for opting back into the default functionality.</p>
<p>I've since had a few requests for a mechanism to enable this by default - hence the new <code>"searchmode": "raw"</code> option in table metadata.</p>
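<p>In <code>metadata.yml</code> that looks something like this (the database and table names here are invented for the example):</p>

```yaml
databases:
  mydatabase:
    tables:
      documents:
        searchmode: raw
```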
<h4>Link plugin hooks now take a request</h4>
<blockquote>
<p>The <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-menu-links">menu_links()</a>, <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-table-actions">table_actions()</a> and <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-database-actions">database_actions()</a> plugin hooks all gained a new optional <code>request</code> argument providing access to the current request. (<a href="https://github.com/simonw/datasette/issues/1371">#1371</a>)</p>
</blockquote>
<p>I have a plugin which needs to add links to different places depending on the subdomain that the Datasette instance is running on. Adding <code>request</code> to these plugin hooks proved to be the easiest way to achieve this.</p>
<p>This is a really nice thing about how <a href="https://pluggy.readthedocs.io/">Pluggy</a> (the plugin library used by Datasette) works: adding new named parameters to hooks can be done without breaking backwards compatibility with existing plugins.</p>
<h4>And the rest</h4>
<ul>
<li>Improved documentation for <a href="https://docs.datasette.io/en/stable/deploying.html#deploying-proxy">Running Datasette behind a proxy</a> to recommend using <code>ProxyPreserveHost On</code> with Apache. (<a href="https://github.com/simonw/datasette/issues/1387">#1387</a>)</li>
<li>
<code>POST</code> requests to endpoints that do not support that HTTP verb now return a 405 error.</li>
<li>
<code>db.path</code> can now be provided as a <code>pathlib.Path</code> object, useful when writing unit tests for plugins. Thanks, Chris Amico. (<a href="https://github.com/simonw/datasette/issues/1365">#1365</a>)</li>
</ul>
Datasette 0.54: The annotated release notes2021-01-25T17:31:04+00:002021-01-25T17:31:04+00:00https://simonwillison.net/2021/Jan/25/datasette/#atom-series
<p><a href="https://docs.datasette.io/en/stable/changelog.html#v0-54">Datasette 0.54</a> is out today, with a number of foundational new features. As I've done for <a href="https://simonwillison.net/tags/releasenotes+datasette/">previous releases</a>, here are the annotated release notes with some additional commentary.</p>
<h4>The _internal database</h4>
<blockquote>
<p>As part of ongoing work to help Datasette handle much larger numbers of connected databases and tables (see <a href="https://github.com/simonw/datasette/issues/417">Datasette Library</a>) Datasette now maintains an in-memory SQLite database with details of all of the attached databases, tables, columns, indexes and foreign keys. (<a href="https://github.com/simonw/datasette/issues/1150">#1150</a>)</p>
<p>This will support future improvements such as a searchable, paginated homepage of all available tables.</p>
</blockquote>
<p>The Datasette Library issue has been open for nearly two years now. It's a need I identified at the NICAR 2019 data journalism conference, where it became apparent that many newsrooms are sitting on an enormous pile of data they have collected, without any central place to keep it all.</p>
<p>Large tech companies usually have some kind of data warehouse or data lake which is meant to address this problem. These are complicated, extremely expensive to run and the ability to add and maintain data in them is often reserved for a high technology priesthood.</p>
<p>I see Datasette as targeting the "small data" end of this equation, where I define small data as anything that will fit on my phone. My iPhone has half a terabyte of storage now!</p>
<p>How many more people and organizations would run their own data warehouse if it could sit on their laptop or run on a $5/month VPS?</p>
<blockquote>
<p>You can explore an example of this database by <a href="https://latest.datasette.io/login-as-root">signing in as root</a> to the <code>latest.datasette.io</code> demo instance and then navigating to <a href="https://latest.datasette.io/_internal">latest.datasette.io/_internal</a>.</p>
</blockquote>
<p>Every commit to Datasette deploys a new demo instance to <code>latest.datasette.io</code> - here's <a href="https://github.com/simonw/datasette/blob/main/.github/workflows/deploy-latest.yml">the GitHub Actions workflow</a>. This is useful for all sorts of things - I can use it to test additional devices, remind myself how different pages of Datasette work and in this case I can link to it directly from the documentation.</p>
<blockquote>
<p>Plugins can use these tables to introspect attached data in an efficient way. Plugin authors should note that this is not yet considered a stable interface, so any plugins that use this may need to make changes prior to Datasette 1.0 if the <code>_internal</code> table schemas change.</p>
</blockquote>
<p>In the ongoing push towards Datasette 1.0 I've decided to start shipping features like this earlier, but with prominent "this is not yet stable" notices to warn people that the details might change. I think this is a better way to preview new features than having them live in a branch or alpha release.</p>
<p>Once I hit 1.0 I'm going to only ship features that are genuinely stable, to avoid having to make any major version number bumps for as long as possible.</p>
<h4>Named in-memory database support</h4>
<blockquote>
<p>As part of the work building the <code>_internal</code> database, Datasette now supports named in-memory databases that can be shared across multiple connections. This allows plugins to create in-memory databases which will persist data for the lifetime of the Datasette server process. (<a href="https://github.com/simonw/datasette/issues/1151">#1151</a>)</p>
<p>The new <code>memory_name=</code> parameter to the <a href="https://docs.datasette.io/en/stable/#internals-database">Database class</a> can be used to create named, shared in-memory databases.</p>
</blockquote>
<p>This exposes a relatively obscure feature of SQLite: it allows in-memory databases to <a href="https://sqlite.org/inmemorydb.html#sharedmemdb">operate with a shared cache</a>, which means multiple database connections can see the same in-memory data.</p>
<p>I can think of so many fun plugin use-cases for this!</p>
<ul>
<li>A plugin that accepts a URL to a CSV file, then pulls that data into an in-memory table and presents a standard Datasette table interface for interacting with it</li>
<li>An API rate-limiting plugin that uses an in-memory database as a persistent store</li>
<li>Any plugin that might need to cache something can now do so with its own in-memory database</li>
</ul>
<p>I haven't used this capability in a plugin myself yet but I'm excited to try it out with something soon.</p>
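<p>The underlying SQLite feature is easy to try from Python's <code>sqlite3</code> module directly - two connections to the same named in-memory database, sharing one cache (a standalone sketch, not Datasette's <code>memory_name=</code> implementation):</p>

```python
import sqlite3

# SQLite URI syntax for a named, shared-cache in-memory database
uri = "file:demo_mem?mode=memory&cache=shared"
conn1 = sqlite3.connect(uri, uri=True)
conn2 = sqlite3.connect(uri, uri=True)

conn1.execute("CREATE TABLE notes (body TEXT)")
conn1.execute("INSERT INTO notes VALUES ('visible to both connections')")
conn1.commit()

# The second connection sees data written by the first
print(conn2.execute("SELECT body FROM notes").fetchone()[0])
```

<p>One caveat worth knowing: the named database only survives for as long as at least one connection to it remains open.</p>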
<h4 id="javascript-module-support">JavaScript module support</h4>
<blockquote>
<p><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules">JavaScript modules</a> were introduced in ECMAScript 2015 and provide native browser support for the <code>import</code> and <code>export</code> keywords.</p>
<p>To use modules, JavaScript needs to be included in <code><script></code> tags with a <code>type="module"</code> attribute.</p>
<p>Datasette now has the ability to output <code><script type="module"></code> in places where you may wish to take advantage of modules. The <code>extra_js_urls</code> option described in <a href="https://docs.datasette.io/en/stable/custom_templates.html#customization-css-and-javascript">Custom CSS and JavaScript</a> can now be used with modules, and module support is also available for the <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-extra-body-script">extra_body_script()</a> plugin hook. (<a href="https://github.com/simonw/datasette/issues/1186">#1186</a>, <a href="https://github.com/simonw/datasette/issues/1187">#1187</a>)</p>
</blockquote>
<p>This feature was inspired by discussion on <a href="https://github.com/simonw/datasette/issues/983">this issue</a> about a JavaScript plugin system for Datasette.</p>
<p>I've been getting excited about JavaScript modules recently, partly thanks to <a href="https://www.snowpack.dev/">Snowpack</a> and <a href="https://www.skypack.dev/">Skypack</a>. I enjoyed maintainer Fred K. Schott's <a href="https://octo.github.com/speakerseries/FredKSchott/">GitHub OCTO speaker series talk</a> last week.</p>
<p>The key feature of modules that is useful for Datasette is that they solve the "double imports" problem. Datasette plugins often need to load additional JavaScript libraries, such as Leaflet or React. Using modules means those plugins can, without any knowledge of each other, avoid accidentally loading the same code twice.</p>
<blockquote>
<p><a href="https://datasette.io/plugins/datasette-leaflet-freedraw">datasette-leaflet-freedraw</a> is the first example of a Datasette plugin that takes advantage of the new support for JavaScript modules. See <a href="https://simonwillison.net/2021/Jan/24/drawing-shapes-spatialite/">Drawing shapes on a map to query a SpatiaLite database</a> for more on this plugin.</p>
</blockquote>
<p>I now have three plugins that load three different versions of Leaflet: <a href="https://datasette.io/plugins/datasette-cluster-map">datasette-cluster-map</a>, <a href="https://datasette.io/plugins/datasette-leaflet-geojson">datasette-leaflet-geojson</a> and the new <a href="https://datasette.io/plugins/datasette-leaflet-freedraw">datasette-leaflet-freedraw</a>.</p>
<p>I've started work on a new plugin called <a href="https://datasette.io/plugins/datasette-leaflet">datasette-leaflet</a> which will make Leaflet available as a JavaScript module. Those other three plugins can then all depend on <code>datasette-leaflet</code> so that they can share the same library version.</p>
<h4>Code formatting with Black and Prettier</h4>
<blockquote>
<p>Datasette adopted <a href="https://github.com/psf/black">Black</a> for opinionated Python code formatting in June 2019. Datasette now also embraces <a href="https://prettier.io/">Prettier</a> for JavaScript formatting, which like Black is enforced by tests in continuous integration. Instructions for using these two tools can be found in the new section on <a href="https://docs.datasette.io/en/stable/contributing.html#contributing-formatting">Code formatting</a> in the contributors documentation. (<a href="https://github.com/simonw/datasette/issues/1167">#1167</a>)</p>
</blockquote>
<p>I <em>love</em> opinionated code formatting tools. After adopting Black it started to dawn on me quite how many of my programming brain cycles had been spent thinking about indentation - sweating over the details (even in Python) of when to linebreak, how to format complex function definitions and so on.</p>
<p>Switching to Black gave me <em>all</em> of that thinking time back to spend on other things. It even gave me a <a href="https://simonwillison.net/2020/Feb/11/cheating-at-unit-tests-pytest-black/">more productive way to write unit tests</a>.</p>
<p>I've been wanting to do the same thing for JavaScript for years, but there were tiny aspects of the formatting style used by Prettier that I didn't like - specifically the way it formats chained method calls with newlines.</p>
<p>As the positive evidence from using Black continued to build, I realized that the benefits of never having to think about formatting again outweighed any tiny disagreements I might have about aesthetics. So I'm all on board with Prettier now!</p>
<h4>Everything else</h4>
<p>The <a href="https://docs.datasette.io/en/stable/changelog.html#v0-54">rest of the release</a> is mostly small bug fixes.</p>
<p>There are some documentation improvements too. I added a new section to the page about writing tests for plugins on <a href="https://docs.datasette.io/en/stable/testing_plugins.html#testing-plugins-pytest-httpx">testing outbound HTTP calls with pytest-httpx</a> - any time I find myself having to solve the same problem in more than one plugin is a hint that it should probably be covered by the documentation.</p>
<p>As always, I welcome conversations about the new release (and Datasette in general) on the <a href="https://github.com/simonw/datasette/discussions">Datasette discussions forum</a>. I'm also continuing to run <a href="https://calendly.com/swillison/datasette-office-hours">Datasette Office Hours</a> every Friday, so if you're interested in a 1-1 conversation about the project please grab a slot!</p>
Datasette 0.50: The annotated release notes2020-10-09T20:23:11+00:002020-10-09T20:23:11+00:00https://simonwillison.net/2020/Oct/9/datasette-0-50/#atom-series
<p>I released <a href="https://docs.datasette.io/en/stable/changelog.html#v0-50">Datasette 0.50</a> this morning, with a new user-facing column actions menu feature and a way for plugins to make internal HTTP requests to consume the JSON API of their parent Datasette instance.</p>
<h4 id="column-actions">The column actions menu</h4>
<blockquote cite="https://docs.datasette.io/en/stable/changelog.html#v0-50"><p>The key new feature in this release is the <strong>column actions</strong> menu on the table page (<a href="https://github.com/simonw/datasette/issues/891">#891</a>). This can be used to sort a column in ascending or descending order, facet data by that column or filter the table to just rows that have a value for that column.</p></blockquote>
<p>The table page is the most important page within Datasette: it's where users interact with database tables.</p>
<p>Prior to 0.50 users could sort those tables by clicking on the column header. If they wanted to sort in descending order they had to click it, wait for the table to reload and then click it a second time.</p>
<p>In 0.50 I've introduced a new UI element which I'm calling the <em>column actions menu</em>. Here's an animation showing it in action on the <a href="https://latest.datasette.io/fixtures/facetable">facetable</a> demo table:</p>
<p><img src="https://static.simonwillison.net/static/2020/column-actions.gif" alt="Animated demo of the columns action menu, showing it used to sort a column and select two other columns for faceting" style="max-width:100%;" /></p>
<p>Right now the menu can be used to sort ascending, sort descending or add the column to the current set of selected facets. If a column has any blank values on the current page a menu option to "Show not-blank rows" appears too - you can try that out on the <a href="https://latest.datasette.io/fixtures/sortable">sortable</a> table.</p>
<p>I plan to extend this with more options in the future. I'd also like to make it a documented plugin extension point, so plugins can add their own column-specific actions. I need to figure out a JavaScript equivalent of the Python pluggy plugins mechanism first though, see <a href="https://github.com/simonw/datasette/issues/983">issue 983</a>.</p>
<h4 id="datasette-client">datasette.client</h4>
<blockquote cite="https://docs.datasette.io/en/stable/changelog.html#v0-50"><p>Plugin authors can use the new <a class="reference internal" href="https://docs.datasette.io/en/stable/internals.html#internals-datasette-client"><span class="std std-ref">datasette.client</span></a> object to make internal HTTP requests from their plugins, allowing them to make use of Datasette's JSON API. (<a href="https://github.com/simonw/datasette/issues/943">#943</a>)</p></blockquote>
<p>In building the <a href="https://github.com/simonw/datasette-graphql">datasette-graphql</a> plugin I ran into an interesting requirement. I wanted to provide efficient <a href="https://simonwillison.net/2018/Oct/4/datasette-ideas/#Keyset_pagination">keyset pagination</a> within the GraphQL schema, which is actually quite a complex thing to implement.</p>
<p>Datasette already has a robust implementation of keyset pagination, but it's tangled up in the implementation of <a href="https://github.com/simonw/datasette/blob/0.50/datasette/views/table.py#L742-L770">the internal TableView class</a>.</p>
<p>It's not available as a documented, stable Python API... but it IS available via the Datasette JSON API.</p>
<p>Wouldn't it be great if Datasette plugins could make direct calls to the same externally facing, documented HTTP JSON API that Datasette itself exposes to end users?</p>
<p>That's what the new <code>datasette.client</code> object does. It's a thin wrapper around <a href="https://www.python-httpx.org/async/">HTTPX AsyncClient</a> (the excellent new Python HTTP library which takes the Requests API and makes it fully asyncio compliant) which dispatches requests internally to Datasette's ASGI application, without any of the network overhead of an external HTTP request.</p>
<p>One of my goals for Datasette 1.0 is to bring the externally facing JSON API to full, documented, stable status.</p>
<p>The idea of Python plugins being able to efficiently use that same API feels really elegant to me. I'm looking forward to taking advantage of this in my own plugins.</p>
<h4 id="deploying-datasette">Deploying Datasette documentation</h4>
<blockquote cite="https://docs.datasette.io/en/stable/changelog.html#v0-50"><p>New <a href="https://docs.datasette.io/en/stable/deploying.html#deploying">Deploying Datasette</a> documentation with guides for deploying Datasette on a Linux server <a href="https://docs.datasette.io/en/stable/deploying.html#deploying-systemd">using systemd</a> or to hosting providers <a href="https://docs.datasette.io/en/stable/deploying.html#deploying-buildpacks">that support buildpacks</a>. (<a href="https://github.com/simonw/datasette/issues/514">#514</a>, <a href="https://github.com/simonw/datasette/issues/997">#997</a>)</p></blockquote>
<p>The buildpack documentation was inspired by <a href="https://til.simonwillison.net/til/til/digitalocean_datasette-on-digitalocean-app-platform.md">my experiments</a> with the new <a href="https://www.digitalocean.com/docs/app-platform/">DigitalOcean App Platform</a> this week. App Platform is a Heroku-style PaaS hosting platform that implements the <a href="https://buildpacks.io/">Cloud Native Buildpacks</a> standard, which emerged from Heroku's architecture a few years ago.</p>
<p>I hadn't realized quite how easy it is to run a custom Python application (such as Datasette) using buildpacks - it's literally just a GitHub repository with two single-line files in it, <samp>requirements.txt</samp> and <samp>Procfile</samp> - the buildpacks mechanism detects the <samp>requirements.txt</samp> and configures a Python environment automatically.</p>
<p>I deployed my new <a href="https://github.com/simonw/buildpack-datasette-demo">simonw/buildpack-datasette-demo</a> repo on DigitalOcean, Heroku and <a href="https://scalingo.com/">Scalingo</a> to try this out. It worked on all three providers with no changes - and all three offer continuous deployment against GitHub where any changes to that repository automatically trigger a deployment (optionally guarded by a CI test suite).</p>
<p>Since I was creating a deployment documentation page I decided to finally address <a href="https://github.com/simonw/datasette/issues/514">issue 514</a> and document how I've used <samp>systemd</samp> to deploy Datasette on some of my own projects. I'm very keen to hear from people who try out this recipe so I can continue to improve it over time.</p>
<p><em>This is part of a series: see also the annotated release notes for Datasette <a href="https://simonwillison.net/2020/Jun/12/annotated-release-notes/">0.44</a>, <a href="https://simonwillison.net/2020/Jul/1/datasette-045/">0.45</a> and <a href="https://simonwillison.net/2020/Sep/15/datasette-0-49/">0.49</a>.</em></p>
Datasette 0.49: The annotated release notes2020-09-15T23:45:59+00:002020-09-15T23:45:59+00:00https://simonwillison.net/2020/Sep/15/datasette-0-49/#atom-series
<p><a href="https://docs.datasette.io/en/stable/changelog.html#v0-49">Datasette 0.49</a> is out. Some notes on what's new.</p>
<h4>API for writable canned queries</h4>
<blockquote>
<p>Writable canned queries now expose a JSON API, see <a href="https://docs.datasette.io/en/stable/sql_queries.html#canned-queries-json-api">JSON API for writable canned queries</a>. (<a href="https://github.com/simonw/datasette/issues/880">#880</a>)</p>
</blockquote>
<p>I wrote <a href="https://simonwillison.net/2020/Jun/12/annotated-release-notes/">about writable canned queries</a> when they were introduced in Datasette 0.44 back in June. They provide a mechanism for defining a canned SQL query which can make writes (inserts, updates or deletes) to the underlying SQLite database. They can be protected by <a href="https://docs.datasette.io/en/stable/authentication.html">Datasette authentication</a> or you can leave them open - for example if you want unauthenticated users to be able to post comments or leave feedback messages.</p>
<p>The missing feature was API support. Datasette 0.49 adds that, so now you can define a canned query that writes to the database and then call it as a JSON API - either without authentication or protected by a mechanism such as that provided by the <a href="https://github.com/simonw/datasette-auth-tokens">datasette-auth-tokens</a> plugin.</p>
<p>This feature works with <a href="https://simonwillison.net/2020/Jul/1/datasette-045/">magic parameters</a>, so you can define an API that accepts API traffic and automatically logs things like the incoming IP address. Here's a canned query defined in a <code>metadata.yml</code> file that logs user agent and IP addresses:</p>
<div class="highlight highlight-source-yaml"><pre><span class="pl-ent">databases</span>:
  <span class="pl-ent">logs</span>:
    <span class="pl-ent">queries</span>:
      <span class="pl-ent">log</span>:
        <span class="pl-ent">sql</span>: <span class="pl-s">|-</span>
<span class="pl-s">          INSERT INTO logs (</span>
<span class="pl-s">            user_agent, datetime</span>
<span class="pl-s">          ) VALUES (</span>
<span class="pl-s">            :_header_user_agent, :_now_datetime_utc</span>
<span class="pl-s">          )</span>
<span class="pl-s"></span>        <span class="pl-ent">write</span>: <span class="pl-c1">true</span></pre></div>
<p>Create a SQLite database file called <code>logs.db</code> with the correct table like this:</p>
<pre><code>$ sqlite-utils create-table logs.db logs id integer user_agent text datetime text --pk=id
</code></pre>
<p>Confirm the created schema with:</p>
<pre><code>$ sqlite3 logs.db .schema
CREATE TABLE [logs] (
[id] INTEGER PRIMARY KEY,
[user_agent] TEXT,
[datetime] TEXT
);
</code></pre>
<p>Now start Datasette like so:</p>
<pre><code>$ datasette logs.db -m metadata.yml
</code></pre>
<p>And visit <code>http://127.0.0.1:8001/logs/log</code>. You can click the "Run SQL" button there to insert a new log row, which you can then view at <code>http://127.0.0.1:8001/logs/logs</code>.</p>
<p>Next, the API. You can request a JSON response by adding <code>?_json=1</code> to the URL, so try this with <code>curl</code>:</p>
<pre><code>$ curl -XPOST 'http://127.0.0.1:8001/logs/log?_json=1'
{"ok": true, "message": "Query executed, 1 row affected", "redirect": null}
</code></pre>
<p>You can also set the <code>Accept: application/json</code> header on your request, like so:</p>
<pre><code>$ curl -XPOST 'http://127.0.0.1:8001/logs/log' -H 'Accept: application/json'
{"ok": true, "message": "Query executed, 1 row affected", "redirect": null}
</code></pre>
<p>Or by passing <code>_json=1</code> as part of a POST submission. Let's try that using <code>requests</code>:</p>
<pre><code>$ ipython
In [1]: import requests
In [2]: requests.post("http://127.0.0.1:8001/logs/log", data={"_json": "1"}).json()
Out[2]: {'ok': True, 'message': 'Query executed, 1 row affected', 'redirect': None}
</code></pre>
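<p>The negotiation rules described above are simple enough to sketch in a few lines of Python. This is an illustration of the decision logic, not Datasette's actual implementation:</p>

```python
def wants_json(headers, args):
    # JSON is returned if _json=1 was passed (querystring or POST body)
    # or if the Accept header asks for application/json
    if args.get("_json") == "1":
        return True
    return "application/json" in headers.get("accept", "")

print(wants_json({}, {"_json": "1"}))                  # True
print(wants_json({"accept": "application/json"}, {}))  # True
print(wants_json({}, {}))                              # False
```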
<h4 id="path-parameters-custom-page-templates">Path parameters for custom page templates</h4>
<blockquote>
<p>New mechanism for defining page templates with custom path parameters - a template file called <code>pages/about/{slug}.html</code> will be used to render any requests to <code>/about/something</code>. See <a href="https://docs.datasette.io/en/stable/custom_templates.html#custom-pages-parameters">Path parameters for pages</a>. (<a href="https://github.com/simonw/datasette/issues/944">#944</a>)</p>
</blockquote>
<p>I added custom page support in <a href="https://docs.datasette.io/en/stable/changelog.html?highlight=pages#v0-41">Datasette 0.41</a> back in May, based on the needs of my <a href="https://www.niche-museums.com/">Niche Museums</a> site. I wanted an easy way to create things like an <code>/about</code> page that returned content from a custom template.</p>
<p>Custom page templates work as a fallback for Datasette 404s. If <code>/about</code> fails to resolve and Datasette was provided a <code>--template-dir</code> on startup, Datasette will check to see if a template exists called <code>templates/pages/about.html</code>.</p>
<p>Datasette 0.49 adds support for <a href="https://docs.datasette.io/en/stable/custom_templates.html#path-parameters-for-pages">path parameters</a>, partially inspired by cookiecutter (which showed me that it's OK to create files with curly braces in their name). You can now create templates with <code>{slug}</code> style wildcards as part of their filenames, and Datasette will route matching requests to that template.</p>
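<p>The core trick is small enough to sketch: turn a filename containing <code>{slug}</code> style wildcards into a regular expression with named capturing groups. This is an illustration of the idea, not Datasette's actual routing code:</p>

```python
import re


def template_to_pattern(path):
    # Split out "{name}" wildcards, escape everything else so that
    # literal characters stay literal, then compile to a regex with
    # one named capturing group per wildcard
    parts = re.split(r"(\{\w+\})", path)
    regex = "".join(
        "(?P<{}>[^/]+)".format(p[1:-1]) if p.startswith("{") else re.escape(p)
        for p in parts
    )
    return re.compile("^{}$".format(regex))


pattern = template_to_pattern("about/{slug}")
match = pattern.match("about/something")
print(match.group("slug"))  # something
```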
<p>I shipped a new release of Niche Museums today that takes advantage of that feature. I wanted neater URLs for museum pages - to shift from <a href="https://www.niche-museums.com/browse/museums/101">https://www.niche-museums.com/browse/museums/101</a> to <a href="https://www.niche-museums.com/101">https://www.niche-museums.com/101</a>.</p>
<p>Here's how it works. I have a template file called <a href="https://github.com/simonw/museums/blob/master/templates/pages/%7Bid%7D.html">templates/pages/{id}.html</a>. That template takes advantage of the <a href="https://github.com/simonw/datasette-template-sql">datasette-template-sql</a> plugin, which adds a <code>sql()</code> function that can be called within the template. It starts like this:</p>
<div class="highlight highlight-text-html-django"><pre><!DOCTYPE html>
<<span class="pl-ent">html</span>>
<span class="pl-e">{%</span> <span class="pl-s">set</span> <span class="pl-s">rows</span> = <span class="pl-s">sql</span>(<span class="pl-s">"select * from museums where id = :id"</span>, {<span class="pl-s">"id"</span>: <span class="pl-s">id</span>}) <span class="pl-e">%}</span>
<span class="pl-e">{%</span> <span class="pl-k">if</span> <span class="pl-k">not</span> <span class="pl-s">rows</span> <span class="pl-e">%}</span>
{{ raise_404("Museum not found") }}
<span class="pl-e">{%</span> <span class="pl-k">endif</span> <span class="pl-e">%}</span>
<<span class="pl-ent">head</span>>
<<span class="pl-ent">meta</span> <span class="pl-e">charset</span>=<span class="pl-s"><span class="pl-pds">"</span>utf-8<span class="pl-pds">"</span></span>>
<span class="pl-e">{%</span> <span class="pl-s">set</span> <span class="pl-s">museum</span> = <span class="pl-s">rows</span>[0] <span class="pl-e">%}</span>
... rest of the template here ...</pre></div>
<p>Datasette captured the <code>id</code> from the <code>{id}.html</code> wildcard matching the incoming URL and made it available as a variable in the template context.</p>
<p>I then use the <code>sql()</code> function to execute a query, passing in <code>id</code> as a query parameter.</p>
<p>If there are no matches, I call the brand new <code>raise_404()</code> template function which cancels the rendering of this template and falls back to Datasette's default 404 handling.</p>
<p>Otherwise, I set the <code>museum</code> variable to <code>rows[0]</code> and continue rendering the page.</p>
<p>I've basically reinvented PHP and ColdFusion in my custom 404 handler. Hooray!</p>
<h4>A few other notable changes</h4>
<blockquote>
<p><code>register_output_renderer()</code> render functions can now return a <code>Response</code>. (<a href="https://github.com/simonw/datasette/issues/953">#953</a>)</p>
</blockquote>
<p>The <a href="https://docs.datasette.io/en/stable/plugin_hooks.html#register-output-renderer-datasette">register_output_renderer() plugin hook</a> was designed before Datasette had a <a href="https://docs.datasette.io/en/stable/internals.html#response-class">documented Response class</a>. It asked your plugin to return a custom <code>{"content_type": "...", "body": "...", "status_code": 200, "headers": {}}</code> dictionary.</p>
<p>You can now return a <code>Response</code> instead, and I plan to remove the dictionary version before Datasette 1.0. I released new versions of <a href="https://github.com/simonw/datasette-ics">datasette-ics</a>, <a href="https://github.com/simonw/datasette-atom">datasette-atom</a>, <a href="https://github.com/simonw/datasette-copyable">datasette-copyable</a> and the new <a href="https://github.com/simonw/datasette-yaml">datasette-yaml</a> that use the new return format.</p>
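<p>Here's a sketch of what a render function looks like under the new convention. The <code>Response</code> class below is a minimal stand-in so the example runs without Datasette installed - the real hook passes more arguments than this:</p>

```python
class Response:
    # Stand-in for datasette.utils.asgi.Response, just for illustration
    def __init__(self, body, content_type="text/plain; charset=utf-8", status=200):
        self.body = body
        self.content_type = content_type
        self.status = status


def render_tsv(columns, rows):
    # Build a tab-separated rendering of the query results and
    # return it as a Response object rather than a dictionary
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(row[column]) for column in columns))
    return Response("\n".join(lines), content_type="text/tab-separated-values")


response = render_tsv(["id", "name"], [{"id": 1, "name": "Cleo"}])
print(response.status)  # 200
print(response.body)    # id<TAB>name / 1<TAB>Cleo
```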
<blockquote>
<p>New <code>--upgrade</code> option for <code>datasette install</code>. (<a href="https://github.com/simonw/datasette/issues/945">#945</a>)</p>
</blockquote>
<p>I added the <code>datasette install datasette-cluster-map</code> command as a thin wrapper around <code>pip install</code>.</p>
<p>This means you can install new plugins without first figuring out which virtual environment your Datasette is running out of - particularly useful if you install Datasette <a href="https://docs.datasette.io/en/stable/installation.html#using-homebrew">using Homebrew</a>.</p>
<p>I then realized that this could be used to upgrade Datasette itself - but only if you could run <code>pip install -U</code>. So now <code>datasette install -U datasette</code> will upgrade Datasette in place.</p>
<blockquote>
<p>New <code>datasette --pdb</code> option. (<a href="https://github.com/simonw/datasette/issues/962">#962</a>)</p>
</blockquote>
<p>This is useful if you are working on Datasette itself, or a Datasette plugin. Pass the <code>--pdb</code> option and Datasette will start an interactive Python debugger session any time it hits an exception.</p>
<blockquote>
<p><code>datasette --get</code> exit code now reflects the internal HTTP status code. (<a href="https://github.com/simonw/datasette/issues/947">#947</a>)</p>
</blockquote>
<p>I'm excited about the pattern of using <code>datasette --get</code> for running simple soundness checks, e.g. as part of a CI suite. Now that the exit code reflects the status code for the page you can write test scripts that look like this:</p>
<pre><code># Fail if homepage returns 404 or 500
datasette . --get /
</code></pre>
<blockquote>
<p>New <code>raise_404()</code> template function for returning 404 errors. (<a href="https://github.com/simonw/datasette/issues/964">#964</a>)</p>
</blockquote>
<p>Demonstrated in the Niche Museums example above.</p>
<p>And the rest:</p>
<blockquote>
<ul>
<li>
<code>datasette publish heroku</code> now deploys using Python 3.8.5</li>
<li>Upgraded <a href="https://codemirror.net/">CodeMirror</a> to 5.57.0. (<a href="https://github.com/simonw/datasette/issues/948">#948</a>)</li>
<li>Upgraded code style to Black 20.8b1. (<a href="https://github.com/simonw/datasette/issues/958">#958</a>)</li>
<li>Fixed bug where selected facets were not correctly persisted in hidden form fields on the table page. (<a href="https://github.com/simonw/datasette/issues/963">#963</a>)</li>
<li>Renamed the default error template from <code>500.html</code> to <code>error.html</code>.</li>
<li>Custom error pages are now documented, see <a href="https://docs.datasette.io/en/stable/custom_templates.html#custom-pages-errors">Custom error pages</a>. (<a href="https://github.com/simonw/datasette/issues/965">#965</a>)</li>
</ul>
</blockquote>
<p>In writing up these annotated release notes I spotted <a href="https://github.com/simonw/datasette/issues/967">a bug</a> with writable canned queries, which I have now fixed in <a href="https://docs.datasette.io/en/stable/changelog.html#v0-49-1">Datasette 0.49.1</a>.</p>
<h3><a href="https://simonwillison.net/2020/Jul/1/datasette-045/">Datasette 0.45: The annotated release notes</a> - 2020-07-01</h3>
<p><strong>Datasette 0.45</strong>, out today, features magic parameters for canned queries, a log out feature, improved plugin documentation and four new plugin hooks.</p>
<p>As I did <a href="https://simonwillison.net/2020/Jun/12/annotated-release-notes/">for Datasette 0.44</a>, I'm going to provide an annotated version of <a href="https://datasette.readthedocs.io/en/stable/changelog.html#v0-45">the full release notes</a> here on my blog.</p>
<h4>Magic parameters for canned queries</h4>
<blockquote><p>Canned queries now support <a href="https://datasette.readthedocs.io/en/stable/sql_queries.html#canned-queries-magic-parameters">Magic parameters</a>, which can be used to insert or select automatically generated values. For example:</p>
<pre><code>insert into logs
(user_id, timestamp)
values
(:_actor_id, :_now_datetime_utc)
</code></pre>
<p>This inserts the currently authenticated actor ID and the current datetime. (<a href="https://github.com/simonw/datasette/issues/842">#842</a>)</p></blockquote>
<p>This is a fun new feature that extends the capabilities of <a href="https://datasette.readthedocs.io/en/stable/sql_queries.html#writable-canned-queries">writable canned queries</a>, introduced in Datasette 0.44.</p>
<p>The key idea here is to make it easy to insert contextual information such as the current timestamp, the authenticated actor or other automatically generated values as part of a writable query.</p>
<p>This means Datasette's canned queries are now powerful enough to build things like simple comment systems or logging endpoints purely by defining a SQL query with the right magic parameters.</p>
<p>There's even a <code>:_random_chars_32</code> parameter that automatically generates a random text string - useful for things like generating authentication tokens for use with <code>datasette-auth-tokens</code>. More on this below.</p>
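<p>The behaviour of <code>:_random_chars_32</code> is easy to approximate in plain Python - this is a sketch of the idea, not Datasette's exact implementation:</p>

```python
import secrets
import string


def random_chars(length):
    # Use the cryptographically secure secrets module to pick
    # characters from letters and digits, token-generator style
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


token = random_chars(32)
print(len(token))  # 32
```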
<h4>Log out</h4>
<blockquote><p>The <a href="https://datasette.readthedocs.io/en/stable/authentication.html#authentication-ds-actor">ds_actor cookie</a> can be used by plugins (or by Datasette's <a href="https://datasette.readthedocs.io/en/stable/authentication.html#authentication-root">--root mechanism</a>) to authenticate users. The new <code>/-/logout</code> page provides a way to clear that cookie.</p><p>A "Log out" button now shows in the global navigation provided the user is authenticated using the <code>ds_actor</code> cookie. (<a href="https://github.com/simonw/datasette/issues/840">#840</a>)</p></blockquote>
<p>Out of the box, Datasette's authentication system is quite primitive: the only way to get an authenticated session is to use the <code>--root</code> option to get a special link when the server first starts running. As <a href="https://datasette.readthedocs.io/en/stable/authentication.html#the-ds-actor-cookie">described in the documentation</a>, the goal is for plugins to fill in the rest.</p>
<p>Even with just that mechanism it still makes sense to let people log out again! The new <code>/-/logout</code> page can do that, and Datasette's navigation now includes a log out button if the user is logged in using that <code>ds_actor</code> cookie.</p>
<p>You can see what this looks like <a href="https://latest.datasette.io/-/patterns">in Datasette's pattern portfolio</a>.</p>
<h4>New plugin hooks</h4>
<blockquote><p><a href="https://datasette.readthedocs.io/en/stable/plugin_hooks.html#plugin-hook-register-magic-parameters">register_magic_parameters(datasette)</a> can be used to define new types of magic canned query parameters.</p></blockquote>
<p>I'm increasingly trying to have Datasette internally use plugin hooks for default behaviour. This hook can define custom magic parameters - you can see the implementation of the default parameters using this hook in <a href="https://github.com/simonw/datasette/blob/0.45/datasette/default_magic_parameters.py">default_magic_parameters.py</a>.</p>
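<p>A magic parameter implementation is just a function that receives the rest of the parameter name plus the current request. Here's a hypothetical <code>:_uuid_new</code> parameter, written as plain Python so it runs without Datasette installed - in a real plugin you would register it from the <code>register_magic_parameters(datasette)</code> hook:</p>

```python
from uuid import uuid4


def uuid(key, request):
    # Called for :_uuid_* parameters; "key" is the part after the prefix.
    # In a plugin this would be returned as [("uuid", uuid)] from the
    # register_magic_parameters(datasette) hook.
    if key == "new":
        return str(uuid4())
    # An unrecognized key signals that the parameter is not supported
    raise KeyError


print(len(uuid("new", None)))  # 36 - the length of a string UUID
```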
<blockquote><p><a href="https://datasette.readthedocs.io/en/stable/plugin_hooks.html#plugin-hook-startup">startup(datasette)</a> can run custom code when Datasette first starts up. <a href="https://github.com/simonw/datasette-init">datasette-init</a> is a new plugin that uses this hook to create database tables and views on startup if they have not yet been created. (<a href="https://github.com/simonw/datasette/issues/834">#834</a>)</p></blockquote>
<p>Here's an example <code>datasette-init</code> plugin configuration in <code>metadata.yaml</code>. This will create a <code>dogs</code> table when the server starts, but only if one has not yet been created:</p>
<pre><code>plugins:
  datasette-init:
    my_database:
      tables:
        dogs:
          columns:
            id: integer
            name: text
            age: integer
            weight: float
          pk: id
</code></pre>
<blockquote><p><a href="https://datasette.readthedocs.io/en/stable/plugin_hooks.html#plugin-hook-canned-queries">canned_queries(datasette, database, actor)</a> lets plugins provide additional canned queries beyond those defined in Datasette's metadata. See <a href="https://github.com/simonw/datasette-saved-queries">datasette-saved-queries</a> for an example of this hook in action. (<a href="https://github.com/simonw/datasette/issues/852">#852</a>)</p></blockquote>
<p>This started out as <a href="https://twitter.com/amjithr/status/1273440766862352384">a feature request</a> from Amjith Ramanujam on Twitter.</p>
<p>Canned queries, like <a href="https://latest.datasette.io/fixtures#queries">these ones</a>, are usually defined in the increasingly poorly-named <code>metadata.json/yaml</code>.</p>
<p>Letting plugins define them opens up some neat possibilities. <a href="https://github.com/simonw/datasette-saved-queries">datasette-saved-queries</a> is an interesting example: it lets users store new queries in their database, inserting them using a writable canned query that the plugin itself returns from that hook by default.</p>
<p>Here's <a href="https://github.com/simonw/datasette-saved-queries/blob/b6b61af4ec3b3ff15e6101561dec8567847c881f/datasette_saved_queries/__init__.py">the code</a>. It also uses the new <code>startup()</code> hook to create its own table.</p>
<blockquote><p><a href="https://datasette.readthedocs.io/en/stable/plugin_hooks.html#plugin-hook-forbidden">forbidden(datasette, request, message)</a> is a hook for customizing how Datasette responds to 403 forbidden errors. (<a href="https://github.com/simonw/datasette/issues/812">#812</a>)</p></blockquote>
<p>I need this for the next version of <a href="https://github.com/simonw/datasette-auth-github">datasette-auth-github</a> - it's a way to customize what happens when a user fails a permission check.</p>
<h4>Even more plugins</h4>
<p>Thanks to the <a href="https://simonwillison.net/2020/Jun/20/cookiecutter-plugins/">datasette-plugin cookiecutter template</a> I can turn out simple plugins in just a few minutes. Here are my new releases from the past week:</p>
<ul><li><a href="https://github.com/simonw/datasette-init">datasette-init</a>, described above.</li><li><a href="https://github.com/simonw/datasette-write">datasette-write</a> provides a <code>/-/write</code> page that can be used to directly execute write SQL queries against a selected database.</li><li><a href="https://github.com/simonw/datasette-allow-permissions-debug">datasette-allow-permissions-debug</a> which is <a href="https://github.com/simonw/datasette-allow-permissions-debug/blob/7d9d315082b70fd03a979acc0bf55df1443b3d27/datasette_allow_permissions_debug/__init__.py">absolutely tiny</a>. All it does is listen for <code>permissions-debug</code> checks and return <code>True</code> for them. This means you can access the <a href="https://datasette.readthedocs.io/en/stable/authentication.html#the-permissions-debug-tool">/-/permissions debug page</a> on your Datasette instance without authenticating first, which is handy for debugging.</li><li><a href="https://github.com/simonw/datasette-glitch">datasette-glitch</a> is designed for use with <a href="https://glitch.com/">Glitch</a>. It outputs a magic one-time use URL to the private Glitch console which you can use to authenticate with your Datasette instance there as the <code>root</code> user.</li><li>I also released a new version of <a href="https://github.com/simonw/datasette-auth-tokens">datasette-auth-tokens</a>, which allows users to configure API tokens to be used to access a private Datasette instance. It now lets you <a href="https://github.com/simonw/datasette-auth-tokens/blob/0.2.1/README.md#tokens-from-your-database">store tokens in a database table</a> based on a configured SQL query. The <code>:_random_chars_32</code> magic parameter mentioned above can be used to help create new tokens.</li></ul>
<h4>What's next?</h4>
<p>I've already slipped one feature into the <a href="https://github.com/simonw/datasette/milestone/16">Datasette 0.46 milestone</a>, but my focus from here on should really be on getting everything in place for <a href="https://github.com/simonw/datasette/milestone/7">Datasette 1.0</a>.</p>
<h3><a href="https://simonwillison.net/2020/Jun/12/annotated-release-notes/">Datasette 0.44: The annotated release notes</a> - 2020-06-12</h3>
<p>I just released <strong>Datasette 0.44</strong> to PyPI. With <a href="https://github.com/simonw/datasette/compare/0.43...0.44">128 commits</a> since 0.43 this is the biggest release in a long time - and likely the last major release of new features before Datasette 1.0.</p>
<p>You can read the <a href="https://datasette.readthedocs.io/en/latest/changelog.html#v0-44">full release notes here</a>, but I've decided to try something a little different for this release and write up some annotations here on my blog.</p>
<blockquote><h4>Writable canned queries</h4><p>Datasette's <a href="https://datasette.readthedocs.io/en/latest/sql_queries.html#canned-queries">Canned queries</a> feature lets you define SQL queries in <code>metadata.json</code> which can then be executed by users visiting a specific URL. <a href="https://latest.datasette.io/fixtures/neighborhood_search">https://latest.datasette.io/fixtures/neighborhood_search</a> for example.</p><p>Canned queries were previously restricted to <code>SELECT</code>, but Datasette 0.44 introduces the ability for canned queries to execute <code>INSERT</code> or <code>UPDATE</code> queries as well, using the new <code>"write": true</code> property (<a href="https://github.com/simonw/datasette/issues/800">#800</a>)</p></blockquote>
<p>I originally intended this to be the main feature in the release.</p>
<p>Datasette 0.37 added the ability for plugins to <a href="https://simonwillison.net/2020/Feb/26/weeknotes-datasette-writes/">write to the database</a>. This marked a pretty huge philosophical shift for Datasette: from a read-only publishing system to a framework for building interactive SQL-driven applications. But you needed a plugin, such as my <a href="https://github.com/simonw/datasette-upload-csvs">datasette-upload-csvs</a>.</p>
<p>I realized <a href="https://github.com/simonw/datasette/issues/698">back in March</a> that canned queries could provide a simple, sensible way to start adding write functionality to Datasette core. A query could be configured like this:</p>
<pre><code>{
    "databases": {
        "my-database": {
            "queries": {
                "add_twitter_handle": {
                    "sql": "insert into twitter_handles (username) values (:username)",
                    "write": true
                }
            }
        }
    }
}</code></pre>
<p>And Datasette could then provide a form interface at <code>/my-database/add_twitter_handle</code> for executing that query. How hard could that be to implement?</p>
<p>The problem with "simple" features like this is that they open up a cascade of other features.</p>
<p>If you're going to have write queries, you probably need to restrict who can execute them - which means authentication and permissions. If forms are performing POSTs you need CSRF protection. If users are changing state you need a way to show them messages telling them what happened.</p>
<p>So let's talk about authentication and permissions.</p>
<blockquote><h4>Authentication</h4><p>Prior to this release the Datasette ecosystem has treated authentication as exclusively the realm of plugins, most notably through <a href="https://github.com/simonw/datasette-auth-github">datasette-auth-github</a>.</p><p>0.44 introduces <a href="https://datasette.readthedocs.io/en/latest/authentication.html#authentication">Authentication and permissions</a> as core Datasette concepts (<a href="https://github.com/simonw/datasette/issues/699">#699</a>). This makes it easier for different plugins to share responsibility for authenticating requests - you might have one plugin that handles user accounts and another one that allows automated access via API keys, for example.</p></blockquote>
<p>I demonstrated with <a href="https://github.com/simonw/datasette-auth-github">datasette-auth-github</a> and <a href="https://github.com/simonw/datasette-auth-existing-cookies">datasette-auth-existing-cookies</a> that authentication could exist as a completely separate layer from Datasette itself (thanks to the magic of ASGI middleware). But I'd started to run into limitations of this approach.</p>
<p>Crucially, I wanted to be able to support more than one kind of authentication. Users might get authenticated with cookies (via an SSO mechanism such as GitHub's) but API clients need API keys. Now the different authentication plugins need to make sure they don't accidentally interfere with each other's logic.</p>
<p>Authentication in web applications always comes down to the same thing: inspecting aspects of the incoming HTTP request (headers, cookies, querystring variables etc) and deciding if they prove that the request is coming from a specific user or API integration.</p>
<p>I decided to use the word "actor" for this, since "user or API integration" is a bit of a mess. This means that authentication can work entirely via a new plugin hook, <a href="https://datasette.readthedocs.io/en/stable/plugins.html#actor-from-request-datasette-request">actor_from_request</a>.</p>
<p>Here's a really simple implementation of that hook, copied directly from the documentation:</p>
<pre><code>from datasette import hookimpl
import secrets

SECRET_KEY = "this-is-a-secret"


@hookimpl
def actor_from_request(datasette, request):
    authorization = request.headers.get("authorization") or ""
    expected = "Bearer {}".format(SECRET_KEY)
    if secrets.compare_digest(authorization, expected):
        return {"id": "bot"}</code></pre>
<p>It returns an "actor dictionary" describing the authenticated actor. Datasette currently has no opinion at all on what shape this dictionary should take, though I expect conventions to emerge over time.</p>
<p>I have <a href="https://github.com/simonw/datasette/commit/9f236c4c00689a022fd1d508f2b809ee2305927f">a new policy</a> of never releasing a new plugin hook without also building a real-world plugin with it. For <code>actor_from_request</code> I've released <a href="https://github.com/simonw/datasette-auth-tokens">datasette-auth-tokens</a>, which lets you create secret API tokens in a configuration file and specify which actions they are allowed to perform.</p>
<blockquote><h4>Permissions</h4><p>Datasette also now has a built-in concept of <a href="https://datasette.readthedocs.io/en/stable/authentication.html#authentication-permissions">Permissions</a>. The permissions system answers the following question:</p><blockquote><p>Is this <strong>actor</strong> allowed to perform this <strong>action</strong>, optionally against this particular <strong>resource</strong>?</p></blockquote><p>You can use the new <code>"allow"</code> block syntax in <code>metadata.json</code> (or <code>metadata.yaml</code>) to set required permissions at the instance, database, table or canned query level. For example, to restrict access to the <code>fixtures.db</code> database to the <code>"root"</code> user:</p><pre><code>{
    "databases": {
        "fixtures": {
            "allow": {
                "id": "root"
            }
        }
    }
}</code></pre><p>See <a href="https://datasette.readthedocs.io/en/stable/authentication.html#authentication-permissions-allow">Defining permissions with "allow" blocks</a> for more details.</p><p>Plugins can implement their own custom permission checks using the new <a href="https://datasette.readthedocs.io/en/stable/plugins.html#plugin-hook-permission-allowed">permission_allowed(datasette, actor, action, resource)</a> hook.</p></blockquote>
<p>Authentication on its own isn't enough: you also need a way of deciding if an authenticated actor has permission to perform a specific action.</p>
<p>I was dreading adding permissions to Datasette. I have a long-running feud with Amazon IAM and the Google Cloud equivalent: I've been using AWS for over a decade and I still get completely lost any time I try to figure out the minimum set of permissions for something.</p>
<p>But... I want Datasette permissions to be flexible. My dream for Datasette is to nurture a growing ecosystem of plugins that can solve data collaboration and analysis problems way beyond what I've imagined myself.</p>
<p>Thanks to Datasette's plugin hooks, I think I've found a way to provide powerful plugins with minimum additional footprint to Datasette itself.</p>
<p>The key is the new <a href="https://datasette.readthedocs.io/en/stable/plugins.html#plugin-hook-permission-allowed">permission_allowed()</a> hook, which lets plugins receive an actor, action and optional resource and allows them to reply with "allow", "deny" or "I don't know, ask someone else".</p>
<p>Its partner is the <a href="https://datasette.readthedocs.io/en/stable/internals.html#await-permission-allowed-actor-action-resource-none-default-false">datasette.permission_allowed(actor, action, resource)</a> method, which plugins (and Datasette core) can call to check if the current actor is allowed to perform an action against a given resource (a specific database, table or canned query).</p>
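<p>The "allow, deny, or ask someone else" pattern resolves like this - a sketch of the logic, not Datasette's internals:</p>

```python
def resolve_permission(plugin_responses, default=False):
    # Each plugin answers True (allow), False (deny) or None (no opinion);
    # the first definitive answer wins, otherwise fall back to the default
    for response in plugin_responses:
        if response is not None:
            return response
    return default


print(resolve_permission([None, True]))   # True
print(resolve_permission([False, True]))  # False - first definitive answer wins
print(resolve_permission([None, None]))   # False - the default
```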
<p>I invented a JSON/YAML based syntax for defining simple permission rules. If you want to provide access to a table to a specific user you can do so like this:</p>
<pre><code>{
    "allow": {
        "id": "simonw"
    }
}</code></pre>
<p>Or you can grant access to any actor with a role of "staff" like so:</p>
<pre><code>{
    "allow": {
        "role": "staff"
    }
}</code></pre>
<p>What are roles here? They're nothing at all to Datasette itself. Datasette authentication plugins can create actors of any shape, so if your plugin decides that "role" is a useful concept it can simply bake roles into the actor dictionaries it creates.</p>
<p>You can read more about <a href="https://datasette.readthedocs.io/en/stable/authentication.html#defining-permissions-with-allow-blocks">how these "allow" blocks work</a> in the documentation.</p>
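<p>A simplified sketch of how an "allow" block might be evaluated against an actor dictionary (the real rules, documented at the link above, handle more cases than this):</p>

```python
def actor_matches_allow(allow, actor):
    # An actor matches if any key in the allow block matches the
    # corresponding actor value; allowed values may be a single
    # string or a list of strings
    if actor is None:
        return False
    for key, values in allow.items():
        if not isinstance(values, list):
            values = [values]
        if actor.get(key) in values:
            return True
    return False


print(actor_matches_allow({"id": "simonw"}, {"id": "simonw"}))          # True
print(actor_matches_allow({"id": ["simonw", "cleo"]}, {"id": "cleo"}))  # True
print(actor_matches_allow({"role": "staff"}, {"id": "simonw"}))         # False
```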
<h4>/-/permissions debug tool</h4>
<p>Given my ongoing battle with opaque permission systems, I'm determined to try and make Datasette's take on permissions as transparent as possible. The new <code>/-/permissions</code> page - visible only to authenticated users with the <code>debug-permissions</code> permission - shows a rolling log of the last 200 permission checks carried out by that instance. My hope is that instance administrators and plugin authors can use this to figure out exactly what is going on.</p>
<p><img src="https://static.simonwillison.net/static/2020/Debug_permissions.png" alt="/-/permissions screenshot" style="max-width: 100%" /></p>
<h4>datasette-permissions-sql</h4>
<p><a href="https://github.com/simonw/datasette-permissions-sql">datasette-permissions-sql</a> is my new proof-of-concept plugin that puts the permission hook to use.</p>
<p>It exercises a Datasette pattern I find really interesting: using SQL queries to configure custom behaviour. I first started exploring this in <a href="https://github.com/simonw/datasette-atom">datasette-atom</a> and <a href="https://github.com/simonw/datasette-ics">datasette-ics</a>.</p>
<p>This is best illustrated by an example <code>metadata.yaml</code> file. I prefer YAML over JSON for anything that includes a SQL query because YAML has support for multi-line strings:</p>
<pre><code>databases:
  mydatabase:
    queries:
      promote_to_staff:
        sql: |-
          UPDATE users
          SET is_staff = 1
          WHERE id = :id
        write: true
plugins:
  datasette-permissions-sql:
  - action: view-query
    resource:
    - mydatabase
    - promote_to_staff
    sql: |-
      SELECT * FROM users
      WHERE is_staff = 1
      AND id = :actor_id</code></pre>
<p>This block does two things. It configures a writable canned query called <code>promote_to_staff</code>. It then uses <code>datasette-permissions-sql</code> to define a permission rule that says that only authenticated actors whose <code>id</code> appears in the <code>users</code> table with <code>is_staff=1</code> are allowed to execute that canned query.</p>
<p>This is the beginnings of a full user management system in just a few lines of configuration. I'm really excited about exploring this concept further.</p>
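<p>The rule in that configuration boils down to "the SQL query returns at least one row for this actor". Here's that idea sketched with Python's <code>sqlite3</code> module, using an invented in-memory table:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, is_staff INTEGER)")
conn.execute("INSERT INTO users VALUES ('simonw', 1), ('visitor', 0)")


def permission_allowed(actor_id):
    # Permission is granted if the configured SQL returns any rows
    sql = "SELECT * FROM users WHERE is_staff = 1 AND id = :actor_id"
    return conn.execute(sql, {"actor_id": actor_id}).fetchone() is not None


print(permission_allowed("simonw"))   # True
print(permission_allowed("visitor"))  # False
```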
<blockquote><h4>register_routes() plugin hooks</h4><p>Plugins can now register new views and routes via the <a href="https://datasette.readthedocs.io/en/stable/plugins.html#plugin-register-routes">register_routes()</a> plugin hook (<a href="https://github.com/simonw/datasette/issues/819">#819</a>). View functions can be defined that accept any of the current <code>datasette</code> object, the current <code>request</code>, or the ASGI <code>scope</code>, <code>send</code> and <code>receive</code> objects.</p></blockquote>
<p>I thought the <a href="https://datasette.readthedocs.io/en/stable/plugins.html#asgi-wrapper-datasette">asgi_wrapper() hook</a> might be enough to allow plugins to add their own custom routes and views, but the more I worked with it the more I wanted something a bit more high level.</p>
<p>Inspired by <a href="https://docs.pytest.org/">pytest</a>, the view functions you can define using <code>register_routes()</code> benefit from a simple form of dependency injection. Any of the following counts as a valid view function - it will be called with the arguments it requests:</p>
<pre><code>import html

from datasette.utils.asgi import Response


async def hello(request):
    return Response.html("Hello {}".format(
        html.escape(request.args.get("name"))
    ))


async def page(request, datasette):
    result = await datasette.get_database().execute(
        "select body from pages where id = :id", {
            "id": request.url_vars["page_id"]
        }
    )
    return Response.html(result.first()["body"])


async def asgi_hello_world(scope, receive, send):
    assert scope["type"] == "http"
    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [
                [b"content-type", b"application/json"]
            ],
        }
    )
    await send({
        "type": "http.response.body",
        "body": b'{"hello": "world"}'
    })</code></pre>
<p>Here's <a href="https://github.com/simonw/datasette/blob/b906030235efbdff536405d66078f4868ce0d3bd/datasette/utils/__init__.py#L850-L871">the code that makes this work</a> - utility functions that pass in just the arguments that match a function's signature.</p>
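<p>The underlying pattern is compact enough to sketch - inspect the function's signature and pass in only the keyword arguments it declares. This is an illustration of the technique, not the linked Datasette utility itself:</p>

```python
import inspect


def call_with_supported_arguments(fn, **kwargs):
    # Look at the function's signature and pass only the
    # parameters it actually asks for
    parameters = inspect.signature(fn).parameters
    supported = {k: v for k, v in kwargs.items() if k in parameters}
    return fn(**supported)


def view(request):
    return "got {}".format(request)


# The extra datasette argument is silently dropped because
# view() does not declare it
print(call_with_supported_arguments(view, request="req", datasette="ds"))
# got req
```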
<p>Tucked away at the end of the 0.44 release notes is this:</p>
<blockquote><h4>The road to Datasette 1.0</h4><p>I've assembled a <a href="https://github.com/simonw/datasette/milestone/7">milestone for Datasette 1.0</a>. The focus of the 1.0 release will be the following:</p><ul><li>Signify confidence in the quality/stability of Datasette</li><li>Give plugin authors confidence that their plugins will work for the whole 1.x release cycle</li><li>Provide the same confidence to developers building against Datasette JSON APIs</li></ul><p>If you have thoughts about what you would like to see for Datasette 1.0 you can join <a href="https://github.com/simonw/datasette/issues/519">the conversation on issue #519</a>.</p></blockquote>
<p>It's time to start working towards the big 1.0!</p>