Simon Willison’s Weblog

Subscribe

Datasette 0.64, with a warning about SpatiaLite

9th January 2023

I release Datasette 0.64 this morning. This release is mainly a response to the realization that it’s not safe to run Datasette with the SpatiaLite extension loaded if that Datasette instance is configured to enable arbitrary SQL queries from untrusted users.

Here are the release notes quoted in full:

The problem with SpatiaLite

Datasette allows arbitrary SQL execution as a core feature. It takes a bunch of steps to provide this safely: database connections are opened in read-only mode, it imposes a strict time limit on SQL queries and Datasette is designed to be run in containers for a further layer of protection.

SQLite itself is an excellent platform for this feature: it has a set of default functionality that supports this well, protected by a legendarily thorough test suite.

SpatiaLite is a long running third-party extension for SQLite that adds a bewildering array of additional functionality to SQLite—much of it around GIS, but with a whole host of extras as well. It includes debugging routines, XML parsers and even it’s own implementation of stored procedures!

Unfortunately, not all of this functionality is safe to expose to untrusted queries—even for databases that have been opened in read-only mode.

After identifying functions which could crash the Datasette instance, I decided that Datasette should make a strong recommendation not to expose SpatiaLite in an unprotected manner.

In addition to the new documentation, I also added a feature I’ve been planning for a while: a simple setting for disabling arbitrary SQL queries entirely:

datasette --setting default_allow_sql off

Prior to 0.64 you could achieve the same thing by adding the following line to your metadata.json file:

{
    "allow_sql": false
}

Or in metadata.yml:

allow_sql: false

The new setting achieves the same thing, but is more obvious and can be easily applied even for Datasette instances that don’t use metadata.

A new SpatiaLite tutorial

The documentation now recommends running SpatiaLite instances with pre-approved SQL implemented using Datasette’s canned queries feature.

To help clarify how this works, I decided to publish a new entry in the official series of Datasette tutorials:

Building a location to time zone API with SpatiaLite

This is an updated version of a tutorial I first wrote back in 2017.

The new tutorial now includes material on Chris Amico’s datasette-geojson-map plugin, SpatiaLite point-in-polygon queries, polygon intersection queries, spatial indexes and how to use the simplify() function to reduce huge polygons down to a size that is more practical to display on a map.

I’m really happy with this new tutorial. Not only does it show a safe way to run SpatiaLite, but it also illustrates a powerful pattern for using Datasette to create and deploy custom APIs.

The resulting API can be accessed here:

https://timezones.datasette.io/timezones

It’s hosted on Fly, using their $1.94/month instance size with 256MB of RAM—easily powerful enough to host this class of application.

I also updated the datasette-publish-fly plugin to make it easier to deploy instances with SQL execution disabled, see the 1.3 release notes.