Simon Willison’s Weblog

Subscribe

Designing a write API for Datasette

9th November 2022

Building out Datasette Cloud has made one thing clear to me: Datasette needs a write API for ingesting new data into its attached SQLite databases.

I had originally thought that this could be left entirely to plugins: my datasette-insert plugin already provides a JSON API for inserting data, and other plugins like datasette-upload-csvs also implement data import functionality.

But some things deserve to live in core. An API for manipulating data is one of them, because it can hopefully open up a floodgate of opportunities for other plugins and external applications to build on top of it.

I’ve been working on this over the past two weeks, in between getting distracted by Mastodon (it’s just blogs!).

Designing the API

You can follow my progress in this tracking issue: Write API in Datasette core #1850. I’m building the new functionality in a branch (called 1.0-dev, because this is going to be one of the defining features of Datasette 1.0—and will be previewed in alphas of that release).

Here’s the functionality I’m aiming for in the first alpha:

  • API for writing new records (singular or plural) to a table
  • API for updating an existing record
  • API for deleting an existing record
  • API for creating a new table—either with an explicit schema or by inferring it from a set of provided rows
  • API for dropping a table

I have a bunch of things I plan to add later, but I think the above represents a powerful, coherent set of initial functionality.

In terms of building this, I have a secret weapon: sqlite-utils. It already has both a Python client library and a comprehensive CLI interface for inserting data and creating tables. I’ve evolved the design of those over multiple major versions, and I’m confident that they’re solid. Datasette’s write API will mostly implement the same patterns I’ve eventually settled on for sqlite-utils.

I still need to design the higher level aspects of the API though—the endpoint URLs and the JSON format that will be used.

This is still in flux, but my current design looks like this.

To insert records:

POST /database/table/-/insert
{
    "rows": [
        {"id": 1, "name": "Simon"},
        {"id": 2, "name": "Cleo"}
    ]
}

Or use "row": {...} to insert a single row.

To create a new table with an explicit schema:

POST /database/-/create
{
    "name": "people",
    "columns": [
        {
            "name": "id",
            "type": "integer"
        },
        {
            "name": "title",
            "type": "text"
        }
    ]
   "pk": "id"
}

To create a new table with a schema automatically derived from some initial rows:

POST /database/-/create
{
    "name": "my new table",
    "rows": [
        {"id": 1, "name": "Simon"},
        {"id": 2, "name": "Cleo"}
    ]
   "pk": "id"
}

To update a record:

POST /database/table/134/-/update
{
    "update": {
        "name": "New name"
    }
}

Where 134 in the URL is the primary key of the record. Datasette supports compound primary keys too, so this could be /database/docs/article,242/-/update for a table with a compound primary key.

I’m using a "update" nested object here rather than having everything at the root of the document because that frees me up to add extra future fields that control the update—"alter": true to specify that the table schema should be updated to add new columns, for example.

To delete a record:

POST /database/table/134/-/delete

I thought about using the HTTP DELETE verb here and I’m ready to be convinced that it’s a good idea, but thinking back over my career I can’t see any times where I’ve seen DELETE offered a concrete benefit over just sticking with POST for this kind of thing.

This isn’t going to be a pure REST API, and I’m OK with that.

So many details

There are so many interesting details to consider here—especially given that Datasette is designed to support ANY schema that’s possible in SQLite.

  • Should you be allowed to update the primary key of an existing record?
  • What happens if you try to insert a record that violates a foreign key constraint?
  • What happens if you try to insert a record that violates a unique constraint?
  • How should inserting binary data work, given that JSON doesn’t have a binary type?
  • What permissions should the different API endpoints require (I’m looking to add a bunch of new ones)
  • How should compound primary keys be treated?
  • Should the API return a copy of the records that were just inserted? Initially I thought yes, but it turns out to be a big impact on insert speeds, at least in SQLite versions before the RETURNING clause was added in SQLite 3.35.0 (in March 2021, so not necessarily widely available yet).
  • How should the interactive API explorer work? I’ve been building that in this issue.

I’m working through these questions in the various issues attached to my tracking issue. If you have opinions to share you’re welcome to join me there!

Token authentication

This is another area that I’ve previously left to plugins. datasette-auth-tokens adds Authorization: Bearer xxx authentication to Datasette, but if there’s a write API in core there really needs to be a default token authentication mechanism too.

I’ve implemented a default mechanism based around generating signed tokens, described in issue #1852 and described in this in-progress documentation.

The basic idea is to support tokens that are signed JSON objects (similar to JWT but not JWT, because JWT is a flawed standard—I rolled my own using itsdangerous).

The signed content of a token looks like this:

{
    "a": "user_id",
    "t": 1668022423,
    "d": 3600
}

The "a" field captures the ID of the user created that token. The token can then inherit the permissions of that user.

The "t" field shows when the token was initially created.

The "d" field is optional, and indicates after how many seconds duration the token should expire. This allows for the creation of time-limited tokens.

Tokens can be created using the new /-/create-token page or the new datasette create-token CLI command.

It’s important to note that this is not intended to be the only way tokens work in Datasette. There are plenty of applications where database-backed tokens makes more sense, since it allows tokens to be revoked individually without rotating secrets and revoking every issued token at once. I plan to implement this pattern myself for Datasette Cloud.

But I think this is a reasonable default scheme to include in Datasette core. It can even be turned off entirely using the new --setting allow_signed_tokens off option.

I’m also planning a variant of these tokens that can apply additional restrictions. Let’s say you want to issue a token that acts as your user but is only allowed to insert rows into the docs table in the primary database. You’ll be able to create a token that looks like this:

{
    "a": "simonw",
    "t": 1668022423,
    "r": {
        "t": {
            "primary: {
                "docs": ["ir"]
            }
        }
    }
}

"r" means restrictions. The "t" key indicates per-table restrictions, and the "ir" is an acronym for the insert-row permission.

I’m still fleshing out how this will work, but it feels like an important feature of any permissions system. I find it frustrating any time I’m working with a a system that doesn’t allow me to create scoped-down tokens.

Releases this week

TIL this week