Simon Willison’s Weblog

On githubactions, cloudrun, datasettecloud, covid19, dogsheep, ...


Recent entries

Datasette 0.45: The annotated release notes two days ago

Datasette 0.45, out today, features magic parameters for canned queries, a log out feature, improved plugin documentation and four new plugin hooks.

As I did for Datasette 0.44, I’m going to provide an annotated version of the full release notes here on my blog.

Magic parameters for canned queries

Canned queries now support Magic parameters, which can be used to insert or select automatically generated values. For example:

insert into logs
  (user_id, timestamp)
values
  (:_actor_id, :_now_datetime_utc)

This inserts the currently authenticated actor ID and the current datetime. (#842)

This is a fun new feature that extends the capabilities of writable canned queries, introduced in Datasette 0.44.

The key idea here is to make it easy to insert contextual information such as the current timestamp, the authenticated actor or other automatically generated values as part of a writable query.

This means Datasette’s canned queries are now powerful enough to build things like simple comment systems or logging endpoints purely by defining a SQL query with the right magic parameters.

There’s even a :_random_chars_32 parameter that automatically generates a random text string—useful for things like generating authentication tokens for use with datasette-auth-tokens. More on this below.
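
Here’s a rough sketch of how a value like that might be generated, a guess at the idea rather than Datasette’s exact implementation:

import secrets
import string

def random_chars(length):
    # :_random_chars_32 resolves to a random string like this one,
    # 32 characters long
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))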

Log out

The ds_actor cookie can be used by plugins (or by Datasette’s --root mechanism) to authenticate users. The new /-/logout page provides a way to clear that cookie.

A “Log out” button now shows in the global navigation provided the user is authenticated using the ds_actor cookie. (#840)

Out of the box, Datasette’s authentication system is quite primitive: the only way to get an authenticated session is to use the --root option to get a special link when the server first starts running. As described in the documentation, the goal is for plugins to fill in the rest.

Even with just that mechanism it still makes sense to let people log out again! The new /-/logout page can do that, and Datasette’s navigation now includes a log out button if the user is logged in using that ds_actor cookie.

You can see what this looks like in Datasette’s pattern portfolio.

New plugin hooks

register_magic_parameters(datasette) can be used to define new types of magic canned query parameters.

I’m increasingly trying to have Datasette internally use plugin hooks for default behaviour. This hook can define custom magic parameters—you can see the implementation of the default parameters using this hook in default_magic_parameters.py.
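
Here’s a minimal sketch of a plugin using the hook, modelled on the documented pattern (the uuid parameter family here is a hypothetical example):

from datasette import hookimpl
import uuid

def uuid_value(key, request):
    # Called for parameters like :_uuid_new, where "new" is the key
    if key == "new":
        return str(uuid.uuid4())
    raise KeyError

@hookimpl
def register_magic_parameters(datasette):
    # Register a "uuid" family of magic parameters
    return [("uuid", uuid_value)]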

startup(datasette) can run custom code when Datasette first starts up. datasette-init is a new plugin that uses this hook to create database tables and views on startup if they have not yet been created. (#834)

Here’s an example datasette-init plugin configuration in metadata.yaml. This will create a dogs table when the server starts, but only if one has not yet been created:

plugins:
  datasette-init:
    my_database:
      tables:
        dogs:
          columns:
            id: integer
            name: text
            age: integer
            weight: float
          pk: id
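
For comparison, here’s roughly what using the startup() hook directly looks like, a simplified sketch of the kind of thing datasette-init does rather than its actual code:

from datasette import hookimpl

@hookimpl
def startup(datasette):
    # Returning an async function tells Datasette to await it on startup
    async def inner():
        db = datasette.get_database("my_database")
        await db.execute_write(
            "create table if not exists dogs ("
            "id integer primary key, name text, age integer, weight float)",
            block=True,
        )

    return inner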

canned_queries(datasette, database, actor) lets plugins provide additional canned queries beyond those defined in Datasette’s metadata. See datasette-saved-queries for an example of this hook in action. (#852)

This started out as a feature request from Amjith Ramanujam on Twitter.

Canned queries, like these ones, are usually defined in the increasingly poorly-named metadata.json/yaml.

Letting plugins define them opens up some neat possibilities. datasette-saved-queries is an interesting example: it lets users store new queries in their database, inserting them using a writable canned query that the plugin itself returns from that hook by default.

Here’s the code. It also uses the new startup() hook to create its own table.

forbidden(datasette, request, message) is a hook for customizing how Datasette responds to 403 forbidden errors. (#812)

I need this for the next version of datasette-auth-github—it’s a way to customize what happens when a user fails a permission check.
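
A sketch of the kind of thing the hook makes possible, assuming a plugin that wants to bounce users to a hypothetical /login page instead of showing the default 403:

from datasette import hookimpl
from datasette.utils.asgi import Response

@hookimpl
def forbidden(request, message):
    # Redirect to a login page instead of rendering the default 403,
    # preserving the path the user was trying to reach
    return Response.redirect("/login?next={}".format(request.path))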

Even more plugins

Thanks to the datasette-plugin cookiecutter template I can turn out simple plugins in just a few minutes. Here are my new releases from the past week:

  • datasette-init, described above.
  • datasette-write provides a /-/write page that can be used to directly execute write SQL queries against a selected database.
  • datasette-allow-permissions-debug, which is absolutely tiny. All it does is listen for permissions-debug checks and return True for them. This means you can access the /-/permissions debug page on your Datasette instance without authenticating first, which is handy for debugging.
  • datasette-glitch is designed for use with Glitch. It outputs a magic one-time use URL to the private Glitch console which you can use to authenticate with your Datasette instance there as the root user.
  • I also released a new version of datasette-auth-tokens, which allows users to configure API tokens to be used to access a private Datasette instance. It now lets you store tokens in a database table based on a configured SQL query. The :_random_chars_32 magic parameter mentioned above can be used to help create new tokens.

What’s next?

I’ve already slipped one feature into the Datasette 0.46 milestone, but my focus from here on should really be on getting everything in place for Datasette 1.0.

Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate eight days ago

I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with cookiecutter. I wrote about my datasette-plugin cookiecutter template earlier in the week; here’s what else I’ve been working on.

sqlite-generate

Datasette is supposed to work against any SQLite database you throw at it, no matter how weird the schema or how unwieldy the database shape or size.

I built a new tool called sqlite-generate this week to help me create databases of different shapes. It’s a Python command-line tool which uses Faker to populate a new database with random data. You run it something like this:

sqlite-generate demo.db \
    --tables=20 \
    --rows=100,500 \
    --columns=5,20 \
    --fks=0,3 \
    --pks=0,2 \
    --fts

This command creates a database containing 20 tables, each with between 100 and 500 rows and 5-20 columns. Each table will also have between 0 and 3 foreign key columns to other tables, and will feature between 0 and 2 primary key columns. SQLite full-text search will be configured against all of the text columns in the table.

I always try to include a live demo with any of my projects, and sqlite-generate is no exception. This GitHub Action runs on every push to main and deploys a demo to https://sqlite-generate-demo.datasette.io/ showing the latest version of the code in action.

The demo runs my datasette-search-all plugin in order to more easily demonstrate full-text search across all of the text columns in the generated tables. Try searching for newspaper.

click-app cookiecutter template

I write quite a lot of Click powered command-line tools like this one, so inspired by datasette-plugin I created a new click-app cookiecutter template that bakes in my own preferences about how to set up a new Click project (complete with GitHub Actions). sqlite-generate is the first tool I’ve built using that template.

Improved Datasette plugin documentation

I’ve split Datasette’s plugin documentation into five separate pages, and added a new page to the documentation about patterns for testing plugins.

The five pages are:

  • Plugins describing how to install and configure plugins
  • Writing plugins showing how to write one-off plugins, how to use the datasette-plugin cookiecutter template and how to package templates for release to PyPI
  • Plugin hooks documenting all of the available plugin hooks
  • Testing plugins describing my preferred patterns for writing tests for them (using pytest and HTTPX)
  • Internals for plugins describing the APIs Datasette makes available for use within plugin hook implementations

There’s also a list of available plugins on the Datasette Ecosystem page of the documentation, though I plan to move those to a separate plugin directory in the future.

datasette-block-robots

The datasette-plugin template practically eliminates the friction involved in starting a new plugin.

sqlite-generate generates random names for people. I don’t particularly want people who search for their own names stumbling across the live demo and being weirded out by their name featured there, so I decided to block it from search engine crawlers using robots.txt.

I wrote a tiny plugin to do this: datasette-block-robots, which uses the new register_routes() plugin hook to add a /robots.txt page.

It’s also a neat example of the simplest possible plugin to use that feature—along with the simplest possible unit test for exercising such a page.
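
Stripped of packaging, the core of a plugin like that comes down to just a few lines. This is a sketch of the idea rather than the plugin’s exact code:

from datasette import hookimpl
from datasette.utils.asgi import Response

@hookimpl
def register_routes():
    async def robots_txt(request):
        # Tell all crawlers to stay away from the entire instance
        return Response.text("User-agent: *\nDisallow: /")

    return [(r"^/robots\.txt$", robots_txt)]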

datasette-saved-queries

Another new plugin, this time with a bit more substance to it. datasette-saved-queries exercises the new canned_queries() hook I described last week. It uses the new startup() hook to create tables on startup (if they are missing), then lets users insert records into those tables to save their own queries. Queries saved in this way are then returned as canned queries for that particular database.

main, not master

main is a better name for the main GitHub branch than master, which has unpleasant connotations (it apparently derives from master/slave in BitKeeper). My datasette-plugin and click-app cookiecutter templates both include instructions for renaming master to main in their READMEs—it’s as easy as running git branch -m master main before running your first push to GitHub.

I’m working towards making the switch for Datasette itself.

A cookiecutter template for writing Datasette plugins 13 days ago

Datasette’s plugin system is one of the most interesting parts of the entire project. As I explained to Matt Asay in this interview, the great thing about plugins is that Datasette can gain new functionality overnight without me even having to review a pull request. I just need to get more people to write them!

datasette-plugin is my most recent effort to help make that as easy as possible. It’s a cookiecutter template that sets up the outline of a new plugin, combining the best patterns I’ve discovered over the past two years of writing my own plugins.

Once you’ve installed cookiecutter you can start building a new plugin by running:

cookiecutter gh:simonw/datasette-plugin

Cookiecutter will run a quick interactive session asking for a few details. It will then use those details to generate a new directory structure ready for you to start work on the plugin.

The datasette-plugin README describes the next steps. A couple of things are worth exploring in more detail.

Writing tests for plugins

I’m a big believer in automated testing: every single one of my plugins includes tests, and those tests are run against every commit and must pass before new packages are shipped to PyPI.

In my experience the hardest part of writing tests is getting them started: setting up an initial test harness and ensuring that new tests can be easily written.

datasette-plugin adds pytest as a testing dependency and creates a tests/ folder with an initial, passing unit test in it.

The test confirms that the new plugin has been correctly installed, by running a request through a configured Datasette instance and hitting the /-/plugins.json introspection endpoint.

In doing so, it demonstrates how to run tests that interact with Datasette’s HTTP API. This is a very productive way to write tests.

The example test uses the HTTPX Python library. HTTPX offers a requests-style API but with a couple of crucial improvements. Firstly, it’s been built with asyncio support as a top-level concern. Secondly, it understands the ASGI protocol and can be run directly against an ASGI application without needing to spin up an actual HTTP server. Since Datasette speaks ASGI this makes it the ideal tool for testing Datasette plugins.

Here’s that first test that gets created by the cookiecutter template:

from datasette.app import Datasette
import pytest
import httpx

@pytest.mark.asyncio
async def test_plugin_is_installed():
    app = Datasette([], memory=True).app()
    async with httpx.AsyncClient(app=app) as client:
        response = await client.get(
            "http://localhost/-/plugins.json"
        )
        assert 200 == response.status_code
        installed_plugins = {
            p["name"] for p in response.json()
        }
        assert "datasette-plugin-template-demo" in installed_plugins

My hope is that including a passing test that demonstrates how to execute test requests will make it much easier for plugin authors to start building out their own custom test suite.

Continuous integration with GitHub Actions

My favourite thing about GitHub Actions is that they’re enabled on every GitHub repository for free, without any extra configuration necessary.

The datasette-plugin template takes advantage of this. Not only does every new project get a passing test—it also gets a GitHub Action—in .github/workflows/test.yml—that executes the tests on every commit.

It even runs the test suite in parallel against Python 3.6, 3.7 and 3.8—the versions currently supported by Datasette itself.

A second action in .github/workflows/publish.yml bakes in my opinions on the best way to manage plugin releases: it builds and ships a new package to PyPI every time a new tag (and corresponding GitHub release) is added to the repository.

For this to work you’ll need to create a PyPI API token and add it to your plugin’s GitHub repository as a PYPI_TOKEN secret. This is explained in the README.

Deploying a live demo of the template with GitHub Actions

Whenever possible, I like to ship my projects with live demos. The Datasette repository publishes a demo of the latest commit to https://latest.datasette.io/ on every commit. I try to do the same for my plugins, where it makes sense to do so.

What could a live demo of a cookiecutter template look like?

Ideally it would show a complete, generated project. I love GitHub’s code browsing interface, so a separate repository containing that generated project would be ideal.

So that’s what https://github.com/simonw/datasette-plugin-template-demo is: it’s a repository showing the most recent output of the latest version of the cookiecutter template that lives in https://github.com/simonw/datasette-plugin.

It’s powered by this GitHub Action, which runs on every push to the datasette-plugin repo, installs cookiecutter, uses cookiecutter against some fixed inputs to re-generate the project and then pushes the results up to datasette-plugin-template-demo as a new commit.

As a fun final touch, it uses the GitHub commit comments API to add a comment to the commit to datasette-plugin linking to the “browse” view on the resulting code in the datasette-plugin-template-demo repository. Here’s one of those commit comments.

Figuring out how to build this took quite a bit of work. Issue #4 has a blow-by-blow rundown of how I got it working.

I couldn’t resist tweeting about it.

Weeknotes: Datasette alphas for testing new plugin hooks 15 days ago

A relatively quiet week this week, compared to last week’s massive push to ship Datasette 0.44 with authentication, permissions and writable canned queries. I can now ship alpha releases, such as today’s Datasette 0.45a1, which means I can preview new plugin features before they are completely ready and stable.

Datasette alphas and betas

I want to be able to iterate on plugin hooks more effectively, which means embracing release early, release often. I also want plugin authors to be able to trust Datasette not to break their work—a major focus for Datasette 1.0.

Releasing alpha preview versions can really help here. I have two plugin hooks in the pipeline for Datasette 0.45 already: startup and canned_queries. These are still under development but are now available to preview in Datasette 0.45a1. Install it like so:

pip install datasette==0.45a1

Please post any feedback on the design of these hooks to the startup hook or canned_queries hook issue threads.

Another compelling reason to ship alphas: it means I can release alpha versions of my own plugins that themselves depend on a Datasette alpha, and still have their unit tests pass in continuous integration. I expect to take advantage of that ability a lot in the future.
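
Here’s a sketch of how that works at the packaging level, using a hypothetical plugin’s setup.py. Under PEP 440, explicitly naming a pre-release in the version specifier is what allows pip to resolve to an alpha:

from setuptools import setup

setup(
    name="datasette-my-plugin",  # hypothetical plugin
    version="0.1a0",
    py_modules=["datasette_my_plugin"],
    # Because the specifier mentions a pre-release explicitly,
    # pip is allowed to install the 0.45a1 alpha in CI
    install_requires=["datasette>=0.45a1"],
    entry_points={"datasette": ["my_plugin = datasette_my_plugin"]},
)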

Figuring out how to safely ship an alpha took a little bit of work, because I wanted to make sure that alphas wouldn’t become the default version installed from PyPI, mess up my existing CI configuration or end up accidentally published to Docker Hub. You can follow my research on this in this issue, including my experiments shipping an alpha for datasette-render-images first.

Those new plugin hooks

startup() (documentation) is a hook that gets called on Datasette server startup, and passed the datasette object. The initial use-case was plugins that might want to validate their own configuration, but I imagine other interesting uses for it will emerge over time.

canned_queries() (documentation) lets plugin authors dynamically generate new canned queries for a given database. It gets passed the datasette object, the current database name and the currently authenticated actor, if there is one. This means you can do fun things like include a user’s own saved queries loaded from another database table:

from datasette import hookimpl

@hookimpl
def canned_queries(datasette, database, actor):
    async def inner():
        db = datasette.get_database(database)
        if actor is not None and await db.table_exists("saved_queries"):
            results = await db.execute(
                "select name, sql from saved_queries where actor_id = :id", {
                    "id": actor["id"]
                }
            )
            return {result["name"]: {
                "sql": result["sql"]
            } for result in results}
    return inner

I’m using a pattern here that’s shared by a number of other Datasette plugin hooks: rather than returning the results directly, this plugin function returns an async def inner() function.

The code that calls the hook knows that if an asyncio awaitable function is returned it should await it. This is my trick for using awaitable functions with Pluggy, which wasn’t initially built with async in mind.
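
The calling side of that trick looks something like this, a simplified sketch rather than Datasette’s exact code:

import inspect

async def await_if_needed(value):
    # A hook implementation may return a plain value, an awaitable,
    # or a function that returns an awaitable: normalize all three
    if callable(value):
        value = value()
    if inspect.isawaitable(value):
        value = await value
    return value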

Shooting naturalist videos with Natalie

Natalie has started a new YouTube channel to put her various science communication courses at Stanford into action. I’ve been helping out as camera-person, which has been really interesting. I’m currently shooting with FiLMiC running on an iPhone 11 Pro on a tripod, using audio from an AirPod (until we can get our hands on something better).

Natalie’s been editing the videos on her iPhone and these early results are really good! Here’s the video we shot for Sea Lion Birthday on 15th June, a day when 50% of all California Sea Lions celebrate their birthday. Watch the video to find out why.

The close-up footage of the sea lions was shot by Natalie on a Canon DSLR with a 100-400mm lens. I love that lens so much for wildlife photography.

TIL this week

Just one new TIL this week but it’s a good one: Using LD_PRELOAD to run any version of SQLite with Python. I’ve been wanting to figure out a good way to replace the SQLite version used by the Python standard library for ages—pysqlite3 helps a lot here, but I also need the ability to run arbitrary older versions to help respond to bug reports. The LD_PRELOAD trick works perfectly for that.
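
The trick in a nutshell, with a hypothetical path to a freshly compiled SQLite shared library:

LD_PRELOAD=/path/to/sqlite/.libs/libsqlite3.so \
    python3 -c "import sqlite3; print(sqlite3.connect(':memory:').execute('select sqlite_version()').fetchone())"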

How much can you learn from just two columns? 18 days ago

Derek Willis shared an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.

He published it here using Datasette. It’s a single table containing 385,979 rows—each row is a username, account_name pair, where username is the Twitter account that is being followed and account_name is the congressional Twitter account that’s following it.

Here’s some sample data:

username
njhotline
emilykpierce
jessblevinsoh
familylink
howardsnowdon
pattidomm

How much can we learn from just these two columns?

Which accounts have the most GOP congressional followers?

Let’s start with a simple aggregation: which accounts on Twitter have the most GOP congressional followers?

select
  username,
  count(*) as num_gop_followers
from
  following
group by
  username
order by
  num_gop_followers desc

All we’re doing here is counting the number of times a unique username (an account that is being followed) shows up in our table, then sorting by those counts.

Here are the results. The top ten are:

username         num_gop_followers
housegop         231
gopleader        229
realdonaldtrump  219
vp               216
speakerryan      207
whitehouse       207
stevescalise     198
chadpergram      195
potus            195
foxnews          187

Adding a “view more” link

Wouldn’t it be useful if you could see exactly which accounts those 231 followers of @housegop are?

We can do that in Datasette without a SQL query—we can instead use the form on the table page to construct a filter—or construct a querystring URL directly. Here are the 5 GOP congressional accounts following @cityofdallas:

https://official-gop-following.herokuapp.com/following/following?username=cityofdallas

Let’s add that link to our original top-followed query. Datasette automatically links any value that begins with https://, so we can use a SQL concatenation trick (with the || concatenation operator) to construct that URL as part of the query:


select
  username,
  count(*) as num_gop_followers,
  'https://official-gop-following.herokuapp.com/following/following?username=' || username as list_of_gop_followers
from
  following
group by
  username
order by
  num_gop_followers desc

Here’s that query. The first five rows look like this:

username         num_gop_followers  list_of_gop_followers
housegop         231                https://official-gop-following.herokuapp.com/following/following?username=housegop
gopleader        229                https://official-gop-following.herokuapp.com/following/following?username=gopleader
realdonaldtrump  219                https://official-gop-following.herokuapp.com/following/following?username=realdonaldtrump
vp               216                https://official-gop-following.herokuapp.com/following/following?username=vp
speakerryan      207                https://official-gop-following.herokuapp.com/following/following?username=speakerryan

Congressional accounts who aren’t following certain accounts

Since there are only 279 congressional GOP Twitter accounts, how about identifying the 279 - 219 = 60 accounts that aren’t following @realdonaldtrump?

Let’s construct a SQL query for this, using a sub-select:

select
  distinct account_name
from
  following
where
  account_name not in (
    select
      account_name
    from
      following
    where
      username = 'realdonaldtrump'
  )

Here that is in Datasette.

A neat thing we can do here is to parametrize that query. We can swap the hard-coded 'realdonaldtrump' value for a named parameter, :name, instead:

select
  distinct account_name
from
  following
where
  account_name not in (
    select
      account_name
    from
      following
    where
      username = :name
  )

Now when we visit that in Datasette it looks like this:

Screenshot: the parametrized SQL query in Datasette, showing a form input for the name parameter

We can add ?name=realdonaldtrump to the URL (or submit the form and save the resulting URL) to link to results for one individual.

We’ve essentially created a new mini-application here—complete with an input form—just by bookmarking a URL in Datasette.

Let’s make the query a bit more interesting by including a count of the number of accounts those congress-people ARE following, and sorting by that.

select
  account_name,
  count(*) as num_accounts_they_follow
from
  following
where
  account_name not in (
    select
      account_name
    from
      following
    where
      username = 'realdonaldtrump'
  )
group by
  account_name
order by
  num_accounts_they_follow desc

Here are the results.

num_accounts_they_follow
13475
8560
5793
4423
3846

@ChuckGrassley follows 13,475 accounts but none of them are the president!

Most similar accounts, based on number of shared follows

One last query. This time we’re going to look at which accounts are “most similar” to each other, based on the largest overlap of follows. Here’s the SQL for that:

select
  :name as representative,
  account_name as similar_representative,
  count(*) as num_shared_follows
from
  following
where
  username in (
    select
      username
    from
      following
    where
      account_name = :name
  )
  and account_name != :name
group by
  account_name
order by
  num_shared_follows desc

Again, we’re using a :name placeholder. Here are the congressional accounts that are most similar to @MikeKellyPA.

What else can you do?

I’m pretty impressed at how much insight can be gained using SQL against just a two-column table.

This post started as a Twitter thread. Charles Arthur suggested cross-referencing this against other sources such as the GovTrack ideology analysis of congressional candidates. This is a great idea! It’s also very feasible, given that much of the data underlying GovTrack is available on GitHub. Import that into Datasette alongside Derek’s follower data and you could construct some very interesting SQL joins indeed.

Datasette 0.44: The annotated release notes 22 days ago

I just released Datasette 0.44 to PyPI. With 128 commits since 0.43 this is the biggest release in a long time—and likely the last major release of new features before Datasette 1.0.

You can read the full release notes here, but I’ve decided to try something a little different for this release and write up some annotations here on my blog.

Writable canned queries

Datasette’s Canned queries feature lets you define SQL queries in metadata.json which can then be executed by users visiting a specific URL. https://latest.datasette.io/fixtures/neighborhood_search for example.

Canned queries were previously restricted to SELECT, but Datasette 0.44 introduces the ability for canned queries to execute INSERT or UPDATE queries as well, using the new "write": true property (#800)

I originally intended this to be the main feature in the release.

Datasette 0.37 added the ability for plugins to write to the database. This marked a pretty huge philosophical shift for Datasette: from a read-only publishing system to a framework for building interactive SQL-driven applications. But you needed a plugin, such as my datasette-upload-csvs.

I realized back in March that canned queries could provide a simple, sensible way to start adding write functionality to Datasette core. A query could be configured like this:

{
  "databases": {
    "my-database": {
      "queries": {
        "add_twitter_handle": {
          "sql": "insert into twitter_handles (username) values (:username)",
          "write": true
        }
      }
    }
  }
}

And Datasette could then provide a form interface at /my-database/add_twitter_handle for executing that query. How hard could that be to implement?

The problem with “simple” features like this is that they open up a cascade of other features.

If you’re going to have write queries, you probably need to restrict who can execute them—which means authentication and permissions. If forms are performing POSTs you need CSRF protection. If users are changing state you need a way to show them messages telling them what happened.

So let’s talk about authentication and permissions.

Authentication

Prior to this release, the Datasette ecosystem treated authentication as exclusively the realm of plugins, most notably through datasette-auth-github.

0.44 introduces Authentication and permissions as core Datasette concepts (#699). This makes it easier for different plugins to share responsibility for authenticating requests—you might have one plugin that handles user accounts and another one that allows automated access via API keys, for example.

I demonstrated with datasette-auth-github and datasette-auth-existing-cookies that authentication could exist as a completely separate layer from Datasette itself (thanks to the magic of ASGI middleware). But I’d started to run into limitations of this approach.

Crucially, I wanted to be able to support more than one kind of authentication. Users might get authenticated with cookies (via an SSO mechanism such as GitHub’s) but API clients need API keys. Now the different authentication plugins need to make sure they don’t accidentally interfere with each other’s logic.

Authentication in web applications always comes down to the same thing: inspecting aspects of the incoming HTTP request (headers, cookies, querystring variables etc) and deciding if they prove that the request is coming from a specific user or API integration.

I decided to use the word “actor” for this, since “user or API integration” is a bit of a mess. This means that authentication can work entirely via a new plugin hook, actor_from_request.

Here’s a really simple implementation of that hook, copied directly from the documentation:

from datasette import hookimpl
import secrets

SECRET_KEY = "this-is-a-secret"

@hookimpl
def actor_from_request(datasette, request):
    authorization = request.headers.get("authorization") or ""
    expected = "Bearer {}".format(SECRET_KEY)

    if secrets.compare_digest(authorization, expected):
        return {"id": "bot"}

It returns an “actor dictionary” describing the authenticated actor. Datasette currently has no opinion at all on what shape this dictionary should take, though I expect conventions to emerge over time.

I have a new policy of never releasing a new plugin hook without also building a real-world plugin with it. For actor_from_request I’ve released datasette-auth-tokens, which lets you create secret API tokens in a configuration file and specify which actions they are allowed to perform.

Permissions

Datasette also now has a built-in concept of Permissions. The permissions system answers the following question:

Is this actor allowed to perform this action, optionally against this particular resource?

You can use the new "allow" block syntax in metadata.json (or metadata.yaml) to set required permissions at the instance, database, table or canned query level. For example, to restrict access to the fixtures.db database to the "root" user:

{
    "databases": {
        "fixtures": {
            "allow": {
                "id" "root"
            }
        }
    }
}

See Defining permissions with “allow” blocks for more details.

Plugins can implement their own custom permission checks using the new permission_allowed(datasette, actor, action, resource) hook.

Authentication on its own isn’t enough: you also need a way of deciding if an authenticated actor has permission to perform a specific action.

I was dreading adding permissions to Datasette. I have a long-running feud with Amazon IAM and the Google Cloud equivalent: I’ve been using AWS for over a decade and I still get completely lost any time I try to figure out the minimum set of permissions for something.

But... I want Datasette permissions to be flexible. My dream for Datasette is to nurture a growing ecosystem of plugins that can solve data collaboration and analysis problems way beyond what I’ve imagined myself.

Thanks to Datasette’s plugin hooks, I think I’ve found a way to enable powerful plugins while keeping the additional footprint in Datasette itself to a minimum.

The key is the new permission_allowed() hook, which lets plugins receive an actor, action and optional resource and allows them to reply with “allow”, “deny” or “I don’t know, ask someone else”.

Its partner is the datasette.permission_allowed(actor, action, resource) method, which plugins (and Datasette core) can call to check if the current actor is allowed to perform an action against a given resource (a specific database, table or canned query).
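
Here’s a sketch of a plugin answering that question, with the “staff” role check as a hypothetical example:

from datasette import hookimpl

@hookimpl
def permission_allowed(actor, action, resource):
    # Allow any actor with a (hypothetical) "staff" role to use the
    # permissions debug tool; returning None means "no opinion",
    # leaving the decision to other plugins or the defaults
    if action == "permissions-debug":
        if actor is not None and actor.get("role") == "staff":
            return True
    return None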

I invented a JSON/YAML based syntax for defining simple permission rules. If you want to provide access to a table to a specific user you can do so like this:

{
    "allow": {
        "id": "simonw"
    }
}

Or you can grant access to any actor with a role of “staff” like so:

{
    "allow": {
        "role": "staff"
    }
}

What are roles here? They’re nothing at all to Datasette itself. Datasette authentication plugins can create actors of any shape, so if your plugin decides that “role” is a useful concept it can bake that concept into the actors it creates.

You can read more about how these “allow” blocks work in the documentation.

/-/permissions debug tool

Given my ongoing battle with opaque permission systems, I’m determined to try and make Datasette’s take on permissions as transparent as possible. The new /-/permissions page—visible only to authenticated users with the debug-permissions permission—shows a rolling log of the last 200 permission checks carried out by that instance. My hope is that instance administrators and plugin authors can use this to figure out exactly what is going on.

/-/permissions screenshot

datasette-permissions-sql

datasette-permissions-sql is my new proof-of-concept plugin that puts the permission hook to use.

It exercises a Datasette pattern I find really interesting: using SQL queries to configure custom behaviour. I first started exploring this in datasette-atom and datasette-ics.

This is best illustrated by an example metadata.yaml file. I prefer YAML over JSON for anything that includes a SQL query because YAML has support for multi-line strings:

databases:
  mydatabase:
    queries:
      promote_to_staff:
        sql: |-
          UPDATE users
          SET is_staff=1
          WHERE id=:id
        write: true
plugins:
  datasette-permissions-sql:
  - action: view-query
    resource:
    - mydatabase
    - promote_to_staff
    sql: |-
      SELECT * FROM users
      WHERE is_staff = 1
      AND id = :actor_id

This block does two things. It configures a writable canned query called promote_to_staff. It then uses datasette-permissions-sql to define a permission rule that says that only authenticated actors whose id appears in the users table with is_staff=1 are allowed to execute that canned query.

This is the beginnings of a full user management system in just a few lines of configuration. I’m really excited about exploring this concept further.

register_routes() plugin hook

Plugins can now register new views and routes via the register_routes() plugin hook (#819). View functions can be defined that accept any of the current datasette object, the current request, or the ASGI scope, send and receive objects.

I thought the asgi_wrapper() hook might be enough to allow plugins to add their own custom routes and views, but the more I worked with it the more I wanted something a bit more high level.

Inspired by pytest, the view functions you can define using register_routes() benefit from a simple form of dependency injection. Any of the following counts as a valid view function—it will be called with the arguments it requests:

from datasette.utils.asgi import Response
import html

async def hello(request):
    return Response.html("Hello {}".format(
        html.escape(request.args.get("name", ""))
    ))

async def page(request, datasette):
    # Fetch the page body from the database, then render it
    result = await datasette.get_database().execute(
        "select body from pages where id = :id", {
            "id": request.url_vars["page_id"]
        }
    )
    return Response.html(result.first()["body"])

async def asgi_hello_world(scope, receive, send):
    assert scope["type"] == "http"
    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [
                [b"content-type", b"application/json"]
            ],
        }
    )
    await send({
        "type": "http.response.body",
        "body": b'{"hello": "world"}'
    })

Here’s the code that makes this work—utility functions that pass in just the arguments that match a function’s signature.
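
The idea boils down to something like this, a simplified sketch of the signature-inspection trick rather than the actual utility code:

import inspect

async def call_with_supported_arguments(fn, **available):
    # Inspect the view function's signature and pass it only the
    # arguments it actually asks for (request, datasette, scope, ...)
    # e.g. await call_with_supported_arguments(hello, request=request, datasette=datasette)
    names = inspect.signature(fn).parameters.keys()
    return await fn(**{name: available[name] for name in names})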

Tucked away at the end of the 0.44 release notes is this:

The road to Datasette 1.0

I’ve assembled a milestone for Datasette 1.0. The focus of the 1.0 release will be the following:

  • Signify confidence in the quality/stability of Datasette
  • Give plugin authors confidence that their plugins will work for the whole 1.x release cycle
  • Provide the same confidence to developers building against Datasette JSON APIs

If you have thoughts about what you would like to see for Datasette 1.0 you can join the conversation on issue #519.

It’s time to start working towards the big 1.0!

Elsewhere

Yesterday

  • How to find what you want in the Django documentation (via) Useful guide by Matthew Segal to navigating the Django documentation, and tips for reading documentation in general. The Django docs have a great reputation so it’s easy to forget how intimidating they can be for newcomers: Matthew emphasizes that docs are rarely meant to be read in full: the trick is learning how to quickly search them for the things you need to understand right now. #

2nd July 2020

  • Better Python Decorators with wrapt (via) Adam Johnson explains the intricacies of decorating a Python function without breaking the ability to correctly introspect it, and discusses how Scout uses the wrapt library by Graham Dumpleton to implement their instrumentation library. #

29th June 2020

  • Unlocking value with durable teams (via) Anna Shipman describes the FT’s experience switching from project-based teams to “durable” teams—teams which own a specific area of the product. Lots of really smart organizational design thinking in this. I’ve seen how much of a difference it makes to have every inch of a complex system “owned” by a specific team. I also like how Anna uses the term “technical estate” to describe the entirety of the FT’s systems. #
  • Data Science is a lot like Harry Potter, except there’s no magic, it’s just math, and instead of a sorting hat you just sort the data with a Python script.

    GPT-3, shepherded by Max Woolf #

28th June 2020

  • Here’s a common piece of advice from people who create things: to make better things, make more things. Not only does it give you constant practice at making things, but it gives you more chances at lucking into making a good thing.

    Ned Batchelder #

26th June 2020

  • Reducing search indexing latency to one second. Really detailed dive into the nuts and bolts of Twitter’s latest iteration of search indexing technology, including a great explanation of skip lists. #
  • How CDNs Generate Certificates. Thomas Ptacek (now at Fly) describes in intricate detail the challenges faced by large-scale hosting providers that want to securely issue LetsEncrypt certificates for customer domains. Lots of detail here on the different ACME challenges supported by LetsEncrypt and why the new tls-alpn-01 challenge is the right option for operating at scale. #

24th June 2020

  • If you have to choose between engineering and ML, choose engineering. It’s easier for great engineers to pick up ML knowledge, but it’s a lot harder for ML experts to become great engineers.

    Chip Huyen #

23rd June 2020

  • datasette-block-robots. Another little Datasette plugin: this one adds a /robots.txt page with “Disallow: /” to block all indexing of a Datasette instance from respectable search engine crawlers. I built this in less than ten minutes from idea to deploy to PyPI thanks to the datasette-plugin cookiecutter template. #
  • click-app. While working on sqlite-generate today I built a cookiecutter template for building the skeleton for Click command-line utilities. It’s based on datasette-plugin so it automatically sets up GitHub Actions for running tests and deploying packages to PyPI. #
  • sqlite-generate (via) I wrote this tool today to generate arbitrarily large SQLite databases, for testing purposes. You tell it how many tables, columns and rows you want and it will use the Faker Python library to generate random data and populate the tables with it. #

20th June 2020

  • Without touching upon the question of who’s right and who’s wrong in the specific case of Basecamp’s Hey app, or the broader questions of what, if anything, ought to change in Apple’s App Store policies, an undeniable and important undercurrent to this story is that the business model policies of the App Store have resulted in a tremendous amount of resentment. This spans the entire gamut from one-person indies all the way up to the handful of large corporations that can be considered Apple’s peers or near-peers.

    John Gruber #

18th June 2020