Simon Willison’s Weblog


Weeknotes: datasette-copyable, datasette-insert-api

23rd July 2020

Two new Datasette plugins this week: datasette-copyable, helping users copy-and-paste data from Datasette into other places, and datasette-insert-api, providing a JSON API for inserting and updating data and creating tables.


Datasette has long provided CSV and JSON export, but when you’re working with smaller amounts of data it’s often most convenient to move that data around using copy and paste.

datasette-copyable uses Datasette’s register_output_renderer() plugin hook to add a whole plethora of new export options, powered by the excellent tabulate Python library (which I’m also using in sqlite-utils).

The default output format is TSV, because if you copy tab-separated values to your clipboard and then hit paste on a cell in Google Sheets, Excel or Numbers the data will be imported directly into your spreadsheet, preserving the columns and rows!
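To see why TSV pastes so cleanly, here is a tiny illustration of what the clipboard payload looks like (the sample data here is made up): each row becomes one line and each cell is separated by a tab, which spreadsheets map directly onto rows and columns.

```python
# Build a TSV string from a list of dicts -- the same shape of output
# a "copy as TSV" export produces. Sample data is illustrative only.
rows = [
    {"county": "Harris", "state": "Texas", "cases": 5},
]
headers = list(rows[0])
lines = ["\t".join(headers)]
for row in rows:
    lines.append("\t".join(str(row[h]) for h in headers))
tsv = "\n".join(lines)
print(tsv)
```

Pasting a string like this into a cell in Google Sheets or Excel splits it on tabs and newlines automatically.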

The other supported formats are everything that tabulate supports—LaTeX, GitHub Markdown, MediaWiki, JIRA and more.

You can try the plugin out on some New York Times Covid-19 data (for Harris county, Texas) over here.

[Screenshot: the "Copy as tsv" export interface]

Clicking the “copy to clipboard” button selects the content in the textarea and copies it to your clipboard. I wrote about how this works in a TIL.


Ever since Datasette plugins gained the ability to write to the database back in February I’ve been looking forward to building something that really put it through its paces.

datasette-insert-api is a relatively thin wrapper around my sqlite-utils Python library which unlocks a powerful set of functionality:

  • POST a JSON array to Datasette to insert those objects as rows in a database table
  • If the table does not yet exist, Datasette can create it with the corresponding columns
  • If the table exists and has a primary key, records with matching primary keys can replace existing rows (a simple form of update)
  • If unknown columns are present in the data, the plugin can optionally alter the table to add the necessary columns

The README has the details, but the simplest example of it in action (using curl) looks like this:

curl --request POST \
  --data '[
      {
        "id": 1,
        "name": "Cleopaws",
        "age": 5
      },
      {
        "id": 2,
        "name": "Pancakes",
        "age": 4
      }
    ]' \
    'http://localhost:8001/-/insert/data/dogs?pk=id'

The plugin works without authentication—useful for piping JSON into a database on your own laptop—or you can couple it with a plugin such as datasette-auth-tokens to provide authenticated access over the public internet.
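For illustration, here is a hedged sketch of building such an authenticated request from Python. The URL, table names and token are assumptions following the pattern above, not verified endpoints; datasette-auth-tokens expects the token in an Authorization: Bearer header.

```python
import json
import urllib.request

# Hypothetical endpoint, rows and token -- adjust for your own deployment.
rows = [{"id": 3, "name": "Biscuit", "age": 2}]
req = urllib.request.Request(
    "http://localhost:8001/-/insert/data/dogs?pk=id",
    data=json.dumps(rows).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer my-secret-token",
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; omitted here so
# the sketch runs without a server.
print(req.get_method(), req.get_header("Authorization"))
```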

Speaking at Boston Python

I gave a talk about Datasette and Dogsheep for Boston Python on Wednesday. Ned turned around the video really quickly—I start talking at 8 minutes and 52 seconds in.

I used a collaborative Google Doc for the Q&A which worked really well, as the questions accumulated while I was presenting and I then got to address them at the end. I spent some time after the talk fleshing out my answers with extra links—here’s the resulting document.

Upgrading this blog to Django 3.1

The first Django 3.1 release candidate came out earlier this week, and I’ve upgraded my blog to this (a one-line change I made through the GitHub web interface thanks to Heroku continuous deployment).

I also upgraded my Heroku PostgreSQL database to v11 so I could take advantage of the new search_type="websearch" option for PostgreSQL full-text search added in Django 3.1—James Turk contributed that improvement and wrote about it on his blog.

This means my blog search engine now supports advanced operators, like “phrase searches” and -subtractions—try that out here: “signed cookies” -django.

content-length and datasette-clone progress bars

When Datasette runs against an immutable SQLite database it lets you download that database file directly. The Datasette development version now includes a content-length header specifying the size of that download.

... which means my datasette-clone tool can now show you a progress bar!

$ datasette-clone -v
Fetching fivethirtyeight.db, current hash 7ba632a14f29375c289d96400947e2d5fcdad3a4b5a90883d6286bd4e83ede78 != None
268.79 MB  [##########--------------------------]    28%  00:02:14
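The arithmetic behind a display like that is simple once the total size is known from the content-length header. Here is a hypothetical sketch of the rendering (the real tool's formatting and helper names may differ):

```python
def progress_bar(downloaded, total, width=36):
    """Render a text progress bar from bytes downloaded vs. the total
    reported by the content-length header. Illustrative sketch only."""
    fraction = downloaded / total
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return "[{}] {:4.0%}".format(bar, fraction)

print(progress_bar(75_000_000, 268_790_000))
```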

github-to-sqlite tags

I wanted to build myself a SQL query to show me my GitHub repositories that have commits which have not yet been included in a release.

It turned out I needed to build a new import command for this: github-to-sqlite tags github.db simonw/datasette—which fetches all of the tags for the specified repository.

Here’s the relevant issue and here’s a monstrous SQL query that shows me repositories with commits that are ready to be shipped in a release:

with most_recent_releases as (
  with ranked as (
    select
      repo,
      tag_name,
      published_at,
      row_number() OVER (
        partition BY repo
        ORDER BY
          published_at DESC
      ) rank
    FROM
      releases
  )
  select
    *
  from
    ranked
  where
    rank = 1
)
select
  repos.full_name as repo,
  most_recent_releases.tag_name as release,
  commits.committer_date as release_commit_date,
  (
    select
      count(*)
    from
      commits c2
    where
      c2.repo = commits.repo
      and c2.committer_date > commits.committer_date
  ) as commits_since_release,
  'https://github.com/' || repos.full_name || '/compare/' || most_recent_releases.tag_name || '...' || repos.default_branch as view_commits
from
  most_recent_releases
  join repos on most_recent_releases.repo = repos.id
  join tags on tags.repo = repos.id
  and tags.name = most_recent_releases.tag_name
  join commits on tags.sha = commits.sha
order by
  commits_since_release desc
Try it here.

TIL this week

Releases this week