Simon Willison’s Weblog

Subscribe

Datasette Publish: a web app for publishing CSV files as an online database

17th January 2018

I’ve just released Datasette Publish, a web tool for turning one or more CSV files into an online database with a JSON API.

Here’s a demo application I built using Datasette Publish, showing Californian campaign finance data using CSV files released by the California Civic Data Coalition.

And here’s an animated screencast showing exactly how I built it:

Animated demo of Datasette Publish

Datasette Publish combines my Datasette tool for publishing SQLite databases as an API with my csvs-to-sqlite tool for generating them.

It’s built on top of the Zeit Now hosting service, which means anything you deploy with it lives on your own account with Zeit and stays entirely under your control. I used the brand new Zeit API 2.0.

Zeit’s generous free plan means you can try the tool out as many times as you like—and if you want to use it for an API powering a production website you can easily upgrade to a paid hosting plan.

Who should use it

Anyone who has data they want to share with the world!

The fundamental idea behind Datasette is that publishing structured data as both a web interface and a JSON API should be as quick and easy as possible.

The world is full of interesting data that often ends up trapped in PDF blobs or other hard-to-use formats, if it gets published at all. Datasette encourages using SQLite instead: a powerful, flexible format that enables analysis via SQL queries and can easily be shared and hosted online.

Since so much of the data that IS published today uses CSV, this first release of Datasette Publish focuses on CSV conversion above anything else. I plan to add support for other useful formats in the future.

The three areas I’m most excited in seeing adoption of Datasette are data journalism, civic open data and cultural institutions.

Data journalism because when I worked at the Guardian Datasette is the tool I wish I had had for publishing data. When we started the Guardian Datablog we ended up using Google Sheets for this.

Civic open data because it turns out the open data movement mostly won! It’s incredible how much high quality data is published by local and national governments these days. My San Francisco tree search project for example uses data from the Department of Public Works—a CSV of 190,000 trees around the city.

Cultural institutions because the museums and libraries of the world are sitting on enormous treasure troves of valuable information, and have an institutional mandate to share that data as widely as possible.

If you are involved in any of the above please get in touch. I’d love your help improving the Datasette ecosystem to better serve your needs.

How it works

Datasette Publish would not be possible without Zeit Now. Now is a revolutionary approach to hosting: it lets you instantly create immutable deployments with a unique URL, via a command-line tool or using their recently updated API. It’s by far the most productive hosting environment I’ve ever worked with.

I built the main Datasette Publish interface using React. Building a SPA here made a lot of sense, because it allowed me to construct the entire application without any form of server-side storage (aside from Keen for analytics).

When you sign in via Zeit OAuth I store your access token in a signed cookie. Each time you upload a CSV the file is stored directly using Zeit’s upload API, and the file metadata is persisted in JavaScript state in the React app. When you click “publish” the accumulated state is sent to the server where it is used to construct a new Zeit deployment.

The deployment itself consists of the CSV files plus a Dockerfile that installs Python, Datasette, csvs-to-sqlite and their dependencies, then runs csvs-to-sqlite against the CSV files and starts up Datasette against the resulting database.

If you specified a title, description, source or license I generate a Datasette metadata.json file and include that in the deployment as well.

Since free deployments to Zeit are “source code visible”, you can see exactly how the resulting application is structured by visiting https://datasette-onrlszntsq.now.sh/_src (the campaign finance app I built earlier).

Using the Zeit API in this way has the neat effect that I don’t ever store any user data myself—neither the access token used to access your account nor any of the CSVs that you upload. Uploaded files go straight to your own Zeit account and stay under your control. Access tokens are never persisted. The deployed application lives on your own hosting account, where you can terminate it or upgrade it to a paid plan without any further involvement from the tool I have built.

Not having to worry about storing encrypted access tokens or covering any hosting costs beyond the Datasette Publish tool itself is delightful.

This ability to build tools that themselves deploy other tools is fascinating. I can’t wait to see what other kinds of interesting new applications it enables.

Discussion on Hacker News.

This is Datasette Publish: a web app for publishing CSV files as an online database by Simon Willison, posted on 17th January 2018.

Next: Analyzing my Twitter followers with Datasette

Previous: Building a location to time zone API with SpatiaLite, OpenStreetMap and Datasette