Datasette Publish: a web app for publishing CSV files as an online database
17th January 2018
I’ve just released Datasette Publish, a web tool for turning one or more CSV files into an online database with a JSON API.
And here’s an animated screencast showing exactly how I built it:
It’s built on top of the Zeit Now hosting service, which means anything you deploy with it lives on your own account with Zeit and stays entirely under your control. I used the brand new Zeit API 2.0.
Zeit’s generous free plan means you can try the tool out as many times as you like—and if you want to use it for an API powering a production website you can easily upgrade to a paid hosting plan.
Anyone who has data they want to share with the world!
The fundamental idea behind Datasette is that publishing structured data as both a web interface and a JSON API should be as quick and easy as possible.
The world is full of interesting data that often ends up trapped in PDF blobs or other hard-to-use formats, if it gets published at all. Datasette encourages using SQLite instead: a powerful, flexible format that enables analysis via SQL queries and can easily be shared and hosted online.
Since so much of the data that IS published today uses CSV, this first release of Datasette Publish focuses on CSV conversion above anything else. I plan to add support for other useful formats in the future.
The three areas I’m most excited in seeing adoption of Datasette are data journalism, civic open data and cultural institutions.
Data journalism because when I worked at the Guardian Datasette is the tool I wish I had had for publishing data. When we started the Guardian Datablog we ended up using Google Sheets for this.
Civic open data because it turns out the open data movement mostly won! It’s incredible how much high quality data is published by local and national governments these days. My San Francisco tree search project for example uses data from the Department of Public Works—a CSV of 190,000 trees around the city.
Cultural institutions because the museums and libraries of the world are sitting on enormous treasure troves of valuable information, and have an institutional mandate to share that data as widely as possible.
If you are involved in any of the above please get in touch. I’d love your help improving the Datasette ecosystem to better serve your needs.
Datasette Publish would not be possible without Zeit Now. Now is a revolutionary approach to hosting: it lets you instantly create immutable deployments with a unique URL, via a command-line tool or using their recently updated API. It’s by far the most productive hosting environment I’ve ever worked with.
I built the main Datasette Publish interface using React. Building a SPA here made a lot of sense, because it allowed me to construct the entire application without any form of server-side storage (aside from Keen for analytics).
The deployment itself consists of the CSV files plus a Dockerfile that installs Python, Datasette, csvs-to-sqlite and their dependencies, then runs csvs-to-sqlite against the CSV files and starts up Datasette against the resulting database.
If you specified a title, description, source or license I generate a Datasette metadata.json file and include that in the deployment as well.
Since free deployments to Zeit are “source code visible”, you can see exactly how the resulting application is structured by visiting https://datasette-onrlszntsq.now.sh/_src (the campaign finance app I built earlier).
Using the Zeit API in this way has the neat effect that I don’t ever store any user data myself—neither the access token used to access your account nor any of the CSVs that you upload. Uploaded files go straight to your own Zeit account and stay under your control. Access tokens are never persisted. The deployed application lives on your own hosting account, where you can terminate it or upgrade it to a paid plan without any further involvement from the tool I have built.
Not having to worry about storing encrypted access tokens or covering any hosting costs beyond the Datasette Publish tool itself is delightful.
This ability to build tools that themselves deploy other tools is fascinating. I can’t wait to see what other kinds of interesting new applications it enables.
More recent articles
- Weeknotes: the Datasette Cloud API, a podcast appearance and more - 1st October 2023
- Things I've learned about building CLI tools in Python - 30th September 2023
- Talking Large Language Models with Rooftop Ruby - 29th September 2023
- Weeknotes: Embeddings, more embeddings and Datasette Cloud - 17th September 2023
- Build an image search engine with llm-clip, chat with models with llm chat - 12th September 2023
- LLM now provides tools for working with embeddings - 4th September 2023
- Datasette 1.0a4 and 1.0a5, plus weeknotes - 30th August 2023
- Making Large Language Models work for you - 27th August 2023
- Datasette Cloud, Datasette 1.0a3, llm-mlc and more - 16th August 2023
- How I make annotated presentations - 6th August 2023