Datasette Publish: a web app for publishing CSV files as an online database
17th January 2018
I’ve just released Datasette Publish, a web tool for turning one or more CSV files into an online database with a JSON API.
And here’s an animated screencast showing exactly how I built it:
It’s built on top of the Zeit Now hosting service, which means anything you deploy with it lives on your own account with Zeit and stays entirely under your control. I used the brand new Zeit API 2.0.
Zeit’s generous free plan means you can try the tool out as many times as you like—and if you want to use it for an API powering a production website you can easily upgrade to a paid hosting plan.
Anyone who has data they want to share with the world!
The fundamental idea behind Datasette is that publishing structured data as both a web interface and a JSON API should be as quick and easy as possible.
The world is full of interesting data that often ends up trapped in PDF blobs or other hard-to-use formats, if it gets published at all. Datasette encourages using SQLite instead: a powerful, flexible format that enables analysis via SQL queries and can easily be shared and hosted online.
Since so much of the data that IS published today uses CSV, this first release of Datasette Publish focuses on CSV conversion above anything else. I plan to add support for other useful formats in the future.
The three areas I’m most excited in seeing adoption of Datasette are data journalism, civic open data and cultural institutions.
Data journalism because when I worked at the Guardian Datasette is the tool I wish I had had for publishing data. When we started the Guardian Datablog we ended up using Google Sheets for this.
Civic open data because it turns out the open data movement mostly won! It’s incredible how much high quality data is published by local and national governments these days. My San Francisco tree search project for example uses data from the Department of Public Works—a CSV of 190,000 trees around the city.
Cultural institutions because the museums and libraries of the world are sitting on enormous treasure troves of valuable information, and have an institutional mandate to share that data as widely as possible.
If you are involved in any of the above please get in touch. I’d love your help improving the Datasette ecosystem to better serve your needs.
Datasette Publish would not be possible without Zeit Now. Now is a revolutionary approach to hosting: it lets you instantly create immutable deployments with a unique URL, via a command-line tool or using their recently updated API. It’s by far the most productive hosting environment I’ve ever worked with.
I built the main Datasette Publish interface using React. Building a SPA here made a lot of sense, because it allowed me to construct the entire application without any form of server-side storage (aside from Keen for analytics).
The deployment itself consists of the CSV files plus a Dockerfile that installs Python, Datasette, csvs-to-sqlite and their dependencies, then runs csvs-to-sqlite against the CSV files and starts up Datasette against the resulting database.
If you specified a title, description, source or license I generate a Datasette metadata.json file and include that in the deployment as well.
Since free deployments to Zeit are “source code visible”, you can see exactly how the resulting application is structured by visiting https://datasette-onrlszntsq.now.sh/_src (the campaign finance app I built earlier).
Using the Zeit API in this way has the neat effect that I don’t ever store any user data myself—neither the access token used to access your account nor any of the CSVs that you upload. Uploaded files go straight to your own Zeit account and stay under your control. Access tokens are never persisted. The deployed application lives on your own hosting account, where you can terminate it or upgrade it to a paid plan without any further involvement from the tool I have built.
Not having to worry about storing encrypted access tokens or covering any hosting costs beyond the Datasette Publish tool itself is delightful.
This ability to build tools that themselves deploy other tools is fascinating. I can’t wait to see what other kinds of interesting new applications it enables.
More recent articles
- Interesting ideas in Observable Framework - 3rd March 2024
- Weeknotes: Getting ready for NICAR - 27th February 2024
- The killer app of Gemini Pro 1.5 is video - 21st February 2024
- Weeknotes: a Datasette release, an LLM release and a bunch of new plugins - 9th February 2024
- LLM 0.13: The annotated release notes - 26th January 2024
- Weeknotes: datasette-test, datasette-build, PSF board retreat - 21st January 2024
- Talking about Open Source LLMs on Oxide and Friends - 17th January 2024
- Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions - 16th January 2024
- What I should have said about the term Artificial Intelligence - 9th January 2024