Simon Willison’s Weblog

A Datasette tutorial written by GPT-3

31st May 2022

I’ve been playing around with OpenAI’s GPT-3 language model playground for a few months now. It’s a fascinating piece of software. You can sign up here—apparently there’s no longer a waiting list.

I just got a particularly interesting result from it which I thought I would share.

Inspired by this tweet about automating documentation, the prompt I fed it was “a step by step tutorial for getting started with Datasette”. I then kept on clicking the generate button until it stopped adding additional paragraphs.

Something I’ve learned about GPT-3 is that you need to fact-check everything it produces. It’s great at writing convincing text, but there are no guarantees that any of it will actually be true!

Here’s what it came up with (original here). I’ve indented its output in order to add annotations.

This is a guide for getting started with Datasette. Datasette is a tool for creating and publishing data-driven websites. It is designed to make it easy to publish structured data, such as the results of a database query, in a way that is highly visible and interactive.

A really strong start. This captures almost exactly what Datasette does.

One tiny nitpick: Datasette is used to publish entire database tables, not just the results of specific queries.

This is honestly a really good attempt at an elevator pitch for the tool—something I’ve consistently found difficult to capture in my own writing.

Datasette can be used to create websites that allow users to explore and visualize data, or to build applications that expose data via APIs. It can also be used as a static site generator, creating a completely static HTML website that can be deployed anywhere.

Again, mostly right. Datasette doesn’t work as a static site generator, yet—though there’s an open issue to explore options around that. And of course there’s Datasette Lite.

This guide will cover the basics of how to install and use Datasette. It will also show you how to create a simple data-driven website using Datasette.

1. Install Datasette

Datasette can be installed using pip, the Python package manager. To install Datasette, simply run the following command:

pip install datasette

Yup, that’s how you install Datasette. brew install datasette works too.
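Once it’s installed, a quick way to confirm it worked is to ask for the version:

datasette --version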

2. Create a database

Datasette requires a SQLite database file in order to function. This database can be created using any SQLite client, such as the sqlite3 command-line tool.

Once you have created a SQLite database file, you can use Datasette to create a website that exposes the data in that database.

Entirely correct.
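To make that concrete, here’s roughly what creating a database with the sqlite3 command-line tool looks like (the table name and columns here are just made-up examples):

sqlite3 mydatabase.db "CREATE TABLE plants (id INTEGER PRIMARY KEY, name TEXT);"

sqlite3 mydatabase.db "INSERT INTO plants (name) VALUES ('Fern'), ('Cactus');"

The first command creates mydatabase.db if it doesn’t exist yet and adds a plants table; the second inserts a couple of rows to explore.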

3. Run the datasette command

The datasette command is used to start a Datasette server. To start a server for the SQLite database file mydatabase.db, simply run the following command:

datasette mydatabase.db

This will start a Datasette server on http://localhost:8001 by default. You can now view the website in your web browser.

Honestly, it’s getting spooky at this point. This is exactly right—it even got the correct port number (8001). I’d love to understand the training data that helped GPT-3 reach this conclusion. Has it been trained on the entire contents of GitHub perhaps?
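As a quick aside: if port 8001 is already taken, or you want your browser to open automatically, the datasette command accepts options for both. A quick sketch:

datasette mydatabase.db --port 8002 --open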

4. Explore the data

Datasette provides a number of ways to explore the data in your database. The homepage of the website lists all of the tables in your database, as well as a search box that can be used to find specific data.

It made up that search box on the homepage, though it’s a pretty great idea for a feature.

(You can get something a bit like that if you install the datasette-search-all plugin.)
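If you want to try that, the plugin installs like any other Python package:

pip install datasette-search-all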

Each table has its own page, which lists all of the columns in the table and provides a form that can be used to query the data. The results of each query are displayed in a tabular format, and can be downloaded as CSV files.

Again, spookily accurate.
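To make that concrete, here are the kinds of URLs involved, using the mytable example table from above (the name column is just an illustrative guess):

http://localhost:8001/mydatabase/mytable.csv?_size=max

http://localhost:8001/mydatabase?sql=select+name+from+mytable+limit+10

The first downloads the table as CSV; the second runs an arbitrary SQL query against the database and displays the results.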

5. Create an API endpoint

Datasette also provides a JSON API that can be used to programmatically access the data in your database. Each table in your database has its own API endpoint, which can be accessed using a URL of the form:

http://localhost:8001/mydatabase.db/mytable?_size=100

Here’s where things get interesting. The description here is exactly correct, but that URL isn’t quite right—it should be:

http://localhost:8001/mydatabase/mytable.json?_size=100

I’d love to know how it synthesized the incorrect URL here.

This URL will return the first 100 rows from the mytable table in JSON format. The _size parameter can be used to control the number of rows returned, and the _offset parameter can be used to skip rows (for example, _offset=100 will return rows 100-199).

I am deeply amused by this: the _size parameter is described correctly, but the _offset parameter here is a total fabrication: Datasette deliberately doesn’t offer offset/limit pagination because it’s inefficient against large tables. It implements keyset pagination instead, using a ?_next= parameter.
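To sketch how that actually works: each page of JSON includes a next token (and a ready-made next_url), which you pass back via ?_next= to fetch the following page. Something like:

http://localhost:8001/mydatabase/mytable.json?_size=100

http://localhost:8001/mydatabase/mytable.json?_size=100&_next=100

The _next=100 token here is just an illustration of what you’d see for a table with a simple integer primary key.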

The Datasette JSON API can be used to build data-driven applications on top of your database.

Good copy—I may use this elsewhere!
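For application code the ?_shape= parameter is handy here: _shape=array returns the rows as a plain JSON array of objects, which is easy to consume. A rough sketch with curl:

curl "http://localhost:8001/mydatabase/mytable.json?_shape=array&_size=10"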

6. Deploy your website

Once you have created a Datasette website, you can deploy it to any web hosting provider that supports Python. Datasette is designed to be lightweight and fast, so it can be easily deployed on any platform, including Heroku, Google App Engine, and Amazon Web Services.

I’ve not seen Datasette deployed on Google App Engine, but it runs great on Google Cloud Run.
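Datasette can actually handle that deployment itself via the datasette publish command. A sketch for Cloud Run, assuming you have the gcloud CLI installed and authenticated (the service name is arbitrary):

datasette publish cloudrun mydatabase.db --service=my-datasette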

Thoughts

Firstly, the amount of detail GPT-3 has accumulated about Datasette is fascinating: I really want to understand where it picked up all of this knowledge. I remember asking it about Datasette around a year ago and not getting results anything like this, so my hunch is that it’s been trained on a new corpus of data at some point in the past twelve months.

I certainly don’t see this as a replacement for professional technical writers, but as a tool for trying out new ideas, beating writer’s block and generally brainstorming how documentation could work, it’s already something I would continue using.

A really interesting aspect of this is “prompt engineering”—iterating on prompts and figuring out what can get the best results out of GPT-3. I wrote this article partly to highlight how good the results for “a step by step tutorial for getting started with ...” could be.

Marketing landing pages

As a quick bonus: I think GPT-3 really excels when it comes to marketing copy (maybe because I don’t rate myself at all highly in that department).

Here’s its response to the prompt: Write a marketing landing page for Datasette Cloud

Datasette Cloud is a new way to host and manage your data. It’s built on top of the Datasette open source project and offers all the same features—including instant full-text search, SQLite database downloads and JSON API access—but with added benefits like automatic backups, 24/7 uptime monitoring and support from the Datasette team.

If you’re looking for an easy way to host and manage your data, Datasette Cloud is the perfect solution. Try it today and see how easy it is to get started.

Honestly this does a significantly better job of describing the hosted Datasette product I’ve been building than I’ve managed myself so far.

Fascinating.