Simon Willison’s Weblog


Weeknotes: Datasette Cloud preview invitations

30th September 2022

This week I finally started sending out invitations for people to try out the preview of the new Datasette Cloud, my SaaS offering for Datasette.

A screenshot of the Datasette Cloud onboarding screen, showing a list of example datasets that can be imported with one click.

The preview release includes the following features:

  • Create a new private instance of Datasette, in the geographic region of your choice
  • Add data to your instance by uploading CSV files, importing CSV data by URL or importing data from a Socrata open government data portal
  • Build a search engine against selected columns from your data
  • Invite members of your team to collaborate on your instance
  • Visualize your data on a map, or as bar charts or line charts

You can request preview access here—or come talk to me about it on the Datasette Discord.

I’m certain I haven’t built the right product yet, so feedback is incredibly valuable to me right now!

The two most important upcoming features are API access (with API keys) and the ability to publish data—right now the tool is entirely private, but publishing structured data is a big part of Datasette’s core DNA and something I’m certain people will want to be able to do with the hosted version.

Other projects

I’ve already written a lot about my other projects this week.

I came second place in the Bellingcat Hackathon with Action Transcription, A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions.

Meta AI released a new paper describing Make-A-Video, a text-to-video model. I dug into the training data using Datasette—see Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model—and found that one of the main academic datasets behind the model was entirely scraped from Shutterstock.

Andy Baio noted that this was another example of a commercial AI research team building on a dataset gathered in academia. He calls that “AI Data Laundering”, and wrote about it in AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability.

I’ve continued to think about Prompt Injection, the security attack against software built on large language models that starts with “Ignore previous instructions and ...”. I wrote two more pieces about that:

I also pushed out a new Datasette alpha with some small changes that have accumulated over the past month.

Releases this week

TIL this week