Weeknotes: Datasette Cloud preview invitations

30th September 2022

This week I finally started sending out invitations for people to try out the preview of the new Datasette Cloud, my SaaS offering for Datasette.

A screenshot of the Datasette Cloud onboarding screen, showing a list of example datasets that can be imported with one click.

The preview release includes the following features:

Create a new private instance of Datasette, in the geographic region of your choice
Add data to your instance by uploading CSV files, importing CSV data by URL or importing data from a Socrata open government data portal
Build a search engine against selected columns from your data
Invite members of your team to collaborate on your instance
Visualize your data on a map, or as bar charts or line charts

You can request preview access here—or come talk to me about it on the Datasette Discord.

I’m certain I haven’t built the right product yet, so feedback is incredibly valuable to me right now!

The two most important upcoming features are API access (with API keys) and the ability to publish data—right now the tool is entirely private, but publishing structured data is a big part of Datasette’s core DNA and something I’m certain people will want to be able to do with the hosted version.

Other projects

I’ve already written a lot about my other projects this week.

I came second place in the Bellingcat Hackathon with Action Transcription, A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions.

Meta AI released a new paper describing Make-A-Video, a text-to-video model. I dug into the training data using Datasette—see Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model—and found that one of the main academic datasets behind the model was entirely scraped from Shutterstock.

Andy Baio noted that this was another example of a commercial AI research team building on a dataset gathered in academia. He calls that “AI Data Laundering”, and wrote about it in AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability.

I’ve continued to think about Prompt Injection, the security attack against software built on large language models that starts with “Ignore previous instructions and ...”. I wrote two more pieces about that:

I don’t know how to solve prompt injection talks about how this is a security vulnerability which I don’t know of any good mitigations for!
You can’t solve AI security problems with more AI puts forward my argument that attempting to solve an AI security problem by layering on even more AI feels doomed to fail, because black-box unpredictable AI models do not offer the certainty and guarantees that I want from a security solution.

I also pushed out a new Datasette alpha with some small changes that have accumulated over the past month.

Releases this week

datasette-publish-fly: 1.2—(8 releases total)—2022-09-29
Datasette plugin for publishing data using Fly
datasette-pretty-json: 0.2.2—(2 releases total)—2022-09-28
Datasette plugin that pretty-prints any column values that are valid JSON objects or arrays
datasette: 0.63a0—(114 releases total)—2022-09-26
An open source multi-tool for exploring and publishing data
ttml-to-json: 0.2—(2 releases total)—2022-09-25
Convert TTML to JSON
webvtt-to-json: 0.2—(2 releases total)—2022-09-25
Convert WebVTT to JSON, optionally removing duplicate lines
image-diff: 0.2.2—(4 releases total)—2022-09-19
CLI tool for comparing images
datasette-sandstorm-support: 0.2—(2 releases total)—2022-09-16
Authentication and permissions for Datasette on Sandstorm

TIL this week

Posted 30th September 2022 at 11:05 pm · Follow me on Mastodon or Twitter or subscribe to my newsletter

Simon Willison’s Weblog