Weeknotes: Datasette Cloud preview invitations
30th September 2022
This week I finally started sending out invitations for people to try out the preview of the new Datasette Cloud, my SaaS offering for Datasette.
The preview release includes the following features:
- Create a new private instance of Datasette, in the geographic region of your choice
- Add data to your instance by uploading CSV files, importing CSV data by URL or importing data from a Socrata open government data portal
- Build a search engine against selected columns from your data
- Invite members of your team to collaborate on your instance
- Visualize your data on a map, or as bar charts or line charts
I’m certain I haven’t built the right product yet, so feedback is incredibly valuable to me right now!
The two most important upcoming features are API access (with API keys) and the ability to publish data—right now the tool is entirely private, but publishing structured data is a big part of Datasette’s core DNA and something I’m certain people will want to be able to do with the hosted version.
I’ve already written a lot about my other projects this week.
I came second place in the Bellingcat Hackathon with Action Transcription, A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions.
Meta AI released a new paper describing Make-A-Video, a text-to-video model. I dug into the training data using Datasette—see Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model—and found that one of the main academic datasets behind the model was entirely scraped from Shutterstock.
Andy Baio noted that this was another example of a commercial AI research team building on a dataset gathered in academia. He calls that “AI Data Laundering”, and wrote about it in AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability.
I’ve continued to think about Prompt Injection, the security attack against software built on large language models that starts with “Ignore previous instructions and ...”. I wrote two more pieces about that:
- I don’t know how to solve prompt injection talks about how this is a security vulnerability which I don’t know of any good mitigations for!
- You can’t solve AI security problems with more AI puts forward my argument that attempting to solve an AI security problem by layering on even more AI feels doomed to fail, because black-box unpredictable AI models do not offer the certainty and guarantees that I want from a security solution.
I also pushed out a new Datasette alpha with some small changes that have accumulated over the past month.
Releases this week
datasette-publish-fly: 1.2—(8 releases total)—2022-09-29
Datasette plugin for publishing data using Fly
datasette-pretty-json: 0.2.2—(2 releases total)—2022-09-28
Datasette plugin that pretty-prints any column values that are valid JSON objects or arrays
datasette: 0.63a0—(114 releases total)—2022-09-26
An open source multi-tool for exploring and publishing data
ttml-to-json: 0.2—(2 releases total)—2022-09-25
Convert TTML to JSON
webvtt-to-json: 0.2—(2 releases total)—2022-09-25
Convert WebVTT to JSON, optionally removing duplicate lines
image-diff: 0.2.2—(4 releases total)—2022-09-19
CLI tool for comparing images
datasette-sandstorm-support: 0.2—(2 releases total)—2022-09-16
Authentication and permissions for Datasette on Sandstorm
TIL this week
- Returning related rows in a single SQL query using JSON
- Using DuckDB in Python to access Parquet data
- Deploying Python web apps as AWS Lambda functions
- Whisky sour
- Ensure labels exist in a GitHub repository
- Athena error: The specified key does not exist
- HTML video that loads when the user clicks play
- GraphQL fragments
More recent articles
- Weeknotes: datasette-enrichments, datasette-comments, sqlite-chronicle - 8th December 2023
- Datasette Enrichments: a new plugin framework for augmenting your data - 1st December 2023
- llamafile is the new best way to run a LLM on your own computer - 29th November 2023
- Prompt injection explained, November 2023 edition - 27th November 2023
- I'm on the Newsroom Robots podcast, with thoughts on the OpenAI board - 25th November 2023
- Weeknotes: DevDay, GitHub Universe, OpenAI chaos - 22nd November 2023
- Deciphering clues in a news article to understand how it was reported - 22nd November 2023
- Exploring GPTs: ChatGPT in a trench coat? - 15th November 2023
- Financial sustainability for open source projects at GitHub Universe - 10th November 2023
- ospeak: a CLI tool for speaking text in the terminal via OpenAI - 7th November 2023