Weeknotes: Datasette Cloud preview invitations
This week I finally started sending out invitations for people to try out the preview of the new Datasette Cloud, my SaaS offering for Datasette.
The preview release includes the following features:
- Create a new private instance of Datasette, in the geographic region of your choice
- Add data to your instance by uploading CSV files, importing CSV data by URL or importing data from a Socrata open government data portal
- Build a search engine against selected columns from your data
- Invite members of your team to collaborate on your instance
- Visualize your data on a map, or as bar charts or line charts
You can request preview access here—or come talk to me about it on the Datasette Discord.
I’m certain I haven’t built the right product yet, so feedback is incredibly valuable to me right now!
The two most important upcoming features are API access (with API keys) and the ability to publish data—right now the tool is entirely private, but publishing structured data is a big part of Datasette’s core DNA and something I’m certain people will want to be able to do with the hosted version.
I’ve already written a lot about my other projects this week.
I came second place in the Bellingcat Hackathon with Action Transcription, A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions.
Meta AI released a new paper describing Make-A-Video, a text-to-video model. I dug into the training data using Datasette—see Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model—and found that one of the main academic datasets behind the model was entirely scraped from Shutterstock.
Andy Baio noted that this was another example of a commercial AI research team building on a dataset gathered in academia. He calls that “AI Data Laundering”, and wrote about it in AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability.
I’ve continued to think about Prompt Injection, the security attack against software built on large language models that starts with “Ignore previous instructions and ...”. I wrote two more pieces about that:
- I don’t know how to solve prompt injection talks about how this is a security vulnerability which I don’t know of any good mitigations for!
- You can’t solve AI security problems with more AI puts forward my argument that attempting to solve an AI security problem by layering on even more AI feels doomed to fail, because black-box unpredictable AI models do not offer the certainty and guarantees that I want from a security solution.
I also pushed out a new Datasette alpha with some small changes that have accumulated over the past month.
Releases this week
datasette-publish-fly: 1.2—(8 releases total)—2022-09-29
Datasette plugin for publishing data using Fly
datasette-pretty-json: 0.2.2—(2 releases total)—2022-09-28
Datasette plugin that pretty-prints any column values that are valid JSON objects or arrays
datasette: 0.63a0—(114 releases total)—2022-09-26
An open source multi-tool for exploring and publishing data
ttml-to-json: 0.2—(2 releases total)—2022-09-25
Convert TTML to JSON
webvtt-to-json: 0.2—(2 releases total)—2022-09-25
Convert WebVTT to JSON, optionally removing duplicate lines
image-diff: 0.2.2—(4 releases total)—2022-09-19
CLI tool for comparing images
datasette-sandstorm-support: 0.2—(2 releases total)—2022-09-16
Authentication and permissions for Datasette on Sandstorm
TIL this week
- Returning related rows in a single SQL query using JSON
- Using DuckDB in Python to access Parquet data
- Deploying Python web apps as AWS Lambda functions
- Whisky sour
- Ensure labels exist in a GitHub repository
- Athena error: The specified key does not exist
- HTML video that loads when the user clicks play
- GraphQL fragments
More recent articles
- ChatGPT should include inline tips - 30th May 2023
- Lawyer cites fake cases invented by ChatGPT, judge is not amused - 27th May 2023
- llm, ttok and strip-tags - CLI tools for working with ChatGPT and other LLMs - 18th May 2023
- Delimiters won't save you from prompt injection - 11th May 2023
- Weeknotes: sqlite-utils 3.31, download-esm, Python in a sandbox - 10th May 2023
- Leaked Google document: "We Have No Moat, And Neither Does OpenAI" - 4th May 2023
- Midjourney 5.1 - 4th May 2023
- Prompt injection explained, with video, slides, and a transcript - 2nd May 2023
- download-esm: a tool for downloading ECMAScript modules - 2nd May 2023
- Let's be bear or bunny - 1st May 2023