Simon Willison’s Weblog

Subscribe

Entries

Filters: Sorted by date

The Perfect Commit

For the last few years I’ve been trying to center my work around creating what I consider to be the Perfect Commit. This is a single commit that contains all of the following:

[... 2,061 words]

Datasette 0.63: The annotated release notes

Visit Datasette 0.63: The annotated release notes

I released Datasette 0.63 today. These are the annotated release notes.

[... 1,531 words]

Weeknotes: DjangoCon, SQLite in Django, datasette-gunicorn

I spent most of this week at DjangoCon in San Diego—my first outside-of-the-Bay-Area conference since the before-times.

[... 1,184 words]

Measuring traffic during the Half Moon Bay Pumpkin Festival

Visit Measuring traffic during the Half Moon Bay Pumpkin Festival

This weekend was the 50th annual Half Moon Bay Pumpkin Festival.

[... 2,693 words]

Automating screenshots for the Datasette documentation using shot-scraper

Visit Automating screenshots for the Datasette documentation using shot-scraper

I released shot-scraper back in March as a tool for keeping screenshots in documentation up-to-date.

[... 1,810 words]

Weeknotes: Publishing data using Datasette Cloud

My initial preview releases of Datasette Cloud (the SaaS version of my Datasette open source project) have focused on private data collaboration.

[... 582 words]

Is the AI spell-casting metaphor harmful or helpful?

Visit Is the AI spell-casting metaphor harmful or helpful?

For a few weeks now I’ve been promoting spell-casting as a metaphor for prompt design against generative AI systems such as GPT-3 and Stable Diffusion.

[... 990 words]

Software engineering practices

Gergely Orosz started a Twitter conversation asking about recommended “software engineering practices” for development teams.

[... 1,557 words]

Weeknotes: Datasette Cloud preview invitations

Visit Weeknotes: Datasette Cloud preview invitations

This week I finally started sending out invitations for people to try out the preview of the new Datasette Cloud, my SaaS offering for Datasette.

[... 713 words]

A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions

Visit A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions

I released a new project this weekend, built during the Bellingcat Hackathon (I came second!) It’s called Action Transcription and it’s a tool for caturing captions and transcripts from online videos.

[... 1,362 words]

Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model

Visit Exploring 10m scraped Shutterstock videos used to train Meta's Make-A-Video text-to-video model

Make-A-Video is a new “state-of-the-art AI system that generates videos from text” from Meta AI. It looks incredible—it really is DALL-E / Stable Diffusion for video. And it appears to have been trained on 10m video preview clips scraped from Shutterstock.

[... 923 words]

You can’t solve AI security problems with more AI

One of the most common proposed solutions to prompt injection attacks (where an AI language model backed system is subverted by a user injecting malicious input—“ignore previous instructions and do this instead”) is to apply more AI to the problem.

[... 1,288 words]

I don’t know how to solve prompt injection

Visit I don't know how to solve prompt injection

Some extended thoughts about prompt injection attacks against software built on top of AI language models such a GPT-3. This post started as a Twitter thread but I’m promoting it to a full blog entry here.

[... 581 words]

Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more

Visit Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more

Despite distractions from AI I managed to make progress on a bunch of different projects this week, including new releases of s3-credentials and shot-scraper, a new datasette-edit-templates plugin and a small but neat improvement to Datasette Lite.

[... 1,562 words]

Prompt injection attacks against GPT-3

Visit Prompt injection attacks against GPT-3

Riley Goodside, yesterday:

[... 1,457 words]

Exploring the training data behind Stable Diffusion

Visit Exploring the training data behind Stable Diffusion

Two weeks ago, the Stable Diffusion image generation model was released to the public. I wrote about this last week, in Stable Diffusion is a really big deal—a post which has since become one of the top ten results for “stable diffusion” on Google and shown up in all sorts of different places online.

[... 2,897 words]

Notes on the SQLite DuckDB paper

SQLite: Past, Present, and Future is a newly published paper authored by Kevin P. Gaffney, Martin Prammer and Jignesh M. Patel from the University of Wisconsin-Madison and D. Richard Hipp, Larry Brasfield and Dan Kennedy from the core SQLite engineering team.

[... 1,021 words]

Stable Diffusion is a really big deal

Visit Stable Diffusion is a really big deal

If you haven’t been paying attention to what’s going on with Stable Diffusion, you really should be.

[... 1,443 words]

Building a searchable archive for the San Francisco Microscopical Society

Visit Building a searchable archive for the San Francisco Microscopical Society

The San Francisco Microscopical Society was founded in 1870 by a group of scientists dedicated to advancing the field of microscopy.

[... 1,845 words]

Analyzing ScotRail audio announcements with Datasette—from prototype to production

Visit Analyzing ScotRail audio announcements with Datasette - from prototype to production

Scottish train operator ScotRail released a two-hour long MP3 file containing all of the components of its automated station announcements. Messing around with them is proving to be a huge amount of fun.

[... 4,428 words]

Plugin support for Datasette Lite

Visit Plugin support for Datasette Lite

I’ve added a new feature to Datasette Lite, my distribution of Datasette that runs entirely in the browser using Python and SQLite compiled to WebAssembly. You can now install additional Datasette plugins by passing them in the URL.

[... 865 words]

Litestream backups for Datasette Cloud (and weeknotes)

My main focus this week has been adding robust backups to the forthcoming Datasette Cloud.

[... 1,604 words]

Weeknotes: Joining the board of the Python Software Foundation

Visit Weeknotes: Joining the board of the Python Software Foundation

A few weeks ago I was elected to the board of directors for the Python Software Foundation.

[... 2,081 words]

Weeknotes: Datasette, sqlite-utils, Datasette Desktop

A flurry of releases this week, including a new Datasette alpha and a fixed Datasette Desktop.

[... 1,113 words]

sqlite-comprehend: run AWS entity extraction against content in a SQLite database

I built a new tool this week: sqlite-comprehend, which passes text from a SQLite database through the AWS Comprehend entity extraction service and stores the returned entities.

[... 1,146 words]

Using GPT-3 to explain how code works

Visit Using GPT-3 to explain how code works

One of my favourite uses for the GPT-3 AI language model is generating explanations of how code works. It’s shockingly effective at this: its training set clearly include a vast amount of source code.

[... 1,983 words]

s3-ocr: Extract text from PDF files stored in an S3 bucket

Visit s3-ocr: Extract text from PDF files stored in an S3 bucket

I’ve released s3-ocr, a new tool that runs Amazon’s Textract OCR text extraction against PDF files in an S3 bucket, then writes the resulting text out to a SQLite database with full-text search configured so you can run searches against the extracted data.

[... 1,493 words]

First impressions of DALL-E, generating images from text

Visit First impressions of DALL-E, generating images from text

I made it off the DALL-E waiting list a few days ago and I’ve been having an enormous amount of fun experimenting with it. Here are some notes on what I’ve learned so far (and a bunch of example images too).

[... 2,102 words]

Joining CSV files in your browser using Datasette Lite

Visit Joining CSV files in your browser using Datasette Lite

I added a new feature to Datasette Lite—my version of Datasette that runs entirely in your browser using WebAssembly (previously): you can now use it to load one or more CSV files by URL, and then run SQL queries against them—including joins across data from multiple files.

[... 546 words]

Weeknotes: datasette-socrata, and the last 10%...

... takes 90% of the work. I continue to work towards a preview of the new Datasette Cloud, and keep finding new “just one more things” to delay inviting in users.

[... 1,214 words]