Simon Willison’s Weblog

Series of posts

New features in sqlite-utils

Any time I introduce a significant new feature in a release of my sqlite-utils package, I write about it here.

  1. sqlite-utils: a Python library and CLI tool for building SQLite databases - Feb. 25, 2019, 3:29 a.m.
  2. Fun with binary data and SQLite - July 30, 2020, 11:22 p.m.
  3. Executing advanced ALTER TABLE operations in SQLite - Sept. 23, 2020, 1 a.m.
  4. Refactoring databases with sqlite-utils extract - Sept. 23, 2020, 4:02 p.m.
  5. Joining CSV and JSON data with an in-memory SQLite database - June 19, 2021, 10:55 p.m.
  6. Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool - Aug. 6, 2021, 6:05 a.m.
  7. What's new in sqlite-utils 3.20 and 3.21: --lines, --text, --convert - Jan. 11, 2022, 6:19 p.m.

Misconceptions about large language models

Large language models can behave in very unintuitive ways!

  1. ChatGPT can't access the internet, even though it really looks like it can - March 10, 2023, 1:41 p.m.
  2. Don't trust AI to talk accurately about itself: Bard wasn't trained on Gmail - March 22, 2023, 3:13 a.m.

Prompt injection

A security vulnerability in software built on top of large AI language models like GPT-3.

  1. Prompt injection attacks against GPT-3 - Sept. 12, 2022, 10:20 p.m.
  2. I don't know how to solve prompt injection - Sept. 16, 2022, 4:28 p.m.
  3. You can't solve AI security problems with more AI - Sept. 17, 2022, 10:57 p.m.
  4. A new AI game: Give me ideas for crimes to do - Dec. 4, 2022, 3:11 p.m.
  5. Bing: "I will not harm you unless you harm me first" - Feb. 15, 2023, 3:05 p.m.

LLMs on personal devices

Large language models that can run on our own devices open up exciting new ways in which these tools can be used.

  1. Large language models are having their Stable Diffusion moment - March 11, 2023, 7:15 p.m.
  2. Stanford Alpaca, and the acceleration of on-device large language model development - March 13, 2023, 7:19 p.m.
  3. Could you train a ChatGPT-beating model for $85,000 and run it in a browser? - March 17, 2023, 3:43 p.m.

How it's trained

Investigating the training data behind different machine learning models.

  1. Exploring the training data behind Stable Diffusion - Sept. 5, 2022, 12:18 a.m.
  2. Exploring 10m scraped Shutterstock videos used to train Meta's Make-A-Video text-to-video model - Sept. 29, 2022, 7:31 p.m.
  3. Exploring MusicCaps, the evaluation data released to accompany Google's MusicLM text-to-music model - Jan. 27, 2023, 9:34 p.m.

Datasette Lite

A distribution of Datasette that runs entirely in the browser, using WebAssembly and Pyodide.

  1. Datasette Lite: a server-side Python web application running in a browser - May 4, 2022, 3:16 p.m.
  2. Joining CSV files in your browser using Datasette Lite - June 20, 2022, 9:20 p.m.
  3. Plugin support for Datasette Lite - Aug. 17, 2022, 6:20 p.m.
  4. Analyzing ScotRail audio announcements with Datasette - from prototype to production - Aug. 21, 2022, 2:04 a.m.
  5. Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more - Sept. 16, 2022, 2:55 a.m.

VaccinateCA internal blog

I maintained an internal blog between February and April 2021 during my time at VaccinateCA / Vaccinate The States.

  1. Getting started - Feb. 22, 2021, 5 p.m.
  2. Spinning up a new Django app to act as a backend for VaccinateCA - Feb. 23, 2021, 5 p.m.
  3. Importing data from Airtable into Django, plus a search engine for all our code - Feb. 24, 2021, 5 p.m.
  4. Django admin customization, JSON in our PostgreSQL - Feb. 25, 2021, 5 p.m.
  5. Drawing the rest of the owl - March 1, 2021, 5 p.m.
  6. API ready for testing, first video status update - March 2, 2021, 5 p.m.
  7. Replaying logs to exercise the new API - March 3, 2021, 5 p.m.
  8. The simplest possible call queue - March 6, 2021, 5 p.m.
  9. New call queue ready to test. Also geography. - March 7, 2021, 5 p.m.
  10. APIs for importing locations - March 9, 2021, 5 p.m.
  11. VIAL is now live, plus django-sql-dashboard - March 15, 2021, 5 p.m.
  12. The Airtable formulas at the heart of everything - March 23, 2021, 5 p.m.
  13. VIAL: Preparing for some collaborative testing - April 1, 2021, 5 p.m.
  14. A CSV export, JSON import workflow for bulk updating our data - April 28, 2021, 5 p.m.

Git scraping

A technique for scraping content into a Git repository to track changes to it over time.

  1. Scraping hurricane Irma - Sept. 10, 2017, 6:21 a.m.
  2. Changelogs to help understand the fires in the North Bay - Oct. 10, 2017, 6:48 a.m.
  3. Generating a commit log for San Francisco's official list of trees - March 13, 2019, 2:49 p.m.
  4. Tracking PG&E outages by scraping to a git repo - Oct. 10, 2019, 11:32 p.m.
  5. Git scraping: track changes over time by scraping to a Git repository - Oct. 9, 2020, 6:27 p.m.
  6. Git scraping, the five minute lightning talk - March 5, 2021, 12:44 a.m.
  7. git-history: a tool for analyzing scraped data collected using Git and SQLite - Dec. 7, 2021, 10:32 p.m.
  8. Help scraping: track changes to CLI tools by recording their --help using Git - Feb. 2, 2022, 11:46 p.m.
  9. shot-scraper: automated screenshots for documentation, built on Playwright - March 10, 2022, 12:13 a.m.
  10. Scraping web pages from the command line with shot-scraper - March 14, 2022, 1:29 a.m.
  11. Automatically opening issues when tracked file content changes - April 28, 2022, 5:18 p.m.
  12. Measuring traffic during the Half Moon Bay Pumpkin Festival - Oct. 19, 2022, 3:41 p.m.
  13. Tracking Mastodon user numbers over time with a bucket of tricks - Nov. 20, 2022, 7 a.m.

Datasette: The annotated release notes

I like to accompany significant releases of my Datasette project with an annotated version of the release notes, providing extra background context on new features in the release.

  1. Datasette 0.44: The annotated release notes - June 12, 2020, 3:11 a.m.
  2. Datasette 0.45: The annotated release notes - July 1, 2020, 10:33 p.m.
  3. Datasette 0.49: The annotated release notes - Sept. 15, 2020, 11:45 p.m.
  4. Datasette 0.50: The annotated release notes - Oct. 9, 2020, 8:23 p.m.
  5. Datasette 0.54: The annotated release notes - Jan. 25, 2021, 5:31 p.m.
  6. Datasette 0.58: The annotated release notes - July 16, 2021, 2:21 a.m.
  7. Datasette Desktop 0.2.0: The annotated release notes - Sept. 13, 2021, 11:30 p.m.
  8. Datasette 0.59: The annotated release notes - Oct. 19, 2021, 4:59 a.m.
  9. Datasette 0.60: The annotated release notes - Jan. 14, 2022, 2:30 a.m.
  10. Datasette 0.61: The annotated release notes - March 24, 2022, 1:53 a.m.
  11. Datasette 0.63: The annotated release notes - Oct. 27, 2022, 10:13 p.m.
  12. Datasette's new JSON write API: The first alpha of Datasette 1.0 - Dec. 2, 2022, 11:15 p.m.
  13. Datasette 1.0a2: Upserts and finely grained permissions - Dec. 15, 2022, 5:58 p.m.
  14. Datasette 0.64, with a warning about SpatiaLite - Jan. 9, 2023, 9:22 p.m.

My open source process

Articles about the process I use for developing my open source projects.

  1. Documentation unit tests - July 28, 2018, 3:59 p.m.
  2. How to cheat at unit tests with pytest and Black - Feb. 11, 2020, 6:56 a.m.
  3. Open source projects: consider running office hours - Feb. 19, 2021, 9:54 p.m.
  4. How to build, test and publish an open source Python library - Nov. 4, 2021, 10:02 p.m.
  5. How I build a feature - Jan. 12, 2022, 6:10 p.m.
  6. Writing better release notes - Jan. 31, 2022, 8:13 p.m.
  7. Software engineering practices - Oct. 1, 2022, 3:56 p.m.
  8. Automating screenshots for the Datasette documentation using shot-scraper - Oct. 14, 2022, 11:44 p.m.
  9. The Perfect Commit - Oct. 29, 2022, 8:41 p.m.
  10. Coping strategies for the serial project hoarder - Nov. 26, 2022, 3:47 p.m.