Simon Willison’s Weblog

Releases

I ran into trouble deploying a new feature using SSE to a production Datasette instance, and it turned out that instance was using datasette-gzip, which uses asgi-gzip, which was incorrectly compressing text/event-stream responses.

asgi-gzip was extracted from Starlette, and has a GitHub Actions scheduled workflow to check Starlette for updates that need to be ported to the library... but that action had stopped running and hence had missed Starlette's own fix for this issue.

I ran the workflow and integrated the new fix, and now datasette-gzip and asgi-gzip both correctly handle text/event-stream in SSE responses.
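As a rough illustration of the kind of check involved, here is a minimal sketch of how a gzip middleware might decide to skip event streams. This is not the actual asgi-gzip or Starlette code: the `should_gzip` helper and the `EXCLUDED_CONTENT_TYPES` tuple are hypothetical names used only for this example.

```python
# Hypothetical helper, not taken from asgi-gzip or Starlette.
EXCLUDED_CONTENT_TYPES = ("text/event-stream",)


def should_gzip(headers: list[tuple[bytes, bytes]]) -> bool:
    """Return False when ASGI response headers indicate a content type
    that should be passed through uncompressed."""
    for name, value in headers:
        if name.lower() == b"content-type":
            # Drop parameters such as "; charset=utf-8" before comparing.
            media_type = value.decode("latin-1").split(";")[0].strip().lower()
            if media_type in EXCLUDED_CONTENT_TYPES:
                return False
    return True
```

Compressing an event stream tends to buffer output, so events may not reach the client as they are sent. Skipping compression based on the content type avoids that.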

Release datasette-turnstile 0.1a3 — Configurable CAPTCHAs for Datasette paths using Cloudflare Turnstile
Release datasette-graphql 3.0a1 — Datasette plugin providing an automatic GraphQL API for your SQLite databases
Release datasette-atom 0.10a0 — Datasette plugin that adds a .atom output format
Release dogsheep-beta 0.11 — Build a search index across content from multiple SQLite database tables and run faceted searches against it using Datasette
Release datasette-template-sql 1.0.3 — Datasette plugin for executing SQL queries from templates
Release datasette-turnstile 0.1a2 — Configurable CAPTCHAs for Datasette paths using Cloudflare Turnstile
Release datasette-turnstile 0.1a1 — Configurable CAPTCHAs for Datasette paths using Cloudflare Turnstile
  • No longer requires Datasette - running uvx datasette-ports now works as well.
  • Installing it as a Datasette plugin continues to provide the datasette ports command.
  • New -r/--redact option which shows the list of matches, asks for confirmation and then replaces every match with REDACTED, taking escaping rules into account.
  • New Python function redact_file(file_path: str | Path, secrets: list[str], replacement: str = "REDACTED") -> int.
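To give a feel for what a function with that signature could do, here is a simplified sketch. It is an assumption, not the library's implementation: it only replaces literal matches, ignores the escaping rules the real tool takes into account, and assumes the returned integer is the number of replacements made.

```python
from __future__ import annotations

from pathlib import Path


def redact_file(
    file_path: str | Path, secrets: list[str], replacement: str = "REDACTED"
) -> int:
    """Simplified sketch: replace literal occurrences of each secret in the
    file and return how many replacements were made (assumed meaning)."""
    path = Path(file_path)
    text = path.read_text()
    total = 0
    # Replace longer secrets first so a secret that contains a shorter one
    # is redacted as a whole. Empty strings are skipped.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        count = text.count(secret)
        if count:
            text = text.replace(secret, replacement)
            total += count
    if total:
        path.write_text(text)
    return total
```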

Another example of README-driven development, this time solving a problem that might be unique to me.

I often find myself running a bunch of different Datasette instances with different databases and different in-development plugins, spread across dozens of different terminal windows - enough that I frequently lose them!

Now I can run this:

datasette install datasette-ports
datasette ports

And get a list of every running instance that looks something like this:

http://127.0.0.1:8333/ - v1.0a26
  Databases: data
  Plugins: datasette-enrichments, datasette-enrichments-llm, datasette-llm, datasette-secrets
http://127.0.0.1:8001/ - v1.0a26
  Databases: creatures
  Plugins: datasette-extract, datasette-llm, datasette-secrets
http://127.0.0.1:8900/ - v0.65.2
  Databases: logs
  • CLI tool now streams results as they are found rather than waiting until the end, which is better for large directories.
  • -d/--directory option can now be used multiple times to scan multiple directories.
  • New -f/--file option for specifying one or more individual files to scan.
  • New scan_directory_iter(), scan_file() and scan_file_iter() Python API functions.
  • New -v/--verbose option which shows each directory that is being scanned.
  • Added documentation of the escaping schemes that are also scanned.
  • Removed unnecessary repr escaping scheme, which was already covered by json.

I like publishing transcripts of local Claude Code sessions using my claude-code-transcripts tool but I'm often paranoid that one of my API keys or similar secrets might inadvertently be revealed in the detailed log files.

I built this new Python scanning tool to help reassure me. You can feed it secrets and have it scan for them in a specified directory:

uvx scan-for-secrets $OPENAI_API_KEY -d logs-to-publish/

If you leave off the -d it defaults to the current directory.

It doesn't just scan for the literal secrets - it also scans for common encodings of those secrets e.g. backslash or JSON escaping, as described in the README.
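The idea of scanning for encoded forms can be sketched roughly as follows. This is an illustration only, not the tool's code: `secret_variants` and `contains_secret` are hypothetical names, and the real list of escaping schemes is the one documented in the README.

```python
import json


def secret_variants(secret: str) -> set[str]:
    """Return the literal secret plus a couple of escaped forms of it."""
    variants = {secret}
    # JSON escaping: json.dumps wraps the value in quotes, stripped here.
    variants.add(json.dumps(secret)[1:-1])
    # Plain backslash escaping of backslashes and double quotes.
    variants.add(secret.replace("\\", "\\\\").replace('"', '\\"'))
    return variants


def contains_secret(text: str, secret: str) -> bool:
    """True if the text contains the secret in any of the known forms."""
    return any(variant in text for variant in secret_variants(secret))
```

A secret containing a quote or backslash would not appear literally inside a JSON log file, which is why checking only the raw string is not enough.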

If you have a set of secrets you always want to protect you can list commands to echo them in a ~/.scan-for-secrets.conf.sh file. Mine looks like this:

llm keys get openai
llm keys get anthropic
llm keys get gemini
llm keys get mistral
awk -F= '/aws_secret_access_key/{print $2}' ~/.aws/credentials | xargs
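One way a tool could turn a file like that into a list of secrets is to run it as a shell script and treat each non-empty line of output as one secret. How scan-for-secrets actually executes the file is an assumption here, and `load_secrets` is a hypothetical name.

```python
import subprocess
from pathlib import Path


def load_secrets(conf_path: Path) -> list[str]:
    """Run the file with sh and return each non-empty output line."""
    result = subprocess.run(
        ["sh", str(conf_path)],
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with an error
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Storing commands rather than the secrets themselves means the configuration file contains nothing sensitive.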

I built this tool using README-driven development: I carefully constructed the README describing exactly how the tool should work, then dumped it into Claude Code and told it to build the actual tool (using red/green TDD, naturally).

I'm working on a major change to my LLM Python library and CLI tool. LLM provides an abstraction layer over hundreds of different LLMs from dozens of different vendors thanks to its plugin system, and some of those vendors have grown new features over the past year which LLM's abstraction layer can't handle, such as server-side tool execution.

To help design that new abstraction layer I had Claude Code read through the Python client libraries for Anthropic, OpenAI, Gemini and Mistral and use those to help craft curl commands to access the raw JSON for both streaming and non-streaming modes across a range of different scenarios. Both the scripts and the captured outputs now live in this new repo.

New models gemini-3.1-flash-lite-preview, gemma-4-26b-a4b-it and gemma-4-31b-it. See my notes on Gemma 4.

  • The same model ID no longer needs to be repeated in both the default model and allowed models lists - setting it as a default model automatically adds it to the allowed models list. #6
  • Improved documentation for Python API usage.
  • The actor who triggers an enrichment is now passed to the llm.model(... actor=actor) method. #3
  • This plugin now uses datasette-llm to configure and manage models. This means it's possible to specify which models should be made available for enrichments, using the new enrichments purpose.
  • Removed features relating to allowances and estimated pricing. These are now the domain of datasette-llm-accountant.
  • Now depends on datasette-llm for model configuration. #3
  • Full prompts, responses and tool calls can now be logged to the llm_usage_prompt_log table in the internal database if you set the new datasette-llm-usage.log_prompts plugin configuration setting.
  • Redesigned the /-/llm-usage-simple-prompt page, which now requires the llm-usage-simple-prompt permission.
  • The llm_prompt_context() plugin hook wrapper mechanism now tracks prompts executed within a chain as well as one-off prompts, which means it can be used to track tool call loops. #5

I released llm-echo 0.3 to provide an API key testing utility I needed for the tests for this new feature.

LLM plugins can define new models in both sync and async varieties. The async variants are most common for API-backed models - sync variants tend to be things that run the model directly within the plugin.

My llm-mrchatterbox plugin is sync only. I wanted to try it out with various Datasette LLM features (specifically datasette-enrichments-llm) but Datasette can only use async models.

So... I had Claude spin up this plugin that turns sync models into async models using a thread pool. This ended up needing an extra plugin hook mechanism in LLM itself, which I shipped just now in LLM 0.30.
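The general technique of exposing a blocking call through an async interface by pushing it onto a thread pool can be sketched like this. The `SyncModel` and `AsyncWrapper` classes are hypothetical stand-ins, not the LLM library's model classes or the plugin's code.

```python
import asyncio


class SyncModel:
    """Stand-in for a blocking model (hypothetical, not LLM's API)."""

    def prompt(self, text: str) -> str:
        return text.upper()


class AsyncWrapper:
    """Expose a sync model through an awaitable prompt() method."""

    def __init__(self, sync_model: SyncModel) -> None:
        self.sync_model = sync_model

    async def prompt(self, text: str) -> str:
        # asyncio.to_thread runs the blocking call in the default thread
        # pool so the event loop stays free while the model is working.
        return await asyncio.to_thread(self.sync_model.prompt, text)
```

Calling `asyncio.run(AsyncWrapper(SyncModel()).prompt("hello"))` runs the blocking `prompt()` in a worker thread and returns its result to the awaiting caller.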

  • The register_models() plugin hook now takes an optional model_aliases parameter listing all of the models, async models and aliases that have been registered so far by other plugins. A plugin with @hookimpl(trylast=True) can use this to take previously registered models into account. #1389
  • Added docstrings to public classes and methods and included those directly in the documentation.
  • Prompts now have the input_tokens and output_tokens fields populated on the response.

I'm working on integrating datasette-files into other plugins, such as datasette-extract. This necessitated a new release of the base plugin.

  • owners_can_edit and owners_can_delete configuration options, plus the files-edit and files-delete actions are now scoped to a new FileResource which is a child of FileSourceResource. #18
  • The file picker UI is now available as a <datasette-file-picker> Web Component. Thanks, Alex Garcia. #19
  • New from datasette_files import get_file Python API for other plugins that need to access file data. #20

Adds the ability to configure which LLMs are available for which purpose, which means you can restrict the list of models that can be used with a specific plugin. #3

Release llm-mrchatterbox 0.1.1 — Chat with Mr Chatterbox, trained on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899