Simon Willison’s Weblog

Subscribe
Atom feed

Elsewhere

Filters: Sorted by date

Release datasette-atom 0.10a0 — Datasette plugin that adds a .atom output format
Release dogsheep-beta 0.11 — Build a search index across content from multiple SQLite database tables and run faceted searches against it using Datasette
Release datasette-template-sql 1.0.3 — Datasette plugin for executing SQL queries from templates
Release datasette-turnstile 0.1a2 — Configurable CAPTCHAs for Datasette paths using Cloudflare Turnstile
Release datasette-turnstile 0.1a1 — Configurable CAPTCHAs for Datasette paths using Cloudflare Turnstile

Inspired by this conversation on Hacker News about whether two SQLite processes in separate Docker containers that share the same volume might run into problems due to WAL shared memory. The answer is that everything works fine - Docker containers on the same host and filesystem share the same shared memory in a way that allows WAL to collaborate as it should.

  • No longer requires Datasette - running uvx datasette-ports now works as well.
  • Installing it as a Datasette plugin continues to provide the datasette ports command.
  • New -r/--redact option which shows the list of matches, asks for confirmation and then replaces every match with REDACTED, taking escaping rules into account.
  • New Python function redact_file(file_path: str | Path, secrets: list[str], replacement: str = "REDACTED") -> int.

Super-niche tool this. I sometimes copy prompts out of the Claude Code terminal app and they come out with a bunch of weird additional whitespace. This tool cleans that up.

Screenshot of a web tool titled "Cleanup Claude Code Paste" with the subtitle "Paste terminal output to remove the ❯ prompt, fix wrapped-line whitespace, and join lines into clean text." An input textarea contains pasted terminal output starting with "❯ Add a -r/--redact option which asks for user approval (after telling it how many replacements will happen and in which files and which lines – standard output basically) and then rewrites the files in that folder to replace all matched secrets with REDACTED. Run tests with 'uv run pytest' and use red/green TDD". Below is a "Cleaned output:" section showing the same text with the ❯ prompt removed and whitespace cleaned up. A blue "Copy to clipboard" button appears at the bottom.

Another example of README-driven development, this time solving a problem that might be unique to me.

I often find myself running a bunch of different Datasette instances with different databases and different in-development plugins, spreads across dozens of different terminal windows - enough that I frequently lose them!

Now I can run this:

datasette install datasette-ports
datasette ports

And get a list of every running instance that looks something like this:

http://127.0.0.1:8333/ - v1.0a26
  Databases: data
  Plugins: datasette-enrichments, datasette-enrichments-llm, datasette-llm, datasette-secrets
http://127.0.0.1:8001/ - v1.0a26
  Databases: creatures
  Plugins: datasette-extract, datasette-llm, datasette-secrets
http://127.0.0.1:8900/ - v0.65.2
  Databases: logs

Lalit Maganti's syntaqlite is currently being discussed on Hacker News thanks to Eight years of wanting, three months of building with AI, a deep dive into how it was built.

This inspired me to revisit a research project I ran when Lalit first released it a couple of weeks ago, where I tried it out and then compiled it to a WebAssembly wheel so it could run in Pyodide in a browser (the library itself uses C and Rust).

This new playground loads up the Python library and provides a UI for trying out its different features: formating, parsing into an AST, validating, and tokenizing SQLite SQL queries.

Screenshot of a dark-themed SQL validation playground called SyntaqLite. The "Validate" tab is selected from options including Format, Parse, Validate, and Tokenize. The SQL input contains "SELECT id, name FROM usr WHERE active = 1" with a schema defining "users" and "posts" tables. Example buttons for "Table typo", "Column typo", and "Valid query" are shown above a red "Validate SQL" button. The Diagnostics panel shows an error for unknown table 'usr' with the suggestion "did you mean 'users'?", and the JSON panel displays the corresponding error object with severity, message, and offset fields.

Update: not sure how I missed this but syntaqlite has its own WebAssembly playground linked to from the README.

  • CLI tool now streams results as they are found rather than waiting until the end, which is better for large directories.
  • -d/--directory option can now be used multiple times to scan multiple directories.
  • New -f/--file option for specifying one or more individual files to scan.
  • New scan_directory_iter(), scan_file() and scan_file_iter() Python API functions.
  • New -v/--verbose option which shows each directory that is being scanned.
  • Added documentation of the escaping schemes that are also scanned.
  • Removed unnecessary repr escaping scheme, which was already covered by json.

I like publishing transcripts of local Claude Code sessions using my claude-code-transcripts tool but I'm often paranoid that one of my API keys or similar secrets might inadvertently be revealed in the detailed log files.

I built this new Python scanning tool to help reassure me. You can feed it secrets and have it scan for them in a specified directory:

uvx scan-for-secrets $OPENAI_API_KEY -d logs-to-publish/

If you leave off the -d it defaults to the current directory.

It doesn't just scan for the literal secrets - it also scans for common encodings of those secrets e.g. backslash or JSON escaping, as described in the README.

If you have a set of secrets you always want to protect you can list commands to echo them in a ~/.scan-for-secrets.conf.sh file. Mine looks like this:

llm keys get openai
llm keys get anthropic
llm keys get gemini
llm keys get mistral
awk -F= '/aws_secret_access_key/{print $2}' ~/.aws/credentials | xargs

I built this tool using README-driven-development: I carefully constructed the README describing exactly how the tool should work, then dumped it into Claude Code and told it to build the actual tool (using red/green TDD, naturally.)

I'm working on a major change to my LLM Python library and CLI tool. LLM provides an abstraction layer over hundreds of different LLMs from dozens of different vendors thanks to its plugin system, and some of those vendors have grown new features over the past year which LLM's abstraction layer can't handle, such as server-side tool execution.

To help design that new abstraction layer I had Claude Code read through the Python client libraries for Anthropic, OpenAI, Gemini and Mistral and use those to help craft curl commands to access the raw JSON for both streaming and non-streaming modes across a range of different scenarios. Both the scripts and the captured outputs now live in this new repo.

In trying to build my own version of Claude Artifacts I got curious about options for applying CSP headers to content in sandboxed iframes without using a separate domain to host the files. Turns out you can inject <meta http-equiv="Content-Security-Policy"...> tags at the top of the iframe content and they'll be obeyed even if subsequent untrusted JavaScript tries to manipulate them.

New models gemini-3.1-flash-lite-preview, gemma-4-26b-a4b-it and gemma-4-31b-it. See my notes on Gemma 4.

  • The same model ID no longer needs to be repeated in both the default model and allowed models lists - setting it as a default model automatically adds it to the allowed models list. #6
  • Improved documentation for Python API usage.
  • The actor who triggers an enrichment is now passed to the llm.mode(... actor=actor) method. #3
  • This plugin now uses datasette-llm to configure and manage models. This means it's possible to specify which models should be made available for enrichments, using the new enrichments purpose.
  • Removed features relating to allowances and estimated pricing. These are now the domain of datasette-llm-accountant.
  • Now depends on datasette-llm for model configuration. #3
  • Full prompts and responses and tool calls can now be logged to the llm_usage_prompt_log table in the internal database if you set the new datasette-llm-usage.log_prompts plugin configuration setting.
  • Redesigned the /-/llm-usage-simple-prompt page, which now requires the llm-usage-simple-prompt permission.
  • The llm_prompt_context() plugin hook wrapper mechanism now tracks prompts executed within a chain as well as one-off prompts, which means it can be used to track tool call loops. #5

I released llm-echo 0.3 to provide an API key testing utility I needed for the tests for this new feature.

LLM plugins can define new models in both sync and async varieties. The async variants are most common for API-backed models - sync variants tend to be things that run the model directly within the plugin.

My llm-mrchatterbox plugin is sync only. I wanted to try it out with various Datasette LLM features (specifically datasette-enrichments-llm) but Datasette can only use async models.

So... I had Claude spin up this plugin that turns sync models into async models using a thread pool. This ended up needing an extra plugin hook mechanism in LLM itself, which I shipped just now in LLM 0.30.

  • The register_models() plugin hook now takes an optional model_aliases parameter listing all of the models, async models and aliases that have been registered so far by other plugins. A plugin with @hookimpl(trylast=True) can use this to take previously registered models into account. #1389
  • Added docstrings to public classes and methods and included those directly in the documentation.
  • Prompts now have the input_tokens and output_tokens fields populated on the response.

I'm working on integrating datasette-files into other plugins, such as datasette-extract. This necessitated a new release of the base plugin.

  • owners_can_edit and owners_can_delete configuration options, plus the files-edit and files-delete actions are now scoped to a new FileResource which is a child of FileSourceResource. #18
  • The file picker UI is now available as a <datasette-file-picker> Web Component. Thanks, Alex Garcia. #19
  • New from datasette_files import get_file Python API for other plugins that need to access file data. #20

Adds the ability to configure which LLMs are available for which purpose, which means you can restrict the list of models that can be used with a specific plugin. #3