Simon Willison’s Weblog

Subscribe
Atom feed

Elsewhere

Filters: Sorted by date

It would be neat if arbitrary SQL queries in Datasette could be rendered with additional information based on which columns from which tables were included in the results.

To build that, we would need to be able to look at a SQL query like select users.name, orders.total from users join orders on orders.user_id = users.id and programmatically identify the table.column for each result - navigating not just joins but also more complex syntax like CTEs.

I decided to set Claude Code (Opus 4.8, since Fable is currently banned by the US government) on the problem. It found several promising solutions - one using apsw, another that uses ctypes to access the SQLite sqlite3_column_table_name() C function (which is not otherwise exposed to Python), and one using clever interrogation of the output of EXPLAIN.

Sighting 7:49 PM — Harbor Seal, in Monterey Bay National Marine Sanctuary, CA, US, CA
Harbor Seal
Harbor Seal
Harbor Seal
Harbor Seal

This alpha is a significant step on the road to a stable 1.0, finally extending the ?_extra= pattern I introduced in Datasette 1.0a3 to cover queries and rows in addition to tables. That pattern is also now documented!

I wrote a whole lot more about the new release on the Datasette project blog: Datasette 1.0a33 with JSON extras in the API.

Because API explorer tools are almost free to build now I had Claude Fable 5 in Claude Code (for the plan) and GPT-5.5 xhigh in Codex Desktop (for the implementation) build me this custom extras API explorer to help demonstrate the feature:

Screenshot of a web application titled "Datasette extras explorer". A URL input field contains https://latest.datasette.io/fixtures/facetable.json with a teal Explore button next to it. Below, a left panel labeled EXTRAS (30) lists checkboxes: all_columns - All columns in the table, regardless of _col/_nocol filtering; column_types - Column type assignments for this table; columns (checked) - Column names returned by this query; count - Total count of rows matching these filters; count_sql - SQL query used to calculate the total count; custom_table_templates - Custom template names considered for this table; database - Database name; database_color - Color assigned to the database. A right panel labeled RESPONSE shows GET /fixtures/fac… with Copy JSON and Copy URL buttons, then a dark JSON viewer showing 200 - 9.9 KB - 114ms and JSON: "ok": true, "next": null, "columns": (highlighted array) "pk", "created", "planet_int", "on_earth", "state", "_city_id", "_neighborhood", "tags", "complex_array", "distinct_some_null", "n", "rows": list of objects.

Tool Datasette extras explorer — Explore Datasette JSON endpoints and discover available extras by entering a URL and selecting which extras to request. The interface displays the returned JSON response with syntax highlighting and allows you to examine which extras are supported by your Datasette instance, with support for sharing configurations via URL hash parameters.

I built this utility library to support an asyncio dependency injection pattern a few years ago. I was using it with Datasette and Claude Fable 5 spotted some bugs in the dependency which it then fixed for me. It's a very proactive model!

Research Running untrusted queries: Datasette/SQLite vs psycopg/PostgreSQL — Exploring how untrusted SQL queries are safely run in Datasette (using SQLite) and whether similar protections can be applied with psycopg and PostgreSQL, this project shows that both can provide robust safeguards against data corruption and resource exhaustion. Datasette leverages hard read-only file modes and a VM progress handler for query timeouts, while PostgreSQL's privilege system enforces SELECT-only access and its `statement_timeout` cancels resource-intensive or sleeping queries.

Highlights from the release notes:

  • Tools can now ask the user questions mid-execution. Tools that declare a context parameter receive a ToolContext object, and await context.ask_user(...) can ask a yes/no, multiple-choice (options=[...]) or free-text (free_text=True) question. While a question is unanswered the agent turn suspends: the question renders as a form in the chat UI and persists to the internal database, so suspended conversations survive a server restart. Once answered, the tool re-executes from the top with stored answers replayed, so call ask_user() before performing side effects. #20
  • New built-in save_query tool: the agent can save SQL it has written as a Datasette stored query. Saving always requires human approval - the agent shows the full SQL plus the proposed name, database and visibility, and nothing is stored until you click Yes. #20

The ask_user() feature was enabled by the new LLM alpha I built yesterday with the help of Claude Fable 5.

Research Can DuckDB run untrusted SQL as safely as Datasette runs SQLite? — Investigating the security of running untrusted SQL in DuckDB compared to Datasette with SQLite, this project establishes that DuckDB can be sandboxed to match—and sometimes exceed—the safety of SQLite, but requires more than its basic `read_only=True` option. Datasette achieves safe SQL exposure by using engine-level read-only connections and opcode-based time limits in SQLite, which inherently prevents unauthorized file or network access.
Sighting 10:00 AM – 10:20 AM — Brown Pelican, European Starling, Great Blue Heron, in Monterey Bay National Marine Sanctuary, CA, US, CA
Brown Pelican
Brown Pelican
European Starling
European Starling
Great Blue Heron
Great Blue Heron
Great Blue Heron
Great Blue Heron

Almost entirely written by the new Claude Fable 5, see my write-up for more details.

None

I've been really enjoying AgentsView by Wes McKinney as a tool for exploring my token usage across different coding agents running on my laptop.

Claude Fable 5 came out today and wasn't yet included in the pricing database AgentsView uses. I used Fable to reverse-engineer AgentsView and figured out this recipe for setting custom prices.

Here's my Claude Fable 5 usage for today so far, plotted by AgentsView as a treemap across my different local projects:

Screenshot of a cost analytics dashboard. Cost Attribution - Click to hide from chart - toggle buttons for Project / Model / Agent and Treemap / List. A treemap shows a large red block: prod_datasette_agent $74.06 89.3%, then blue: cloud $3.98 4.8%, teal: datasette $2.81 3.4%, pink: money $1.92 2.3%, and a thin orange sliver. A legend lists 1 prod_datasette_agent $74.06, 2 cloud $3.98, 3 datasette $2.81, 4 money $1.92, 5 simon $0.15. Below left, Top Sessions by Cost: 1 Claude - Review ./datasette-agent and ./datasette-apps - we are going to a... - prod_datasette_agent · 08a1f374-0e77-420f-be2d-af805d67e8aa - 55.9M $74.06; 2 Claude - issues.db is a copy of the Datasette issues database. There are a... - datasette · 8caa2d2d-b91f-43b3-bf3a-4268995b6011 - 826.8k $2.81; 3 Claude - Consult fly-docs and then look at datasette.cloud (which launche... - cloud · bfcacc70-09d7-4b27-aaec-4bb8accd9fec - 924.7k $2.61; 4 Claude - simonwillisonblog.db is a copy of my blog, plus all my software re... - money · 0c0fb9dc-6347-4e1b-9307-3709a7cdf0c8 - 542.9k $1.92; 5 Claude - Look in datasette.cloud and figure out all remaining steps and dec... - cloud · 45963b5f-608a-4caa-ad6b-6ae81e1dbf0d - 455k $1.37; 6 Claude - simon - simon · deeccb5d-9e90-4b1e-bfe6-c2b271e1b1d4 - 26.4k $0.15. Below right, Cache Efficiency with horizontal bars: Cache Reads 57.6M (nearly full green bar), Cache Writes 769.3K, Uncached Input 64.4K, Output 300.9K (all tiny bars), and a green highlighted note: $516.62 saved vs uncached.

I'm planning several plugins for Datasette Agent which can make edits to existing pieces of text - things like collaborative Markdown editing, updating large SQL queries, and editing SVG files.

Agentic editing of text is a little tricky to get right. My favorite published design for this is for the Claude text editor, which implements the following tools:

  • view - view sections of a file, with line numbers added to every line.
  • str_replace - find an exact old_str and replace it with new_str - fail if the original string is not unique
  • insert - insert the specified text after the specified line number

Rather than recreate these patterns for every plugin that needs them I decided to create this base plugin, datasette-agent-edit, which implements the core tools in a way that allows them to be adapted for other plugins.

Sighting 8:37 AM – 9:38 AM — Western Bluebird, California Brown Pelican, Osprey, in San Mateo County, CA, US
Western Bluebird
Western Bluebird
California Brown Pelican
California Brown Pelican
Osprey
Osprey
Osprey
Osprey
Sighting 5:08 PM — Hummingbirds, in San Mateo County, CA, US
Hummingbirds
Hummingbirds
Hummingbirds
Hummingbirds
Hummingbirds
Hummingbirds
Sighting 1:07 PM — Common Raven, in San Mateo County, CA, US
Common Raven
Common Raven
Common Raven
Common Raven

I added a CLI to micropython-wasm (issue #7), inspired by the first draft of the blog entry when I realized it would be a great way to illustrate the Try it yourself section.

Sighting 7:41 PM – 7:44 PM — Pacific Harbor Seal, California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA
Pacific Harbor Seal
Pacific Harbor Seal
Pacific Harbor Seal
Pacific Harbor Seal
California Brown Pelican
California Brown Pelican
Sighting 11:31 AM — Black Phoebe, in San Mateo County, CA, US
Black Phoebe
Black Phoebe

I want Datasette Agent to be able to generate and execute Python code safely. This alpha is looking promising so far. GPT-5.5 has so far failed to break out of the sandbox!

Fixes for some limitations that emerged while I was trying to use this to build datasette-agent-micropython.

Sighting 11:17 AM — California Brown Pelican, in Fort Mason, CA, US
California Brown Pelican
California Brown Pelican

I'm at the Microsoft Build conference today, held at Fort Mason in San Francisco. There are California Brown Pelicans diving into the water directly behind venue!

I really like how you can paste a large volume of text into claude.ai (or the Claude desktop/mobile apps) and it will detect it as a large paste and turn it into a file attachment instead.

I decided to have Codex desktop build me a version of that as a prototype.

You can also open files directly - including images which will be shown as thumbnails - or drag files onto the textarea.

My latest sandboxing experiment: This alpha package bundles a lightly customized WASM build of MicroPython with a wrapper to execute code in it via wasmtime.

A minor bugfix release. Fixes a bug with INSERT ... RETURNING queries via the new /db/-/execute-write endpoint and a bunch of base_url issues which showed up when I was experimenting with Service Workers yesterday.

Sighting 2:52 PM – 2:55 PM — California Brown Pelican, Pacific Harbor Seal, in Monterey Bay National Marine Sanctuary, CA, US, CA
California Brown Pelican
California Brown Pelican
California Brown Pelican
California Brown Pelican
Pacific Harbor Seal
Pacific Harbor Seal
California Brown Pelican
California Brown Pelican
Sighting 4:00 PM — California Sea Lion, in Pillar Point Harbor, CA, US
California Sea Lion
California Sea Lion

Datasette Lite is my version of Datasette that runs entirely in the browser using Pyodide in WebAssembly.

When I first built it four years ago I used Web Workers and code that intercepts navigation operations and fetches the generated HTML by running the Python app.

This worked, but had the disadvantage that any JavaScript in <script> tags would not be executed - breaking some Datasette functionality and a whole lot of Datasette plugins.

This morning I set Claude Opus 4.8 the task (in Claude Code for web) of figuring out how to run Python ASGI apps in Pyodide using Service Workers instead, and it seems to work! Here's a basic ASGI FastCGI demo and here's a demo that runs Datasette 1.0a31.

I'm still getting my head around exactly how it works, but once I've done that I plan to upgrade Datasette Lite itself.

Another significant alpha release, with two new headline features.

Datasette now offers users with the necessary permissions the ability to both execute write queries against their database and to save stored queries (renamed from "canned queries") both privately and for use by other members of their Datasette instance.

There's more detail in SQL write queries and stored queries in Datasette 1.0a31 on the Datasette blog, which now has three posts introducing new features since the blog launched two weeks ago.

Here's an animated demo from the blog post showing how the new execute query interface lets people get started with templated insert/update/delete queries from tables they have permission to edit:

The user starts on the data database page, selects actions and "Execute write SQL", then selects the insert document template on the next page and executes it with a title of "My document!". Also demonstrates that a create table statement cannot be executed because the user does not have create-table permission.

  • New model: Claude Opus 4.8 (claude-opus-4.8).
  • New -o fast 1 option for fast mode, for organizations with that feature enabled on their account.
  • Default max_tokens for each model now defaults to that model's maximum output rather than 8,192. #72

See also my notes on Opus 4.8 - I used this new release of llm-anthropic to generate the pelicans.