Simon Willison's Weblog: Research

Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

2026-07-02T18:25:00+00:00

Leveraging the DSPy framework, this project evaluates and refines the core production system prompts used by Datasette Agent’s read-only SQL question answerer. The methodology involves a harness where DSPy agents invoke Datasette Agent’s actual tool implementations and prompts against a live in-process Datasette, and a gold-standard, auto-generated dataset provides rigorous evaluation via custom metrics.

Mapping SQLite result columns back to their source `table.column`

2026-06-13T23:05:00+00:00

Determining the source `table.column` for each result column in arbitrary SQLite queries is feasible because SQLite computes this internally and exposes it via its column-metadata API when compiled with `SQLITE_ENABLE_COLUMN_METADATA`. While Python’s standard `sqlite3` module doesn’t surface this information, robust methods exist: using the third-party `apsw` library provides direct access with `cursor.description_full`, or a pure-stdlib ctypes bridge (`column_provenance.py`) can retrieve the…

Running untrusted queries: Datasette/SQLite vs psycopg/PostgreSQL

2026-06-11T04:17:00+00:00

Exploring how untrusted SQL queries are safely run in Datasette (using SQLite) and whether similar protections can be applied with psycopg and PostgreSQL, this project shows that both can provide robust safeguards against data corruption and resource exhaustion. Datasette leverages hard read-only file modes and a VM progress handler for query timeouts, while PostgreSQL's privilege system enforces SELECT-only access and its `statement_timeout` cancels resource-intensive or sleeping queries.

Can DuckDB run untrusted SQL as safely as Datasette runs SQLite?

2026-06-10T23:41:00+00:00

Investigating the security of running untrusted SQL in DuckDB compared to Datasette with SQLite, this project establishes that DuckDB can be sandboxed to match—and sometimes exceed—the safety of SQLite, but requires more than its basic `read_only=True` option. Datasette achieves safe SQL exposure by using engine-level read-only connections and opcode-based time limits in SQLite, which inherently prevents unauthorized file or network access.

Running Python ASGI apps in the browser via Pyodide + a service worker

2026-05-30T15:34:00+00:00

By running Python ASGI web applications entirely in the browser using Pyodide and a dedicated service worker, this project intercepts all same-origin requests under `/app/` and executes them against the Python app via the ASGI protocol—removing the need for a backend server except for static files. The mechanism is demonstrated with both a FastAPI demo and the full Datasette app, confirming its generality across ASGI apps.
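
To make "executes them against the Python app via the ASGI protocol" concrete, here is a rough sketch of driving an ASGI application directly, with no server or socket involved. It runs in ordinary CPython rather than Pyodide, and the `app` and `call_asgi` names and the `/app/hello` path are hypothetical; the project's service worker presumably does something similar after translating an intercepted request into an ASGI scope.

```python
import asyncio


async def app(scope, receive, send):
    """A tiny ASGI application that echoes the requested path."""
    assert scope["type"] == "http"
    body = f"Hello from {scope['path']}".encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_asgi(asgi_app, method, path):
    """Drive an ASGI app with a synthetic request and collect its response."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "path": path,
        "raw_path": path.encode("utf-8"),
        "query_string": b"",
        "headers": [],
    }
    messages = []

    async def receive():
        # The synthetic request has an empty body
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        # Capture every message the application emits
        messages.append(message)

    await asgi_app(scope, receive, send)
    status = messages[0]["status"]
    body = b"".join(
        m.get("body", b"") for m in messages if m["type"] == "http.response.body"
    )
    return status, body


status, body = asyncio.run(call_asgi(app, "GET", "/app/hello"))
print(status, body)
```

Because the interface is just an async callable taking `scope`, `receive`, and `send`, the same driver works for any ASGI 3 application, which is consistent with the summary's point about generality.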

pydantic-monty investigation

2026-05-22T22:41:00+00:00

Reviewing `pydantic-monty` reveals it as a fast, minimal Python interpreter designed for controlled sandboxed execution, primarily useful when transforming data, branching, looping, and interacting with a select set of trusted host tools or a virtual filesystem. The interpreter purposefully omits large portions of CPython’s functionality, with clear boundaries: unsupported features and missing resources generally fail cleanly as structured errors rather than escaping into the host runtime.

TRE Python binding — ReDoS robustness demo

2026-05-04T17:52:00+00:00

Demonstrating robust regex performance, this project offers a minimal Python ctypes binding to the TRE regex library, highlighting TRE’s immunity to regular expression denial-of-service (ReDoS) attacks that cripple Python's built-in `re` module. Key benchmarks show that TRE processes even notorious "evil" patterns on gigantic inputs (10 million characters) much faster than `re` on tiny ones, and scales linearly with input size instead of exponentially.
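
The `re` side of that comparison is easy to reproduce with the standard library alone. The sketch below is an illustration only: the pattern `(a+)+$` is a textbook catastrophic-backtracking example assumed here, not necessarily one of the patterns used in the project's benchmarks, and the TRE binding itself is not shown.

```python
import re
import time

# Assumed textbook "evil" pattern: nested quantifiers force heavy backtracking
EVIL = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time a match attempt against n 'a' characters followed by '!'."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL.match(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


timings = {}
for n in (10, 14, 18, 20):
    result, elapsed = time_failed_match(n)
    timings[n] = elapsed
    print(f"n={n:2d}  matched={result is not None}  {elapsed:.4f}s")
```

Each match fails, but only after the engine has tried a rapidly growing number of ways to split the run of `a` characters, so the time climbs steeply with every few extra characters. Keep `n` small when experimenting.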

Claude system prompts as a git timeline

2026-04-18T12:17:00+00:00

Anthropic's published system prompt history for Claude is transformed into a git-based exploration tool, breaking up the monolithic markdown source into granular files and timestamped commits. By structuring extracted prompts per model, family, and revision, researchers can leverage `git log`, `diff`, and `blame` to trace prompt evolution, compare differences, and attribute changes to specific dates—all without manual parsing.

Exploring the new `servo` crate

2026-04-13T15:04:00+00:00

After the April 2026 release of the `servo` v0.1.0 crate (blog post), a concise investigation shows that Servo is now an embeddable browser engine for Rust, with a clear API centered on the `ServoBuilder`, `WebView`, and pixel readback methods. A headless CLI (`servo-shot`) successfully renders URLs or HTML files to PNG, building against stable Rust with a robust software-based rendering pipeline.

QuickJS Python Sandbox — Investigation Report

2026-04-12T23:15:00+00:00

Exploring the `quickjs` Python package, this project implements an asyncio-compatible JavaScript sandbox with robust resource controls and seamless exposure of both synchronous and asynchronous Python functions (including async httpx fetches) to JavaScript code.

SQLite WAL Mode Across Docker Containers Sharing a Volume

2026-04-07T15:41:00+00:00

SQLite’s WAL mode reliably supports concurrent access when two Docker containers share a volume on the same host, due to shared kernel and filesystem semantics. The experiment, using Docker Desktop for macOS and a named volume, demonstrated real-time propagation of database changes and effective memory-mapped file sharing by monitoring `.db-shm`.
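
A single-process approximation of that setup, using two connections to one file in place of two containers, looks roughly like this. It is a sketch under that simplifying assumption; the `events` table and file names are hypothetical, and it does not exercise Docker or cross-container behaviour.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "shared.db")

# First connection: switch the database to WAL mode and write a row
writer = sqlite3.connect(db_path)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("create table events (id integer primary key, message text)")
writer.execute("insert into events (message) values ('hello from writer')")
writer.commit()

# Second connection: stands in for the other container reading the same file
reader = sqlite3.connect(db_path)
rows = reader.execute("select message from events").fetchall()

# WAL mode keeps its log and shared-memory index in sidecar files
wal_exists = os.path.exists(db_path + "-wal")
shm_exists = os.path.exists(db_path + "-shm")

print(mode, rows, wal_exists, shm_exists)
```

The `-shm` sidecar is the memory-mapped index that every process using the database must be able to share, which is why a shared kernel and filesystem matter for the multi-container case.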

Can JavaScript Escape a CSP Meta Tag Inside an Iframe?

2026-04-03T16:05:00+00:00

JavaScript running inside a `sandbox="allow-scripts"` iframe cannot escape or disable a `<meta http-equiv="Content-Security-Policy">` tag, even through removal, modification, or document replacement. Extensive testing across Chromium and Firefox confirmed that CSP policies defined via meta tags are enforced at parse time, and persist even when the iframe is navigated to a data: URI.

Starlette 1.0 skill

2026-03-23T00:05:00+00:00

Starlette 1.0 Skill offers a concise guide for building robust web applications with Starlette, a lightweight ASGI framework. The accompanying demo showcases a task management app featuring projects, tasks, comments, and labels, illustrating Starlette's flexibility in handling routing, templating (Jinja2), async database operations (aiosqlite), and real-time updates.

PCGamer Article Performance Audit

2026-03-22T22:49:00+00:00

A performance audit of the March 2026 PCGamer article on RSS readers reveals severe page bloat, with over 82% of network traffic and transferred bytes traced to ad-tech, tracking, and programmatic advertising scripts. Despite the core content consisting of just 10-15 KB of text and a handful of images (~150 KB total), the page triggers over 431 network requests and 5.5 MB of transfer (18.8 MB decoded) within 60 seconds—ballooning to 200+ MB in Firefox due to autoplay video carousels and…

JavaScript Sandboxing Research

2026-03-22T19:53:00+00:00

Analyzing current JavaScript sandboxing options for running untrusted code, this research compares core approaches in Node.js (including worker_threads, node:vm, and the Permission Model), prominent npm packages (isolated-vm, vm2), and alternative engines like quickjs-emscripten.