Simon Willison’s Weblog

Research

Research blog-header-alignment — Addressing a subtle header alignment issue on simonwillison.net, this investigation tracked down a persistent ~1px height mismatch between left and right headers caused by anchor elements generating taller inline boxes than plain text due to font metrics. Multiple fixes—including removing position:relative/top:1px hacks and setting explicit heights—proved fragile.
Research sqlite-hamming-extension — Exploring efficient Hamming distance search in SQLite for binary embeddings, this project implements both a scalar function extension and a virtual table extension as described in "Hamming Distance for Hybrid Search in SQLite". The scalar function scans and sorts rows to locate nearest matches, while the virtual table caches embeddings and leverages a max-heap to deliver top-k results up to seven times faster.
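As a rough illustration of the top-k idea in this summary, here is a minimal pure-Python sketch. It is not the extension's code: the function names and the `(row_id, embedding)` input shape are assumptions made for this example. It keeps a bounded max-heap of the k closest rows, so each candidate is compared only against the worst match kept so far instead of sorting every row.

```python
import heapq


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length binary embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have equal length")
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def top_k_hamming(query: bytes, rows, k: int):
    """Return the k nearest (distance, row_id) pairs, closest first.

    rows is an iterable of (row_id, embedding) pairs (a hypothetical shape).
    """
    if k <= 0:
        return []
    heap = []  # stores (-distance, row_id); heap[0] is the worst match kept
    for row_id, embedding in rows:
        distance = hamming(query, embedding)
        if len(heap) < k:
            heapq.heappush(heap, (-distance, row_id))
        elif distance < -heap[0][0]:
            # Closer than the current worst match: swap it in.
            heapq.heapreplace(heap, (-distance, row_id))
    return sorted((-neg, row_id) for neg, row_id in heap)


rows = [(1, b"\x00"), (2, b"\xff"), (3, b"\x01"), (4, b"\x03")]
nearest = top_k_hamming(b"\x00", rows, k=2)
print(nearest)
```

The heap never grows beyond k entries, which is what makes this cheaper than a full scan-and-sort when k is small relative to the table.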
Research sqlite-chronicle-vs-history-json — Using both sqlite-chronicle and sqlite-history-json on the same SQLite table is feasible, as each library installs its own set of triggers and companion tables without interfering with standard CRUD operations. Chronicle focuses on efficient sync/versioning, while history-json offers a complete audit log, and both operate independently even with compound primary keys or concurrent audit groups.
Research guidepup-screenreader-investigation — An investigation into Guidepup reveals that its core package does not support Linux—only macOS (VoiceOver) and Windows (NVDA). However, two practical methods were proven for generating audio screen reader sessions on Linux: one uses the AT-SPI accessibility stack and Orca to walk a real browser's accessibility tree and synthesize narration; the other employs the virtual screen reader (pure JS, fast) to simulate navigation, then builds audio from spoken phrases.
Research seaweedfs-testing — SeaweedFS version 4.12 was evaluated on Linux x86_64, demonstrating its functionality as a scalable distributed file system through its core blob store, filer, S3-compatible, and WebDAV APIs. All-in-one deployment via `weed mini` enables access to web UIs for cluster administration, filer usage, and volume monitoring. Testing confirmed seamless file operations across HTTP, S3, and WebDAV, including directory management, standard HTTP features, and multiple URL formats.
Research openai-api-skills — OpenAI's Skills API enables models to execute reusable, self-contained scripts and tools by packaging instructions and code (plus optional assets) with a `SKILL.md` manifest. This project demonstrates crafting a custom skill (“csv-insights”), uploading it via the `/v1/skills` endpoint, and invoking it in natural language through the Responses API’s hosted shell environment, where the model installs dependencies, executes scripts, and returns outputs such as markdown reports and plots.
Research cysqlite-wasm-wheel — By cross-compiling cysqlite, a high-performance Cython-based SQLite3 binding, to WebAssembly with Emscripten, this project delivers a ready-to-use wheel for Pyodide that enables rapid, native-like SQLite operations directly in browser-based Python environments. The build pipeline automates all necessary steps, from fetching dependencies to ensuring compatibility with Pyodide 0.25.x (Python 3.11, Emscripten 3.1.46).
Research go-rod-cli — Leveraging the rod browser automation library, rod-cli provides a lightweight Go-based command-line tool for scripting persistent headless Chrome sessions. Each CLI command connects to and manipulates the same long-running Chrome instance via DevTools Protocol, enabling seamless multi-step browser automation in shell scripts or interactive use.
Research rod-library-research — Rod is an advanced Go library designed to automate Chrome browsers using the Chrome DevTools Protocol, providing a comprehensive API for web scraping, browser control, element interaction, and robust waiting strategies. With high-level convenience methods (such as Must-prefixed methods for fast scripting) and direct protocol access, Rod enables streamlined workflows from simple scraping to complex automation scenarios, all without third-party drivers.
Research libkrun-go-cli-tool — Krunsh is a minimal Go CLI tool that executes newline-delimited shell commands inside an ephemeral KVM-based microVM, leveraging the libkrun library for lightweight virtualization. By piping commands from stdin, krunsh spins up a microVM, runs the specified commands using `/bin/sh -c`, captures the output, and discards the VM afterward, ensuring zero persistent state and strong process isolation.
Research monty-wasm-pyodide — Monty WASM + Pyodide explores compiling Monty—a Rust-based, sandboxed Python interpreter—into WebAssembly for seamless browser access. It provides two integration paths: a standalone WASM module accessible directly from JavaScript, and a Pyodide-compatible wheel for usage in Python-in-the-browser environments. The project enables safe, dependency-free Python code execution with features like variable injection, output capturing (including print statements), and robust error handling.
Research pyo3-pyodide-wasm — Compiling Rust-based Python extension modules (via PyO3 and maturin) into WebAssembly wheels for Pyodide involves precise coordination of toolchain versions and build flags to ensure compatibility. The process relies on maturin (≥1.0) for packaging, the Emscripten SDK (with the exact version used by Pyodide), and a Rust nightly toolchain matching Pyodide's ABI, particularly the `-Z emscripten-wasm-eh` flag and a compatible sysroot for Python 3.13 (Pyodide 0.28+).
Research just-bash-deno-python — Exploring the capabilities of just-bash, this project integrates the TypeScript-based bash emulator into a persistent, JSONL-over-stdio server in Deno, accessible via a robust Python client library. The solution enables sandboxed bash scripting with comprehensive built-in commands, a virtual filesystem, and optional network access, with persistent state and fine-grained request control (env, cwd, timeout) supported.
Research wasm-repl-cli — WASM REPL CLI Tools enable JavaScript and Python REPLs from the command line by leveraging WebAssembly runtimes in Go, built on the wazero engine. The project supplies separate binaries for each language—one using QuickJS WASI and the other CPython WASI—offering direct code execution, interactive shells, and a JSONL mode. JSONL mode lets external applications submit code for execution while maintaining persistent state across requests, facilitating programmatic integration.
Research chatgpt-container-environment — Experiments in the ChatGPT sandbox reveal that general outbound internet access from Python and other user code (such as HTTP requests) is entirely blocked, while package managers like pip and npm are permitted to fetch dependencies using curated internal registry proxies. The container provides a privileged fetching mechanism (`container.download`) for select public URLs, which is more powerful than standard code-based networking.
Research cloudflare-workers-python-sqlite — Exploring the intersection of Cloudflare Workers, Python (via Pyodide), and SQLite persistence, this project demonstrates practical techniques for building serverless applications with both JavaScript and Python runtimes on the Cloudflare platform. JavaScript Workers, paired with D1 for persistent SQLite storage, handled form input, basic routing, and a page view counter.
Research duckdb-security — Evaluating DuckDB’s sandboxing features for secure untrusted query execution, this project demonstrates how to configure read-only access, restrict file and network operations, and enforce query timeouts in Python environments. Native settings like `read_only`, `enable_external_access`, and `allowed_paths` effectively limit users to preapproved data sources, while locking configuration via `lock_configuration=true` ensures that these controls cannot be altered by malicious queries.
Research string-redaction-library — Designed to detect secrets in text, the String Redaction Library leverages statistical analysis of character patterns—such as vowel/consonant ratios and digit presence—rather than relying on specific secret formats or regular expressions. It identifies highly random or non-English-like alphanumeric strings, hashes, and tokens without context awareness, making it easy to scan for hard-to-spot secrets in source code or logs.
Research whenwords-esoteric-langs — Showcasing the versatility of the whenwords time formatting specification, this project features parallel implementations in three esoteric programming languages: LOLCODE, Rockstar, and WebAssembly Text (WAT). Each version adapts the time formatting logic—such as "3 hours ago" and duration parsing—using the idiomatic constructs and limitations of its language, producing transpiled or compiled code for JavaScript, Python, or a compact WASM binary.
Research memchr-c-wrapper — Offering a pure C reimplementation of the Rust-based pymemchr, pymemchr-c delivers high-performance byte and substring search functions to Python with extensive SIMD (SSE2/AVX2/NEON) optimizations and runtime CPU feature detection. Its unique "Packed Pair" substring search algorithm enables the C version to outperform both Python's built-in methods (up to 28x faster) and the original Rust extension (up to 1.5x faster for substring operations), all while removing the need for a Rust toolchain.
Research sqlite-wasm-library — Seeking to enable Python's SQLite interface with WebAssembly, the project developed a `sqlite3_wasm` library—a drop-in replacement for Python's standard `sqlite3` module. By compiling SQLite 3.45.3 to WASM with wasi-sdk and wrapping the resulting binary with a Python API, the solution delivers fully functional, in-memory, WASM-powered database operations using the wasmtime runtime.
Research memchr-python-wrapper — pymemchr is a Python library that provides ultra-fast byte and substring search functions by binding to the memchr Rust crate, leveraging SIMD optimizations for superior performance. Using PyO3 and Maturin for cross-language integration, pymemchr offers efficient routines for finding single bytes, searching for multiple bytes, and locating substring patterns, both forwards and backwards, with highly competitive speedup over native Python methods.
Research sqlite-time-limit-extension — Designed as a Python C extension, the SQLite Time Limit Extension introduces a function, execute_with_timeout, enabling SQL queries against a SQLite database to be terminated if they exceed a specified millisecond threshold. This is achieved using SQLite's progress handler, ensuring that long-running queries do not block application responsiveness. Usage is simple via standard import, and rigorous tests are provided with pytest to validate both normal operation and timeouts.
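The progress-handler mechanism can also be sketched with Python's standard `sqlite3` module alone. This is not the C extension described above: `run_with_time_limit` is a hypothetical helper written for this example, and the interval of 1000 virtual machine instructions between checks is an arbitrary choice.

```python
import sqlite3
import time


def run_with_time_limit(conn, sql, params=(), timeout_ms=1000):
    """Run a query, aborting it once timeout_ms has elapsed."""
    deadline = time.monotonic() + timeout_ms / 1000

    def check_deadline():
        # A non-zero return value tells SQLite to interrupt the query.
        return 1 if time.monotonic() > deadline else 0

    # Call check_deadline roughly every 1000 SQLite VM instructions.
    conn.set_progress_handler(check_deadline, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Remove the handler so later queries are unaffected.
        conn.set_progress_handler(None, 1000)


conn = sqlite3.connect(":memory:")
print(run_with_time_limit(conn, "SELECT 1 + 1"))

endless = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) "
    "SELECT count(*) FROM c"
)
try:
    run_with_time_limit(conn, endless, timeout_ms=50)
except sqlite3.OperationalError as exc:
    print("query aborted:", exc)
```

An interrupted query surfaces as `sqlite3.OperationalError`, so callers can tell a timeout apart from a normal result.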
Research http-range-wheel-metadata — Leveraging ZIP file structure and HTTP range requests, tools like uv efficiently extract wheel metadata for Python packages without downloading entire archives. By fetching just the last 16KB of the wheel (central directory and EOCD), parsing for the METADATA file offset, and then requesting exactly its byte range, uv and the accompanying Python prototype routinely reduce bandwidth usage by over 70%.
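The steps in this summary can be sketched in Python as follows. This is a simplified illustration, not uv's implementation or the project's prototype. To stay self-contained it builds a small wheel-like ZIP in memory and replaces the network layer with a hypothetical `fetch_range(start, end)` callable; against a real server that callable would send an HTTP `Range: bytes=start-end` header. ZIP64 archives and archive comments are not handled.

```python
import io
import struct
import zipfile
import zlib

TAIL_SIZE = 16 * 1024  # size of the tail window, as in the summary above


def read_wheel_metadata(fetch_range, total_size):
    """fetch_range(start, end) returns the inclusive byte range [start, end]."""
    tail_start = max(0, total_size - TAIL_SIZE)
    tail = fetch_range(tail_start, total_size - 1)

    # Locate the end-of-central-directory (EOCD) record in the tail.
    eocd_pos = tail.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("EOCD record not found in the tail")
    eocd = struct.unpack_from("<4sHHHHIIH", tail, eocd_pos)
    cd_size, cd_offset = eocd[5], eocd[6]
    if cd_offset < tail_start:
        # A fuller client would issue another range request here.
        raise ValueError("central directory starts before the fetched tail")
    cd = tail[cd_offset - tail_start : cd_offset - tail_start + cd_size]

    # Walk the central directory entries looking for the METADATA file.
    pos = 0
    while pos + 46 <= len(cd):
        entry = struct.unpack_from("<4sHHHHHHIIIHHHHHII", cd, pos)
        if entry[0] != b"PK\x01\x02":
            raise ValueError("malformed central directory entry")
        method, comp_size = entry[4], entry[8]
        name_len, extra_len, comment_len = entry[10], entry[11], entry[12]
        header_offset = entry[16]
        name = cd[pos + 46 : pos + 46 + name_len].decode("utf-8")
        if name.endswith(".dist-info/METADATA"):
            # The 30-byte local header gives the lengths that precede the data.
            local = struct.unpack("<4sHHHHHIIIHH", fetch_range(header_offset, header_offset + 29))
            data_start = header_offset + 30 + local[9] + local[10]
            raw = fetch_range(data_start, data_start + comp_size - 1)
            if method == 0:  # stored
                return raw
            if method == 8:  # deflate
                return zlib.decompress(raw, -15)
            raise ValueError("unsupported compression method")
        pos += 46 + name_len + extra_len + comment_len
    raise ValueError("METADATA not found")


# Build a demo archive: a large stored payload plus a small METADATA file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("demo/data.bin", bytes(200_000), compress_type=zipfile.ZIP_STORED)
    zf.writestr("demo-1.0.dist-info/METADATA", "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n")
wheel = buf.getvalue()

fetched = []  # sizes of each simulated range request


def fetch_range(start, end):
    chunk = wheel[start : end + 1]
    fetched.append(len(chunk))
    return chunk


metadata = read_wheel_metadata(fetch_range, len(wheel))
print(metadata.decode("utf-8"))
print("fetched", sum(fetched), "of", len(wheel), "bytes")
```

Two small follow-up requests are used for the member (local header, then data) because the local header's extra field can differ in length from the one recorded in the central directory.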
Research vibium-python-client — Examining the Vibium browser automation project, this investigation developed a Python client library that interoperates with Vibium’s Go-powered "clicker" binary and existing Node.js tools. The Python client exposes both synchronous and asynchronous APIs, replicating advanced browser automation features such as auto-waiting, visibility checks, and custom commands (e.g., `vibium:find`, `vibium:click`) via WebDriver BiDi over WebSocket.
Research debug-failed-fix — Debugging investigation into why commit 0dcfad4's fix for cog code rendering didn't work. The fix correctly used string concatenation to avoid `-->` in Python strings, but the explanatory comment itself contained the literal `-->` sequence, which closed the HTML comment early. Solution: rewrote the comment to avoid the problematic character sequence.
Research microquickjs-in-redis — Expanding Redis’s scripting capabilities, the Redis JavaScript Module enables users to execute JavaScript scripts in Redis through the fast, embedded mquickjs engine, paralleling the Lua scripting features but with a JavaScript syntax. This module introduces commands like `JS.EVAL`, `JS.LOAD`, and `JS.CALL`, supporting script execution, caching, and invocation by SHA1 hash, along with native integrations for running Redis commands, logging, and error handling within scripts.
Research url-limits-investigation — Major browser engines demonstrate significant differences in how they enforce URL length limits. Chromium sets a 2 MB cap at its inter-process communication boundary, rejecting longer URLs when crossing processes. Firefox relies on user-configurable preferences, employing a 1 MB "standard" limit but permitting up to 512 MB in absolute terms, with stricter limits (2,000 characters) for history and bookmarks.
Research mquickjs-sandbox — Exploring mquickjs, a highly minimal JavaScript engine, this project rigorously evaluates its suitability as a safe sandbox for running untrusted code. Various integration approaches are implemented, including Python FFI, C extensions, subprocess invocation, and WebAssembly runtimes—each tested for startup and execution performance, security isolation, and feature compatibility.
Research environment-report — Running Claude Code on the web offers developers a versatile coding sandbox on Ubuntu 24.04, leveraging a broad toolkit that includes Python 3.11, Node.js 22, Go, Rust, and more, alongside developer utilities (Git, Make) and database clients (SQLite, PostgreSQL).
