Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3255

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8221

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1310

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by bar_label

Ten most recent blogmarks (of 8221 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows (columns: link_title, link_url, commentary, created)
**Codex cloud is now called Codex web**
https://developers.openai.com/codex/cloud/

It looks like OpenAI's **Codex cloud** (the cloud version of their Codex coding agent) was quietly rebranded to **Codex web** at some point in the last few days. Here's a screenshot of the Internet Archive copy from [18th December](https://web.archive.org/web/20251218043013/https://developers.openai.com/codex/cloud/) (the [capture on the 28th](https://web.archive.org/web/20251228124455/https://developers.openai.com/codex/cloud/) maintains that Codex cloud title but did not fully load CSS for me):

![Screenshot of the Codex cloud documentation page](https://static.simonwillison.net/static/2025/codex-cloud.jpg)

And here's that same page today with the updated product name:

![Same documentation page only now it says Codex web](https://static.simonwillison.net/static/2025/codex-web.jpg)

Anthropic's equivalent product has the incredibly clumsy name [Claude Code on the web](https://code.claude.com/docs/en/claude-code-on-the-web), which I shorten to "Claude Code for web" but even then it bugs me because I mostly interact with it via Anthropic's native mobile app. I was hoping to see Claude Code for web rebrand to Claude Code Cloud - I did *not* expect OpenAI to rebrand in the opposite direction!

Created: 2025-12-31 16:35:28+00:00

**TIL: Downloading archived Git repositories from archive.softwareheritage.org**
https://til.simonwillison.net/github/software-archive-recovery

Back in February I [blogged about](https://simonwillison.net/2025/Feb/7/sqlite-s3vfs/) a neat Python library called `sqlite-s3vfs` for accessing SQLite databases hosted in an S3 bucket, released as MIT licensed open source by the UK government's Department for Business and Trade.

I went looking for it today and found that the [github.com/uktrade/sqlite-s3vfs](https://github.com/uktrade/sqlite-s3vfs) repository is now a 404. Since this is taxpayer-funded open source software I saw it as my moral duty to try and restore access!

It turns out [a full copy](https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/uktrade/sqlite-s3vfs) had been captured by [the Software Heritage archive](https://archive.softwareheritage.org/), so I was able to restore the repository from there. My copy is now archived at [simonw/sqlite-s3vfs](https://github.com/simonw/sqlite-s3vfs).

The process for retrieving an archive was non-obvious, so I've written up a TIL and also published a new [Software Heritage Repository Retriever](https://tools.simonwillison.net/software-heritage-repo#https%3A%2F%2Fgithub.com%2Fuktrade%2Fsqlite-s3vfs) tool which takes advantage of the CORS-enabled APIs provided by Software Heritage. Here's [the Claude Code transcript](https://gistpreview.github.io/?3a76a868095c989d159c226b7622b092/index.html) from building that.

Created: 2025-12-30 23:51:33+00:00

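The retrieval can also be scripted against the Software Heritage vault API. Here's a hedged Python sketch of my understanding of that flow - the endpoint shape, the `status`/`fetch_url` response fields and the placeholder SWHID are assumptions to verify against the docs at https://archive.softwareheritage.org/api/ before relying on this:

```python
import time
import requests

# Hedged sketch (not the TIL's exact method): ask the Software
# Heritage vault to "cook" a bare Git repository for a revision
# SWHID, poll until it is ready, then download the tarball.
# Endpoint path and response fields are assumptions from the
# public API docs.
SWHID = "swh:1:rev:0000000000000000000000000000000000000000"  # placeholder
BASE = "https://archive.softwareheritage.org/api/1/vault/git-bare"

requests.post(f"{BASE}/{SWHID}/").raise_for_status()  # kick off cooking

while True:
    status = requests.get(f"{BASE}/{SWHID}/").json()
    if status.get("status") == "done":
        break
    time.sleep(10)  # cooking a large repository can take a while

# fetch_url should point at a tarball containing a bare Git repository
with open("repo.tar.gz", "wb") as f:
    f.write(requests.get(status["fetch_url"]).content)
```
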
**shot-scraper 1.9**
https://github.com/simonw/shot-scraper/releases/tag/1.9

New release of my [shot-scraper](https://shot-scraper.datasette.io/) CLI tool for taking screenshots and scraping websites with JavaScript from the terminal.

> - The `shot-scraper har` command has a new `-x/--extract` option which extracts all of the resources loaded by the page out to a set of files. This location can be controlled by the `-o dir/` option. [#184](https://github.com/simonw/shot-scraper/issues/184)
> - Fixed the `shot-scraper accessibility` command for compatibility with the latest Playwright. [#185](https://github.com/simonw/shot-scraper/issues/185)

The new `shot-scraper har -x https://simonwillison.net/` command is really neat. The inspiration was [the digital forensics expedition](https://simonwillison.net/2025/Dec/26/slop-acts-of-kindness/#digital-forensics-with-shot-scraper-har) I went on to figure out why Rob Pike got spammed. You can now perform a version of that investigation like this:

    cd /tmp
    shot-scraper har --wait 10000 'https://theaidigest.org/village?day=265' -x

Then dig around in the resulting JSON files in the `/tmp/theaidigest-org-village` folder.

Created: 2025-12-29 22:33:13+00:00

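HAR is a documented JSON format, so a capture is easy to poke at with a few lines of Python as well. A minimal sketch - the filename here is a hypothetical example:

```python
import json

# Sketch: list every URL a page loaded, from a HAR capture
# (the filename is a hypothetical example).
with open("simonwillison-net.har") as f:
    har = json.load(f)

# The HAR format nests requests under log -> entries
for entry in har["log"]["entries"]:
    print(entry["request"]["url"])
```
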
**Copyright Release for Contributions To SQLite**
https://www.sqlite.org/copyright-release.html

D. Richard Hipp [called me out](https://news.ycombinator.com/item?id=46420453#46424225) for spreading misinformation on Hacker News that SQLite refuses outside contributions:

> No, Simon, we don't "refuse". We are just very selective and there is a lot of paperwork involved to confirm the contribution is in the public domain and does not contaminate the SQLite core with licensed code.

I deeply regret this error! I'm linking to the copyright release document here - it looks like SQLite's public domain nature makes this kind of clause extremely important:

> [...] To the best of my knowledge and belief, the changes and enhancements that I have contributed to SQLite are either originally written by me or are derived from prior works which I have verified are also in the public domain and are not subject to claims of copyright by other parties.

Out of curiosity I decided to see how many people have contributed to SQLite outside of the core team of Richard, Dan and Joe. I ran that query using Fossil, SQLite's own SQLite-based version control system, like this:

    brew install fossil
    fossil clone https://www.sqlite.org/src sqlite.fossil
    fossil sql -R sqlite.fossil "
      SELECT user, COUNT(*) as commits
      FROM event
      WHERE type='ci'
      GROUP BY user
      ORDER BY commits DESC
    "

I got back 38 rows, though I think `danielk1977` and `dan` may be duplicates.

**Update**: The SQLite team have clarified this on their [SQLite is Public Domain](https://sqlite.org/copyright.html) page. It used to read "In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches." - it now reads:

> In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches from random people on the internet. There is a process to get a patch accepted, but that process is involved and for smaller changes is not normally worth the effort.

Created: 2025-12-29 19:58:45+00:00

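Since a Fossil repository file is itself a SQLite database, the same counts can be pulled with Python's standard library instead of `fossil sql` - a small sketch, assuming the `sqlite.fossil` clone from the commands above:

```python
import sqlite3

# A Fossil repository is a SQLite database, so the committer
# counts can be queried directly (assumes sqlite.fossil exists
# from the "fossil clone" step above).
conn = sqlite3.connect("sqlite.fossil")
rows = conn.execute(
    "SELECT user, COUNT(*) AS commits FROM event "
    "WHERE type = 'ci' GROUP BY user ORDER BY commits DESC"
)
for user, commits in rows:
    print(f"{user}: {commits}")
```
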
**simonw/actions-latest**
https://github.com/simonw/actions-latest

Today in extremely niche projects, I got fed up with Claude Code creating GitHub Actions workflows for me that used stale actions - `actions/setup-python@v4` when the latest is `actions/setup-python@v6`, for example.

I couldn't find a good single place listing those latest versions, so I had Claude Code for web (via my phone, I'm out on errands) build a Git scraper to publish those versions in one place: [https://simonw.github.io/actions-latest/versions.txt](https://simonw.github.io/actions-latest/versions.txt)

Tell your coding agent of choice to fetch that any time it wants to write a new GitHub Actions workflow. (I may well bake this into a Skill.)

Here are the [first](https://gistpreview.github.io/?7883c719a25802afa5cdde7d3ed68b32/index.html) and [second](https://gistpreview.github.io/?0ddaa82aac2c062ff157c7a01db0a274/page-001.html) transcripts I used to build this, shared using my [claude-code-transcripts](https://simonwillison.net/2025/Dec/25/claude-code-transcripts/) tool (which just [gained a search feature](https://github.com/simonw/claude-code-transcripts/issues/15)).

Created: 2025-12-28 22:45:10+00:00

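Assuming the published file stays plain text, fetching it from a script (or an agent tool call) is a one-liner - for example:

```python
import urllib.request

# Fetch the published list of latest GitHub Actions versions
url = "https://simonw.github.io/actions-latest/versions.txt"
print(urllib.request.urlopen(url).read().decode("utf-8"))
```
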
**textarea.my on GitHub**
https://github.com/antonmedv/textarea

Anton Medvedev built [textarea.my](https://textarea.my/), which he describes as:

> A *minimalist* text editor that lives entirely in your browser and stores everything in the URL hash.

It's ~160 lines of HTML, CSS and JavaScript and it's worth reading the whole thing. I picked up a bunch of neat tricks from this!

- `<article contenteditable="plaintext-only">` - I did not know about the `plaintext-only` value, supported across [all the modern browsers](https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/contentEditable).
- It uses `new CompressionStream('deflate-raw')` to compress the editor state so it can fit in a shorter fragment URL.
- It has a neat custom save option which triggers if you hit `((e.metaKey || e.ctrlKey) && e.key === 's')` - on [browsers that support it](https://developer.mozilla.org/en-US/docs/Web/API/Window/showSaveFilePicker) (mainly Chrome variants) this uses `window.showSaveFilePicker()`, other browsers get a straight download - in both cases generated using `URL.createObjectURL(new Blob([html], {type: 'text/html'}))`

The `debounce()` function it uses deserves a special note:

    function debounce(ms, fn) {
      let timer
      return (...args) => {
        clearTimeout(timer)
        timer = setTimeout(() => fn(...args), ms)
      }
    }

That's really elegant. The goal of `debounce(ms, fn)` is to take a function and a timeout (e.g. 100ms) and ensure that the function runs at most once every 100ms. This one works using a closure variable `timer` to capture the `setTimeout` timer ID. On subsequent calls that timer is cancelled and a new one is created - so if you call the function five times in quick succession it will execute just once, 100ms after the last of that sequence of calls.

Created: 2025-12-27 03:23:34+00:00

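The same closure-over-a-timer trick translates directly to other languages. Here's a rough Python equivalent using `threading.Timer` in place of `setTimeout` - a sketch, not from the original project:

```python
import threading

def debounce(ms, fn):
    # Same idea as the JavaScript version: keep the pending timer
    # in a closure, cancel it on every call, and only let the last
    # call in a burst actually fire, ms milliseconds later.
    timer = None

    def debounced(*args):
        nonlocal timer
        if timer is not None:
            timer.cancel()
        timer = threading.Timer(ms / 1000, fn, args)
        timer.start()

    return debounced

# Usage: save() runs once, 100ms after the last call in the burst
save = debounce(100, lambda: print("saved"))
for _ in range(5):
    save()
```
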
**How uv got so fast**
https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html

Andrew Nesbitt provides an insightful teardown of why [uv](https://github.com/astral-sh/uv) is so much faster than `pip`. It's not nearly as simple as just "they rewrote it in Rust" - `uv` gets to skip a huge amount of Python packaging history (which `pip` needs to implement for backwards compatibility) and benefits enormously from work over recent years that makes it possible to resolve dependencies across most packages without having to execute the code in `setup.py` using a Python interpreter.

Two notes that caught my eye that I hadn't understood before:

> **HTTP range requests for metadata.** [Wheel files](https://packaging.python.org/en/latest/specifications/binary-distribution-format/) are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. None of this requires Rust.
>
> [...]
>
> **Compact version representation**. uv packs versions into u64 integers where possible, making comparison and hashing fast. Over 90% of versions fit in one u64. This is micro-optimization that compounds across millions of comparisons.

I wanted to learn more about these tricks, so I fired up [an asynchronous research task](https://simonwillison.net/2025/Nov/6/async-code-research/) and told it to checkout the `astral-sh/uv` repo, find the Rust code for both of those features and try porting it to Python to help me understand how it works.

Here's [the report that it wrote for me](https://github.com/simonw/research/tree/main/http-range-wheel-metadata), the [prompts I used](https://github.com/simonw/research/pull/57) and the [Claude Code transcript](https://gistpreview.github.io/?0f04e4d1a240bfc3065df5082b629884/index.html).

You can try [the script](https://github.com/simonw/research/blob/main/http-range-wheel-metadata/wheel_metadata.py) it wrote for extracting metadata from a wheel using HTTP range requests like this:

    uv run --with httpx \
      https://raw.githubusercontent.com/simonw/research/refs/heads/main/http-range-wheel-metadata/wheel_metadata.py \
      https://files.pythonhosted.org/packages/8b/04/ef95b67e1ff59c080b2effd1a9a96984d6953f667c91dfe9d77c838fc956/playwright-1.57.0-py3-none-macosx_11_0_arm64.whl -v

The Playwright wheel there is ~40MB. Adding `-v` at the end causes the script to spit out verbose details of how it fetched the data - [which looks like this](https://gist.github.com/simonw/a5ef83b6e4605d2577febb43fa9ad018). Key extract from that output:

    [1] HEAD request to get file size...
        File size: 40,775,575 bytes
    [2] Fetching last 16,384 bytes (EOCD + central directory)...
        Received 16,384 bytes
    [3] Parsed EOCD:
        Central directory offset: 40,731,572
        Central directory size: 43,981
        Total entries: 453
    [4] Fetching complete central directory...
    ...
    [6] Found METADATA: playwright-1.57.0.dist-info/METADATA
        Offset: 40,706,744
        Compressed size: 1,286
        Compression method: 8
    [7] Fetching METADATA content (2,376 bytes)...
    [8] Decompressed METADATA: 3,453 bytes

    Total bytes fetched: 18,760 / 40,775,575 (100.0% savings)

The section of the report [on compact version representation](https://github.com/simonw/research/tree/main/http-range-wheel-metadata#bonus-compact-version-representation) is interesting too. Here's how it illustrates sorting version numbers correctly based on their custom u64 representation:

    Sorted order (by integer comparison of packed u64):
      1.0.0a1      (repr=0x0001000000200001)
      1.0.0b1      (repr=0x0001000000300001)
      1.0.0rc1     (repr=0x0001000000400001)
      1.0.0        (repr=0x0001000000500000)
      1.0.0.post1  (repr=0x0001000000700001)
      1.0.1        (repr=0x0001000100500000)
      2.0.0.dev1   (repr=0x0002000000100001)
      2.0.0        (repr=0x0002000000500000)

Created: 2025-12-26 23:43:15+00:00

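Those repr values can be reproduced with a toy version of the trick. This is a simplified sketch whose bit layout is inferred from the output above - not uv's actual representation, which handles many more version forms:

```python
# Toy sketch of packing a version into one integer so that plain
# integer comparison sorts versions correctly. Bit layout is
# inferred from the repr values above, not copied from uv.
TAG = {"dev": 1, "a": 2, "b": 3, "rc": 4, "": 5, "post": 7}

def pack(major, minor, micro, tag="", n=0):
    # 16 bits major | 8 bits minor | 8 bits micro |
    # pre/post tag at bit 20 | low 20 bits for the tag's number
    return (major << 48) | (minor << 40) | (micro << 32) | (TAG[tag] << 20) | n

versions = [
    (1, 0, 1, "", 0), (1, 0, 0, "rc", 1), (2, 0, 0, "dev", 1),
    (1, 0, 0, "", 0), (1, 0, 0, "post", 1), (1, 0, 0, "a", 1),
    (2, 0, 0, "", 0), (1, 0, 0, "b", 1),
]
for v in sorted(versions, key=lambda v: pack(*v)):
    print(f"{v} -> {pack(*v):#018x}")
```

Sorting by the packed integer reproduces the ordering shown above, including 2.0.0.dev1 landing before 2.0.0 and 1.0.0.post1 after 1.0.0.
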
**uv-init-demos**
https://github.com/simonw/uv-init-demos

`uv` has a useful `uv init` command for setting up new Python projects, but it comes with a bunch of different options like `--app` and `--package` and `--lib` and I wasn't sure how they differed. So I created this GitHub repository which demonstrates all of those options, generated using this [update-projects.sh](https://github.com/simonw/uv-init-demos/blob/main/update-projects.sh) script ([thanks, Claude](https://gistpreview.github.io/?9cff2d3b24ba3d5f423b34abc57aec13)) which will run on a schedule via GitHub Actions to capture any changes made by future releases of `uv`.

Created: 2025-12-24 22:05:23+00:00

**MicroQuickJS**
https://github.com/bellard/mquickjs

New project from programming legend Fabrice Bellard, of ffmpeg and QEMU and QuickJS and [so much more](https://bellard.org) fame:

> MicroQuickJS (aka. MQuickJS) is a Javascript engine targetted at embedded systems. It compiles and runs Javascript programs with as low as 10 kB of RAM. The whole engine requires about 100 kB of ROM (ARM Thumb-2 code) including the C library. The speed is comparable to QuickJS.

It supports [a subset of full JavaScript](https://github.com/bellard/mquickjs/blob/17ce6fe54c1ea4f500f26636bd22058fce2ce61a/README.md#javascript-subset-reference), though it looks like a rich and full-featured subset to me.

One of my ongoing interests is sandboxing: mechanisms for executing untrusted code - from end users or generated by LLMs - in an environment that restricts memory usage, applies a strict time limit and restricts file or network access. Could MicroQuickJS be useful in that context?

I fired up Claude Code for web (on my iPhone) and kicked off [an asynchronous research project](https://simonwillison.net/2025/Nov/6/async-code-research/) to explore that question. My full prompt [is here](https://github.com/simonw/research/pull/50#issue-3757781692). It started like this:

> `Clone https://github.com/bellard/mquickjs to /tmp`
>
> `Investigate this code as the basis for a safe sandboxing environment for running untrusted code such that it cannot exhaust memory or CPU or access files or the network`
>
> `First try building python bindings for this using FFI - write a script that builds these by checking out the code to /tmp and building against that, to avoid copying the C code in this repo permanently. Write and execute tests with pytest to exercise it as a sandbox`
>
> `Then build a "real" Python extension not using FFI and experiment with that`
>
> `Then try compiling the C to WebAssembly and exercising it via both node.js and Deno, with a similar suite of tests [...]`

I later added to the interactive session:

> `Does it have a regex engine that might allow a resource exhaustion attack from an expensive regex?`

(The answer was no - the regex engine calls the interrupt handler even during pathological expression backtracking, meaning that any configured time limit should still hold.)

Here's [the full transcript](https://gistpreview.github.io/?6e07c54db7bb8ed8aa0eccfe4a384679) and the [final report](https://github.com/simonw/research/blob/main/mquickjs-sandbox/README.md). Some key observations:

- MicroQuickJS is *very* well suited to the sandbox problem. It has robust memory and time limits baked in, it doesn't expose any dangerous primitives like filesystem or network access, and it even has a regular expression engine that protects against exhaustion attacks (provided you configure a time limit).
- Claude spun up and tested a Python library that calls a MicroQuickJS shared library (involving a little bit of extra C), a compiled Python binding and a library that uses the original MicroQuickJS CLI tool. All of those approaches work well.
- Compiling to WebAssembly was a little harder. It got a version working in Node.js and Deno and Pyodide, but the Python libraries wasmer and wasmtime proved harder, apparently because "mquickjs uses setjmp/longjmp for error handling". It managed to get to a working wasmtime version with [a gross hack](https://github.com/simonw/research/blob/main/mquickjs-sandbox/README.md#working-solution).

I'm really excited about this. MicroQuickJS is tiny, full featured, looks robust and comes from an excellent pedigree. I think this makes for a very solid new entrant in the quest for a robust sandbox.

**Update**: I had Claude Code build [tools.simonwillison.net/microquickjs](https://tools.simonwillison.net/microquickjs), an interactive web playground for trying out the WebAssembly build of MicroQuickJS, adapted from my previous [QuickJS playground](https://tools.simonwillison.net/quickjs). My QuickJS page loads 2.28 MB (675 KB transferred). The MicroQuickJS one loads 303 KB (120 KB transferred). Here are [the prompts I used](https://github.com/simonw/tools/pull/180#issue-3758595291) for that.

Created: 2025-12-23 20:53:40+00:00

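For the "CLI tool as sandbox" variant the report describes, the shape of the integration is simple. This is a hypothetical sketch - the `./mqjs` binary name and its behavior are my assumptions about a local build, not a documented interface:

```python
import subprocess
import tempfile

# Hypothetical sketch of using a locally-built MicroQuickJS
# interpreter as a sandbox: write the untrusted JS to a temp file,
# run it in a separate process, enforce a hard wall-clock timeout.
# "./mqjs" is an assumed name for the built binary; subprocess.run
# raises TimeoutExpired if the limit is hit.
def run_untrusted(js_source: str, timeout: float = 2.0) -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".js") as f:
        f.write(js_source)
        f.flush()
        result = subprocess.run(
            ["./mqjs", f.name],
            capture_output=True, text=True, timeout=timeout,
        )
    return result.stdout
```
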
**Sam Rose explains how LLMs work with a visual essay**
https://ngrok.com/blog/prompt-caching/

Sam Rose is one of my favorite authors of [explorable interactive explanations](https://simonwillison.net/tags/explorables/) - here's [his previous collection](https://samwho.dev/).

Sam joined ngrok in September as a developer educator. Here's his first big visual explainer for them, ostensibly about how prompt caching works but it quickly expands to cover tokenization, embeddings, and the basics of the transformer architecture. The result is one of the clearest and most accessible introductions to LLM internals I've seen anywhere.

![Animation. Starts in tokens mode with an array of 75, 305, 24, 887 - clicking embeddings animates those into a 2D array showing each one to be composed of three floating point numbers.](https://static.simonwillison.net/static/2025/tokens-embeddings.gif)

Created: 2025-12-19 18:33:41+00:00

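The tokens-to-embeddings step that animation shows is just a lookup of rows in a matrix. A toy illustration with random stand-in numbers, using 3 dimensions to match the gif (real models use thousands):

```python
import numpy as np

# Toy version of the animation's tokens -> embeddings step:
# each token ID selects one row of a learned embedding matrix.
# Values here are random stand-ins, not real model weights.
vocab_size, embedding_dim = 1000, 3
embedding_matrix = np.random.rand(vocab_size, embedding_dim)

tokens = [75, 305, 24, 887]            # token IDs from the animation
embeddings = embedding_matrix[tokens]  # shape: (4, 3)
print(embeddings)
```
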