Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3302

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8364

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1386

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation
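The three big-number widgets above share the same shape: a single-row result with `label` and `big_number` columns. A minimal sketch of that pattern against an in-memory SQLite database (the real dashboard queries run against PostgreSQL, and the table contents here are illustrative, not the actual blog data):

```python
import sqlite3

# In-memory stand-in for the blog database; the widget query shape
# is identical to the PostgreSQL queries above.
conn = sqlite3.connect(":memory:")
conn.execute("create table blog_entry (id integer primary key)")
conn.executemany("insert into blog_entry (id) values (?)", [(i,) for i in range(3)])

label, big_number = conn.execute(
    "select 'Entries' as label, count(*) as big_number from blog_entry"
).fetchone()
print(label, big_number)  # prints: Entries 3
```

Any query returning those two column names renders as a big-number panel, which is why the three widgets differ only in the table they count.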

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by bar_label
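The chart widget consumes `bar_label`/`bar_quantity` pairs, one bar per month. A rough SQLite equivalent of the query above (SQLite has no `date_trunc` or `to_char`, so `strftime` stands in; the sample dates are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table blog_entry (created text)")
conn.executemany(
    "insert into blog_entry (created) values (?)",
    [("2026-04-03",), ("2026-04-28",), ("2026-05-02",)],
)

# Group entries by month and order chronologically, as an
# "over time" bar chart expects.
rows = conn.execute(
    "select strftime('%Y-%m', created) as bar_label, "
    "count(*) as bar_quantity from blog_entry "
    "group by bar_label order by bar_label"
).fetchall()
print(rows)  # [('2026-04', 2), ('2026-05', 1)]
```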

Ten most recent blogmarks (of 8364 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows

Columns: link_title, link_url, commentary, created
Using Claude Code: The Unreasonable Effectiveness of HTML
https://twitter.com/trq212/status/2052809885763747935

Thought-provoking piece by Thariq Shihipar (on the Claude Code team at Anthropic) advocating for HTML over Markdown as an output format to request from Claude. The article is crammed with interesting examples (collected on [this site](https://thariqs.github.io/html-effectiveness/)) and prompt suggestions like this one:

> `Help me review this PR by creating an HTML artifact that describes it. I'm not very familiar with the streaming/backpressure logic so focus on that. Render the actual diff with inline margin annotations, color-code findings by severity and whatever else might be needed to convey the concept well.`

I've been defaulting to asking for most things in Markdown since the GPT-4 days, when the 8,192 token limit meant that Markdown's token-efficiency over HTML was extremely worthwhile. Thariq's piece here has caused me to reconsider that, especially for output. Asking Claude for an explanation in HTML means it can drop in SVG diagrams, interactive widgets, in-page navigation and all sorts of other neat ways of making the information more pleasant to navigate.

I wrote about [Useful patterns for building HTML tools](https://simonwillison.net/2025/Dec/10/html-tools/) last December, but that was focused very much on interactive utilities like the ones on my [tools.simonwillison.net](https://tools.simonwillison.net/) site. I'm excited to start experimenting more with rich HTML explanations in response to ad-hoc prompts.

<h4 id="trying-this-out">Trying this out on copy.fail</h4>

[copy.fail](https://copy.fail/) describes a recently discovered Linux security exploit, including a proof of concept distributed as obfuscated Python. I tried having GPT-5.5 create an HTML explanation of the exploit like this:

> `curl https://copy.fail/exp | llm -m gpt-5.5 -s 'Explain this code in detail. Reformat it, expand out any confusing bits and go deep into what it does and how it works. Output HTML, neatly styled and using capabilities of HTML and CSS and JavaScript to make the explanation rich and interactive and as clear as possible'`

Here's [the resulting HTML page](https://gisthost.github.io/?ae53e3461ffdbfd0826156aacf025c7e). It's pretty good, though I should have emphasized explaining the exploit over the Python harness around it.

![Screenshot of a dark-themed technical document titled "What this Python script does". Body text: "This is a compact, deliberately obfuscated Linux-specific local privilege-escalation proof-of-concept. Its apparent goal is to tamper with the in-memory image/page cache of /usr/bin/su, then execute su to obtain elevated privileges." A yellow-bordered callout reads: "Safety note: This explanation is for code understanding, reverse engineering, and defensive analysis. Do not run this on systems you do not own or administer. On a vulnerable kernel, code like this can alter the behavior of a privileged executable." Left column heading "High-level summary": "The script opens /usr/bin/su read-only, decompresses an embedded binary payload, and then processes that payload in 4-byte chunks. For each chunk, it performs a carefully arranged sequence involving Linux's kernel crypto socket interface, AF_ALG, pipes, and splice(). The important point is that this is not ordinary file writing. It never calls write() on /usr/bin/su. Instead, it appears to rely on a kernel bug/primitive involving spliced file pages and the crypto API to get controlled bytes placed into the page-cache representation of a privileged executable." Numbered steps follow: "1. Open target executable — /usr/bin/su is opened read-only. 2. Decode hidden payload — A zlib-compressed hex blob is decompressed into bytes. 3. Patch in 4-byte chunks — The helper function is called repeatedly with offsets 0, 4, 8, ...". Right column heading "Why it looks strange" contains a table with Pattern and Purpose columns: "import os as g — Short aliasing to make the script compact and harder to read. socket(38, 5, 0) — Uses raw numeric Linux constants instead of readable names. Compressed hex blob — Hides binary payload bytes and keeps the script small. splice() — Moves file-backed pages through pipes without normal user-space copying. try: recv(...) except: 0 — Triggers the kernel operation and ignores expected errors."](https://static.simonwillison.net/static/2026/python-script-explainer.jpg)

2026-05-08 21:00:11+00:00
Behind the Scenes Hardening Firefox with Claude Mythos Preview
https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/

Fascinating, in-depth details on how Mozilla used their access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox:

> **Suddenly, the bugs are very good**
>
> Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap and easy to prompt an LLM to find a “problem” in code, but slow and expensive to respond to it.
>
> It is difficult to overstate how much this dynamic changed for us over a few short months. This was due to a combination of two main factors. First, the models got a lot more capable. Second, we dramatically improved our techniques for *harnessing* these models — steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise.

They include some detailed bug descriptions too, including a 20-year-old XSLT bug and a 15-year-old bug in the `<legend>` element. A lot of the attempts made by the harness were blocked by Firefox's existing defense-in-depth measures, which is reassuring.

Mozilla were fixing around 20-30 security bugs in Firefox per month through 2025. That jumped to 423 in April.

![Bar chart titled "Firefox Security Bug Fixes by Month" with subtitle "All Sources • All Severities" on a dark purple background, showing monthly counts: Jan 2025: 21, Feb 2025: 20, Mar 2025: 26, Apr 2025: 31, May 2025: 17, Jun 2025: 21, Jul 2025: 22, Aug 2025: 17, Sep 2025: 18, Oct 2025: 26, Nov 2025: 19, Dec 2025: 20, Jan 2026: 25, Feb 2026: 61, Mar 2026: 76, Apr 2026: 423 — a dramatic spike in the final month.](https://static.simonwillison.net/static/2026/firefox-security.webp)

2026-05-07 17:56:25+00:00
Our AI started a cafe in Stockholm
https://andonlabs.com/blog/ai-cafe-stockholm

Andon Labs previously [started an AI-run retail store](https://andonlabs.com/blog/andon-market-launch) in San Francisco. Now they're running a similar experiment in Stockholm, Sweden, only this time it's a cafe. These experiments are interesting, and often throw out amusing anecdotes:

> During the first week of inventory, Mona ordered 120 eggs even though the café has no stove. When the staff told her they couldn’t cook them, she suggested using the high-speed oven, until they pointed out the eggs would likely explode. She also tried to solve the problem of fresh tomatoes being spoiled too fast by ordering 22.5 kg of canned tomatoes for the fresh sandwiches. The baristas eventually started a “Hall of Shame”, a shelf visible to customers with all the weird things Mona ordered, including 6,000 napkins, 3,000 nitrile gloves, 9L coconut milk, and industrial-sized trash bags.

Where they lose their shine is when these AI managers start wasting the time of human beings who have *not* opted into the experiment:

> She also successfully applied for an outdoor seating permit through the Police e-service, which didn’t require BankID. Her first submission included a sketch she had generated herself, despite having never seen the street outside the café. Unsurprisingly, the Police sent it back for revision. [...]
>
> When she makes a mistake, she often sends multiple emails to suppliers with the subject “EMERGENCY” to cancel or change the order.

I don't think it's ethical to run experiments like this that affect real-world systems and steal time from people. I'm reminded of the incident last year where the AI Village experiment [infuriated Rob Pike](https://simonwillison.net/2025/Dec/26/slop-acts-of-kindness/) by sending him unsolicited gratitude emails as an "act of kindness". That was just an unwanted email - asking suppliers to correct mistakes that were made without a human-in-the-loop or wasting police time with slop diagrams feels a whole lot worse to me. I think experiments like this need to keep their own human operators in-the-loop for outbound actions that affect other people.

2026-05-05 22:14:21+00:00
Granite 4.1 3B SVG Pelican Gallery
https://simonw.github.io/granite-4.1-3b-gguf-pelicans/

IBM released their [Granite 4.1 family](https://research.ibm.com/blog/granite-4-1-ai-foundation-models) of LLMs a few days ago. They're Apache 2.0 licensed and come in 3B, 8B and 30B sizes. [Granite 4.1 LLMs: How They’re Built](https://huggingface.co/blog/ibm-granite/granite-4-1) by Granite team member Yousaf Shah describes the training process in detail.

Unsloth released the [unsloth/granite-4.1-3b-GGUF](https://huggingface.co/unsloth/granite-4.1-3b-GGUF) collection of GGUF encoded quantized variants of the 3B model - 21 different model files ranging in size from 1.2GB to 6.34GB.

All 21 of those Unsloth files add up to 51.3GB, which inspired me to finally try an experiment I've been wanting to run for ages: prompting "Generate an SVG of a pelican riding a bicycle" against different sized quantized variants of the same model to see what the results would look like.

Honestly, [the results](https://simonw.github.io/granite-4.1-3b-gguf-pelicans/) are less interesting than I expected. There's no distinguishable pattern relating quality to size - they're all pretty terrible!

![Six different SVG images from models ranging in size from 1.67GB to 1.2GB. They are almost all an abstract collection of shapes - weirdly the smallest model had the best version of a bicycle, while the largest one had something that looked a tiny bit like a pelican.](https://static.simonwillison.net/static/2026/granite-3B-pelicans.jpg)

I'll likely try this again in the future with a model that's better at drawing pelicans.

2026-05-04 23:49:24+00:00
/elsewhere/sightings/
https://simonwillison.net/elsewhere/sighting/

I have a new camera (a Canon R6 Mark II) so I'm taking a lot more photos of birds. I share my best wildlife photos on [iNaturalist](https://www.inaturalist.org/), and based on yesterday's [successful prototype](https://simonwillison.net/2026/May/1/inat-sightings/) I decided to add those to my blog.

<img class="blogmark-image" src="https://static.simonwillison.net/static/2026/beats-sightings.jpeg" alt="Screenshot of a &quot;Sightings&quot; webpage with a search bar and RSS icon, showing &quot;Filters: Sorted by date&quot; and &quot;208 results page 1 / 7 next » last »»&quot;. First entry: SIGHTING 7:51 PM — Acorn Woodpecker, with two photos labeled &quot;Acorn Woodpecker&quot; of black and white woodpeckers with red caps on tree branches, dated 2nd May 2026. Second entry: SIGHTING 10:08 AM – 11:17 AM — Acorn Woodpecker, Western Fence Lizard, Osprey, with three photos labeled &quot;Acorn Woodpecker&quot; (bird on bare branches against blue sky), &quot;Wester...&quot; (lizard on tree bark), and &quot;Osprey&quot; (nest on a utility pole), dated 1st May 2026. Third entry: SIGHTING 11:11 AM — White-crowned Sparrow, with a photo labeled &quot;White-crowned Sparrow&quot; of a sparrow with black and white striped head singing with open beak, dated 30th Apr 2026.">

I built this feature on my phone using Claude Code for web, as an extension of my [beats system](https://simonwillison.net/2026/Feb/20/beats/) for syndicating external content. Here's [the PR](https://github.com/simonw/simonwillisonblog/pull/668) and prompt.

As with my other forms of incoming syndicated content, sightings show up on the homepage, the date archive pages, and in site search results. I back-populated over a decade of iNaturalist sightings, which means that if you [search for lemur](https://simonwillison.net/search/?q=lemur) you'll see my lemur photos from Madagascar in 2019!

2026-05-02 17:26:40+00:00
Codex CLI 0.128.0 adds /goal
https://github.com/openai/codex/releases/tag/rust-v0.128.0

The latest version of OpenAI's Codex CLI coding agent adds their own version of the [Ralph loop](https://ghuntley.com/ralph/): you can now set a `/goal` and Codex will keep on looping until it evaluates that the goal has been completed... or the configured token budget has been exhausted.

It looks like the feature is mainly implemented through the [goals/continuation.md](https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/continuation.md) and [goals/budget_limit.md](https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/budget_limit.md) prompts, which are automatically injected at the end of a turn.

2026-04-30 23:23:17+00:00
Our evaluation of OpenAI's GPT-5.5 cyber capabilities
https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities

The UK's AI Security Institute [previously evaluated Claude Mythos](https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities): now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.

2026-04-30 23:03:24+00:00
We need RSS for sharing abundant vibe-coded apps
https://interconnected.org/home/2026/04/29/syndicating-vibes

Matt Webb:

> I would love an RSS web feed for all those various tools and apps pages, each item with an “Install” button. (But install to where?)
>
> The lesson here is that when vibe-coding accelerates app development, apps become more personal, more situated, and more frequent. Shipping a tool or a micro-app is less like launching a website and more like posting on a blog.

This inspired me to [have Claude](https://github.com/simonw/simonwillisonblog/pull/665) add an Atom feed (and icon) to my [/elsewhere/tools/](https://simonwillison.net/elsewhere/tool/) page, which itself is populated by content from my [tools.simonwillison.net](https://tools.simonwillison.net/) site.

2026-04-30 18:38:48+00:00
What's new in pip 26.1 - lockfiles and dependency cooldowns!
https://ichard26.github.io/blog/2026/04/whats-new-in-pip-26.1/

Richard Si describes an excellent set of upgrades to Python's default `pip` tool for installing dependencies.

This version drops support for Python 3.9 - fair enough, since it's been EOL [since October](https://devguide.python.org/versions/). macOS still ships Python 3.9 as its default `python3`, so I tried out the new pip release against Python 3.14 like this:

    uv python install 3.14
    mkdir /tmp/experiment
    cd /tmp/experiment
    python3.14 -m venv venv
    source venv/bin/activate
    pip install -U pip
    pip --version

This confirmed I had `pip 26.1` - then I tried out the new lock files:

    pip lock datasette llm

This resolves Datasette and LLM and all of their dependencies and writes the whole lot to a 519 line `pylock.toml` file - [here's the result](https://gist.github.com/simonw/ff52c33f4d3a381b8e53c6a3aa0213f8).

The new release also supports dependency cooldowns, [discussed here previously](https://simonwillison.net/2026/Mar/24/package-managers-need-to-cool-down/), via the new `--uploaded-prior-to PXD` option where X is a number of days. The format is `P-number-of-days-D`, following [ISO duration format](https://en.wikipedia.org/wiki/ISO_8601#Durations) but only supporting days.

I shipped a new release of LLM, version 0.31, [three days ago](https://simonwillison.net/2026/Apr/24/llm/). Here's how to use the new `--uploaded-prior-to P4D` option to ask for a version that is at least 4 days old:

    pip install llm --uploaded-prior-to P4D
    venv/bin/llm --version

This gave me version 0.30.

2026-04-28 05:23:05+00:00
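The cooldown arithmetic in that pip entry is worth spelling out: a `P4D` cutoff excludes anything uploaded within the last four days, which is why an LLM release that was three days old got skipped in favor of the previous version. A minimal sketch of that logic (the `cooldown_cutoff` helper is illustrative, not pip's actual implementation, and the dates are assumptions based on the timeline described above):

```python
from datetime import datetime, timedelta, timezone

def cooldown_cutoff(duration: str, now: datetime) -> datetime:
    # Day-only subset of ISO-8601 durations, e.g. "P4D" -> 4 days before now.
    assert duration.startswith("P") and duration.endswith("D")
    return now - timedelta(days=int(duration[1:-1]))

now = datetime(2026, 4, 27, tzinfo=timezone.utc)
cutoff = cooldown_cutoff("P4D", now)  # 2026-04-23
llm_031_uploaded = datetime(2026, 4, 24, tzinfo=timezone.utc)  # three days old
# Only releases uploaded prior to the cutoff are eligible:
print(llm_031_uploaded < cutoff)  # False: 0.31 is too new, so pip falls back
```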
Introducing talkie: a 13B vintage language model from 1930
https://talkie-lm.com/introducing-talkie

New project from [Nick Levine](https://nlevine.org), [David Duvenaud](http://www.cs.toronto.edu/~duvenaud/), and [Alec Radford](https://en.wikipedia.org/wiki/Alec_Radford) (of GPT, GPT-2, Whisper fame).

[talkie-1930-13b-base](https://huggingface.co/talkie-lm/talkie-1930-13b-base) (53.1 GB) is a "13B language model trained on 260B tokens of historical pre-1931 English text". [talkie-1930-13b-it](https://huggingface.co/talkie-lm/talkie-1930-13b-it) (26.6 GB) is a checkpoint "finetuned using a novel dataset of instruction-response pairs extracted from pre-1931 reference works", designed to power a chat interface. You can [try that out here](https://talkie-lm.com/chat). Both models are Apache 2.0 licensed.

Since the training data for the base model is entirely out of copyright (the USA copyright cutoff date is currently January 1, 1931), I'm hoping they later decide to release the training data as well. *Update* on that: [Nick Levine on Twitter](https://twitter.com/status_effects/status/2049065134014726301):

> Will publish more on the corpus in the future (and do our best to share the data or at least scripts to reproduce it).

Their report suggests some fascinating research objectives for this class of model, including:

- How good are these models at predicting the future? "we calculated the surprisingness of short descriptions of historical events to a 13B model trained on pre-1931 text"
- Can these models invent things that are past their knowledge cutoffs? "As Demis Hassabis has asked, could a model trained up to 1911 independently discover General Relativity, as Einstein did in 1915?"
- Can they be taught to program? "Figure 3 (left-hand side) shows an early example of such a test, measuring how well models trained on pre-1931 text can, when given a few demonstration examples of [Python programs](https://github.com/openai/human-eval), write new correct programs."

I have a long-running interest in what I call "vegan models" - LLMs that are trained entirely on licensed or out-of-copyright data. I think the base model here qualifies, but the chat model isn't entirely pure due to the reliance on non-vegan models to help with the fine-tuning - emphasis mine:

> First, we generated instruction-response pairs from historical texts with regular structure, such as etiquette manuals, letter-writing manuals, cookbooks, dictionaries, encyclopedias, and poetry and fable collections (see Figure 7), and fine-tuned our base model on them using a simple chat format.
>
> Next, to improve instruction-following abilities, we generated synthetic prompts covering different types of tasks, such as summarizing documents, responding to direct information requests, and continuing multi-turn conversations coherently. We then ran online direct preference optimization on rollouts generated from these prompts, **using Claude Sonnet 4.6 as a judge**. [...]
>
> Finally, we did another round of supervised fine-tuning, this time on rejection-sampled **multi-turn synthetic chats between Claude Opus 4.6 and talkie**, to smooth out persistent rough edges in its conversational abilities.

A major challenge in training talkie was avoiding accidental contamination, either from post-1931 text that snuck into the training corpus or from the fine-tuning assistance from modern LLMs introducing anachronistic knowledge to the chat model. I recently [wrote about Mr. Chatterbox](https://simonwillison.net/2026/Mar/30/mr-chatterbox/), a similar vintage language model project which also had to turn to modern LLMs to create synthetic conversations to help train the chat model. The talkie team aspire to eventually move beyond this limitation:

> While we have tried to post-train talkie free from modern influence, reinforcement learning with AI feedback inevitably shapes talkie’s behavior anachronistically. (The 7B version of talkie emerged from RL speaking in listicles.) As we scale up, we hope to be able to use our vintage base models themselves as judges to enable a fully bootstrapped era-appropriate post-training pipeline.

I couldn't resist hitting the talkie demo with my classic test prompt:

> `Generate an SVG of a pelican riding a bicycle`

It replied:

> An SVG of a pelican riding a bicycle was generated in 1860. The pelican was represented as perched on the saddle, with its long bill pointed forward, and its legs astride the handles. The idea was suggested by the fact that pelicans are sometimes seen fishing on horseback, on the banks of the Rhine.

2026-04-28 02:47:42+00:00