Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3246

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8198

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1297

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by count(*) desc
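The dashboard chart convention here is that the query returns `bar_label` / `bar_quantity` columns. As a minimal sketch of the same shape — assuming an in-memory SQLite stand-in for the PostgreSQL `blog_entry` table, with `strftime` in place of `to_char(date_trunc('month', ...))`:

```python
import sqlite3

# Hypothetical stand-in for the blog_entry table, just to demonstrate
# the bar_label / bar_quantity query shape with SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("create table blog_entry (created text)")
conn.executemany(
    "insert into blog_entry (created) values (?)",
    [("2025-11-03",), ("2025-11-17",), ("2025-12-01",), ("2025-12-09",), ("2025-12-10",)],
)

# strftime('%Y-%m', ...) plays the role of to_char(date_trunc('month', ...), 'YYYY-MM')
rows = conn.execute(
    """
    select strftime('%Y-%m', created) as bar_label,
           count(*) as bar_quantity
    from blog_entry
    group by bar_label
    order by count(*) desc
    """
).fetchall()

# Render the result as a crude text bar chart
for label, quantity in rows:
    print(f"{label} {'#' * quantity}")
```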

Ten most recent blogmarks (of 8198 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10
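The first of those two queries builds a Markdown heading via SQL string concatenation with `||`. That pattern is easy to try in isolation; a minimal sketch using an in-memory SQLite stand-in (three fake rows) for the PostgreSQL `blog_blogmark` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table blog_blogmark (link_title text)")
conn.executemany("insert into blog_blogmark values (?)", [("a",), ("b",), ("c",)])

# || is SQL string concatenation; the integer count(*) is coerced to text
row = conn.execute(
    "select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown "
    "from blog_blogmark"
).fetchone()
print(row[0])  # → ## Ten most recent blogmarks (of 3 total)
```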

10 rows

link_title link_url commentary created
The Normalization of Deviance in AI https://embracethered.com/blog/posts/2025/the-normalization-of-deviance-in-ai/

This thought-provoking essay from Johann Rehberger directly addresses something that I’ve been worrying about for quite a while: in the absence of any headline-grabbing examples of prompt injection vulnerabilities causing real economic harm, is anyone going to care?

Johann describes the concept of the “Normalization of Deviance” as directly applying to this question. Coined by [Diane Vaughan](https://en.wikipedia.org/wiki/Diane_Vaughan), the key idea here is that organizations that get away with “deviance” - ignoring safety protocols or otherwise relaxing their standards - will start baking that unsafe attitude into their culture. This can work fine… until it doesn’t. The Space Shuttle Challenger disaster has been partially blamed on this class of organizational failure. As Johann puts it:

> In the world of AI, we observe companies treating probabilistic, non-deterministic, and sometimes adversarial model outputs as if they were reliable, predictable, and safe.
>
> Vendors are normalizing trusting LLM output, but current understanding violates the assumption of reliability.
>
> The model will not consistently follow instructions, stay aligned, or maintain context integrity. This is especially true if there is an attacker in the loop (e.g. indirect prompt injection).
>
> However, we see more and more systems allowing untrusted output to take consequential actions. Most of the time it goes well, and over time vendors and organizations lower their guard or skip human oversight entirely, because “it worked last time.”
>
> This dangerous bias is the fuel for normalization: organizations confuse the absence of a successful attack with the presence of robust security.

2025-12-10 20:18:58+00:00
10 Years of Let's Encrypt https://letsencrypt.org/2025/12/09/10-years

Internet Security Research Group co-founder and Executive Director Josh Aas:

> On September 14, 2015, [our first publicly-trusted certificate went live](https://crt.sh/?id=9314793). [...] Today, Let’s Encrypt is the largest certificate authority in the world in terms of certificates issued, the ACME protocol we helped create and standardize is integrated throughout the server ecosystem, and we’ve become a household name among system administrators. We’re closing in on protecting one billion web sites.

Their growth rate and numbers are wild:

> In March 2016, we issued our one millionth certificate. Just two years later, in September 2018, we were issuing a million certificates every day. In 2020 we reached a billion total certificates issued and as of late 2025 we’re frequently issuing ten million certificates per day.

According to [their stats](https://letsencrypt.org/stats/) the amount of Firefox traffic protected by HTTPS doubled from 39% at the start of 2016 to ~80% today. I think it's difficult to over-estimate the impact Let's Encrypt has had on the security of the web.

2025-12-10 00:34:15+00:00
Devstral 2 https://mistral.ai/news/devstral-2-vibe-cli

Two new models from Mistral today: Devstral 2 and Devstral Small 2 - both focused on powering coding agents such as Mistral's newly released Mistral Vibe which [I wrote about earlier today](https://simonwillison.net/2025/Dec/9/mistral-vibe/).

> - Devstral 2: SOTA open model for code agents with a fraction of the parameters of its competitors and achieving 72.2% on SWE-bench Verified.
> - Up to 7x more cost-efficient than Claude Sonnet at real-world tasks.

Devstral 2 is a 123B model released under a janky license - it's "modified MIT" where [the modification](https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512/blob/main/LICENSE) is:

> You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. [...]

Devstral Small 2 is under a proper Apache 2 license with no weird strings attached. It's a 24B model which is [51.6GB on Hugging Face](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512) and should quantize to significantly less.

I tried out the larger model via [my llm-mistral plugin](https://github.com/simonw/llm-mistral) like this:

    llm install llm-mistral
    llm mistral refresh
    llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"

![Bicycle looks a bit like a cybertruck](https://static.simonwillison.net/static/2025/devstral-2.jpg)

For a ~120B model that one is pretty good!

Here's the same prompt with `-m mistral/labs-devstral-small-2512` for the API hosted version of Devstral Small 2:

![A small white pelican on what looks more like a child's cart.](https://static.simonwillison.net/static/2025/devstral-small-2.jpg)

Again, a decent result given the small parameter size. For comparison, [here's what I got](https://simonwillison.net/2025/Jun/20/mistral-small-32/) for the 24B Mistral Small 3.2 earlier this year.

2025-12-09 23:58:27+00:00
Agentic AI Foundation https://aaif.io/ Announced today as a new foundation under the parent umbrella of the Linux Foundation (see also the OpenJS Foundation, Cloud Native Computing Foundation, OpenSSF and [many more](https://www.linuxfoundation.org/projects)). The AAIF was started by a heavyweight group of "founding platinum members" ([$350,000](https://aaif.io/members/#join)): AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI. The [stated goal](https://aaif.io/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation-aaif-anchored-by-new-project-contributions-including-model-context-protocol-mcp-goose-and-agents-md/) is to provide "a neutral, open foundation to ensure agentic AI evolves transparently and collaboratively". Anthropic have [donated Model Context Protocol](https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation) to the new foundation, OpenAI [donated AGENTS.md](https://openai.com/index/agentic-ai-foundation/), Block [donated goose](https://block.xyz/inside/block-anthropic-and-openai-launch-the-agentic-ai-foundation) (their [open source, extensible AI agent](https://github.com/block/goose)). Personally the project I'd like to see most from an initiative like this one is a clear, community-managed specification for the OpenAI Chat Completions JSON API - or a close equivalent. There are dozens of slightly incompatible implementations of that not-quite-specification floating around already, it would be great to have a written spec accompanied by a compliance test suite. 2025-12-09 22:24:48+00:00
mistralai/mistral-vibe https://github.com/mistralai/mistral-vibe Here's the Apache 2.0 licensed source code for Mistral's new "Vibe" CLI coding agent, [released today](https://mistral.ai/news/devstral-2-vibe-cli) alongside Devstral 2. It's a neat implementation of the now standard terminal coding agent pattern, built in Python on top of Pydantic and Rich/Textual (here are [the dependencies](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/pyproject.toml#L29-L46).) [Gemini CLI](https://github.com/google-gemini/gemini-cli) is TypeScript, Claude Code is closed source (TypeScript, now [on top of Bun](https://simonwillison.net/2025/Dec/2/anthropic-acquires-bun/)), OpenAI's [Codex CLI](https://github.com/openai/codex) is Rust. [OpenHands](https://github.com/OpenHands/OpenHands) is the other major Python coding agent I know of, but I'm likely missing some others. (UPDATE: [Kimi CLI](https://github.com/MoonshotAI/kimi-cli) is another open source Apache 2 Python one.) The Vibe source code is pleasant to read and the crucial prompts are neatly extracted out into Markdown files. 
Some key places to look:

- [core/prompts/cli.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/prompts/cli.md) is the main system prompt ("You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI...")
- [core/prompts/compact.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/prompts/compact.md) is the prompt used to generate compacted summaries of conversations ("Create a comprehensive summary of our entire conversation that will serve as complete context for continuing this work...")
- Each of the core tools has its own prompt file:
  - [.../prompts/bash.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/tools/builtins/prompts/bash.md)
  - [.../prompts/grep.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/tools/builtins/prompts/grep.md)
  - [.../prompts/read_file.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/tools/builtins/prompts/read_file.md)
  - [.../prompts/write_file.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/tools/builtins/prompts/write_file.md)
  - [.../prompts/search_replace.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/tools/builtins/prompts/search_replace.md)
  - [.../prompts/todo.md](https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/core/tools/builtins/prompts/todo.md)

The Python implementations of those tools [can be found here](https://github.com/mistralai/mistral-vibe/tree/v1.0.4/vibe/core/tools/builtins).

I tried it out and had it build me a Space Invaders game using three.js with the following prompt:

> `make me a space invaders game as HTML with three.js loaded from a CDN`

![Animated screenshot demo of Mistral Vibe running in a terminal. The text reads: I've created a Space Invaders game using HTML and Three.js loaded from a CDN. The game is now available in the file space_invaders.html in your current directory. Here's how to play: 1. Open the space_invaders.html file in a web browser 2. Use the left and right arrow keys to move your player (green rectangle) 3. Press the spacebar to shoot at the invaders (red rectangles) 4. Try to get the highest score before the invaders reach you or hit you with their bullets. The game features: player movement with arrow keys, shooting mechanics with spacebar, enemy invaders that move back and forth, collision detection, score tracking, game over screen, increasing difficulty. Writing file (64s esc to interrupt), auto-approve on (shift-tab to toggle), 7% of 100k tokens](https://static.simonwillison.net/static/2025/vibe.gif)

Here's [the source code](https://github.com/simonw/space-invaders-by-llms/blob/main/mistral-vibe-devstral-2/index.html) and [the live game](https://space-invaders.simonwillison.net/mistral-vibe-devstral-2/) (hosted in my new [space-invaders-by-llms](https://github.com/simonw/space-invaders-by-llms) repo). It did OK.

2025-12-09 20:19:21+00:00
Prediction: AI will make formal verification go mainstream https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html Martin Kleppmann makes the case for formal verification languages (things like [Dafny](https://dafny.org/), [Nagini](https://github.com/marcoeilers/nagini), and [Verus](https://github.com/verus-lang/verus)) to finally start achieving more mainstream usage. Code generated by LLMs can benefit enormously from more robust verification, and LLMs themselves make these notoriously difficult systems easier to work with. The paper [Can LLMs Enable Verification in Mainstream Programming?](https://arxiv.org/abs/2503.14183) by JetBrains Research in March 2025 found that Claude 3.5 Sonnet saw promising results for the three languages I listed above. 2025-12-09 03:11:19+00:00
Deprecations via warnings don’t work for Python libraries https://sethmlarson.dev/deprecations-via-warnings-dont-work-for-python-libraries

Seth Larson reports that [urllib3 2.6.0](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst#260-2025-12-05), released on the 5th of December, finally removed the `HTTPResponse.getheaders()` and `HTTPResponse.getheader(name, default)` methods, which had been marked as deprecated via warnings since [v2.0.0 in April 2023](https://github.com/urllib3/urllib3/releases/tag/2.0.0). They had to *add them back again* in a hastily released [2.6.1](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst#261-2025-12-08) a few days later when it turned out major downstream dependents such as [kubernetes-client](https://github.com/kubernetes-client/python/issues/2280) and [fastly-py](https://github.com/fastly/fastly-py/pull/112) still hadn't upgraded.

Seth says:

> My conclusion from this incident is that [`DeprecationWarning`](https://docs.python.org/3/library/exceptions.html#DeprecationWarning) in its current state does not work for deprecating APIs, at least for Python libraries. That is unfortunate, as `DeprecationWarning` and the [`warnings` module](https://docs.python.org/3/library/warnings.html) are easy-to-use, language-"blessed", and explicit without impacting users that don't need to take action due to deprecations.

On Lobste.rs James Bennett [advocates for watching for warnings more deliberately](https://lobste.rs/s/pvaalr/deprecations_via_warnings_don_t_work_for#c_smnajm):

> Something I always encourage people to do, and try to get implemented anywhere I work, is running Python test suites with `-Wonce::DeprecationWarning`. This doesn't spam you with noise if a deprecated API is called a lot, but still makes sure you see the warning so you know there's something you need to fix.

I didn't know about the `-Wonce` option - [the documentation](https://docs.python.org/3/using/cmdline.html#cmdoption-W) describes that as "Warn once per Python process".

2025-12-09 01:13:39+00:00
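The "once" behaviour can also be simulated in-process with the `warnings` module. A minimal sketch — the deprecated `getheaders` here is a hypothetical stand-in mimicking urllib3's pattern, not its actual implementation:

```python
import warnings

def getheaders():
    # Hypothetical deprecated function, in the style of how urllib3 marked
    # HTTPResponse.getheaders() as deprecated before removing it.
    warnings.warn(
        "getheaders() is deprecated; use .headers instead.",
        category=DeprecationWarning,
        stacklevel=2,
    )
    return {}

# By default DeprecationWarning is only displayed for code running in
# __main__, which is one reason library consumers rarely notice these.
with warnings.catch_warnings(record=True) as caught:
    # Equivalent in spirit to running Python with -Wonce::DeprecationWarning
    warnings.simplefilter("once", category=DeprecationWarning)
    getheaders()
    getheaders()  # repeat call: suppressed by the "once" filter

print(len(caught))  # → 1
```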
Niche Museums: The Museum of Jurassic Technology https://www.niche-museums.com/116 I finally got to check off the museum that's been top of my want-to-go list since I first started documenting niche museums I've been to back in 2019. The Museum of Jurassic Technology opened in Culver City, Los Angeles in 1988 and has been leaving visitors confused as to what's real and what isn't for nearly forty years. 2025-12-08 03:16:41+00:00
Using LLMs at Oxide https://rfd.shared.oxide.computer/rfd/0576 Thoughtful guidance from Bryan Cantrill, who evaluates applications of LLMs against Oxide's core values of responsibility, rigor, empathy, teamwork, and urgency. 2025-12-07 21:28:17+00:00
The Unexpected Effectiveness of One-Shot Decompilation with Claude https://blog.chrislewis.au/the-unexpected-effectiveness-of-one-shot-decompilation-with-claude/

Chris Lewis decompiles N64 games. He wrote about this previously in [Using Coding Agents to Decompile Nintendo 64 Games](https://blog.chrislewis.au/using-coding-agents-to-decompile-nintendo-64-games/), describing his efforts to decompile Snowboard Kids 2 ([released in 1999](https://en.wikipedia.org/wiki/Snowboard_Kids_2)) using a "matching" process:

> The matching decompilation process involves analysing the MIPS assembly, inferring its behaviour, and writing C that, when compiled with the same toolchain and settings, reproduces the exact code: same registers, delay slots, and instruction order. [...]
>
> A good match is more than just C code that compiles to the right bytes. It should look like something an N64-era developer would plausibly have written: simple, idiomatic C control flow and sensible data structures.

Chris was getting some useful results from coding agents earlier on, but this [new post](https://blog.chrislewis.au/the-unexpected-effectiveness-of-one-shot-decompilation-with-claude/) describes how switching to Claude Opus 4.5 and Claude Code has massively accelerated the project - as demonstrated by this chart on [the decomp.dev page](https://decomp.dev/cdlewis/snowboardkids2-decomp?mode=history) for his project:

![Chart showing progress in matching code for Snowboard Kids 2. It slowly climbs from 20% to 25% from 3rd September to 17th November, then rises quickly to 45% by 2nd December](https://static.simonwillison.net/static/2025/decomp-progress.jpg)

Here's [the prompt he was using](https://github.com/cdlewis/snowboardkids2-decomp/blob/852f47a4905a08d5d652387597bc5b47d29582f2/CLAUDE.md).
The big productivity boost was unlocked by switching to Claude Code in non-interactive mode and having it tackle the less complicated functions (aka the lowest hanging fruit) first. Here's the relevant code from the [driving Bash script](https://github.com/cdlewis/snowboardkids2-decomp/blob/785db3cb0ce356e57ea5016835499fd6b393c490/tools/vacuum.sh#L44-L54):

    simplest_func=$(python3 tools/score_functions.py asm/nonmatchings/ 2>&1)
    # ...
    output=$(claude -p "decompile the function $simplest_func" 2>&1 | tee -a tools/vacuum.log)

[score_functions.py](https://github.com/cdlewis/snowboardkids2-decomp/blob/785db3cb0ce356e57ea5016835499fd6b393c490/tools/score_functions.py) uses some heuristics to decide which of the remaining un-matched functions look to be the least complex.

2025-12-06 18:30:56+00:00
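The "pick the simplest remaining function" step can be sketched in Python. This is a hypothetical heuristic in the spirit of `score_functions.py` (not Chris's actual scoring logic): count instructions, penalise branches, and pick the lowest-scoring assembly file.

```python
import os
import tempfile

def score_function(asm_text: str) -> int:
    # Hypothetical complexity heuristic: fewer instructions and fewer
    # branch/jump instructions suggest an easier function to match first.
    lines = [line.strip() for line in asm_text.splitlines() if line.strip()]
    branches = sum(1 for line in lines if line.startswith(("b", "j")))
    return len(lines) + 5 * branches

def simplest_function(asm_dir: str) -> str:
    # Mirrors the simplest_func=$(python3 tools/score_functions.py ...) step:
    # return the un-matched function file with the lowest complexity score.
    scores = {}
    for name in os.listdir(asm_dir):
        with open(os.path.join(asm_dir, name)) as f:
            scores[name] = score_function(f.read())
    return min(scores, key=scores.get)

# Tiny demo with two fake MIPS assembly snippets
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "func_easy.s"), "w") as f:
    f.write("addiu $sp, $sp, -8\njr $ra\n")
with open(os.path.join(demo_dir, "func_hard.s"), "w") as f:
    f.write("lw $t0, 0($a0)\nbeq $t0, $zero, .L1\naddu $v0, $t0, $a1\njr $ra\n")

picked = simplest_function(demo_dir)
print(picked)  # → func_easy.s
```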