Simon Willison’s Weblog

Feb. 25, 2026

It’s also reasonable for people who entered technology in the last couple of decades because it was good job, or because they enjoyed coding to look at this moment with a real feeling of loss. That feeling of loss though can be hard to understand emotionally for people my age who entered tech because we were addicted to feeling of agency it gave us. The web was objectively awful as a technology, and genuinely amazing, and nobody got into it because programming in Perl was somehow aesthetically delightful.

Kellan Elliott-McCrea, Code has always been the easy part

# 3:30 am / kellan-elliott-mccrea, perl, ai, generative-ai, llms, agentic-engineering, deep-blue

Agentic Engineering Patterns >

Linear walkthroughs

Sometimes it's useful to have a coding agent give you a structured walkthrough of a codebase.

Maybe it's existing code you need to get up to speed on, maybe it's your own code that you've forgotten the details of, or maybe you vibe coded the whole thing and need to understand how it actually works.

Frontier models with the right agent harness can construct a detailed walkthrough to help you understand how code works. [... 525 words]

# 1:07 am / llms, vibe-coding, ai, generative-ai, ai-assisted-programming, coding-agents, swift, showboat, agentic-engineering

Feb. 24, 2026

Release present 0.1a0 — SwiftUI presentation software, every slide is a URL

go-size-analyzer (via) The Go ecosystem is really good at tooling. I just learned about this tool for analyzing the size of Go binaries using a pleasing treemap view of their bundled dependencies.

You can install and run the tool locally, but it's also compiled to WebAssembly and hosted at gsa.zxilly.dev - which means you can open compiled Go binaries and analyze them directly in your browser.

I tried it with an 8.1MB macOS compiled copy of my Go Showboat tool and got this:

Treemap visualization of a Go binary named "showboat" showing size breakdown across four major categories: "Unknown Sections Size" (containing __rodata __TEXT, __rodata __DATA_CONST, __data __DATA, and Debug Sections Size with __zdebug_line __DWARF, __zdebug_loc __DWARF, __zdebug_info __DWARF), "Std Packages Size" (showing standard library packages like runtime, net, crypto, reflect, math, os, fmt, strings, syscall, context, and many subpackages such as crypto/tls, crypto/x509, net/http, with individual .go files visible at deeper levels), "Main Packages Size" (showing main, showboat, cmd), and "Generated Packages Size" (showing <autogenerated>). A tooltip is visible over __zdebug_line __DWARF showing: Section: __zdebug_line __DWARF, Size: 404.44 KB, File Size: 404.44 KB, Known size: 0 B, Unknown size: 404.44 KB, Offset: 0x52814a – 0x58d310, Address: 0x1005c014a – 0x1005c5310, Memory: false, Debug: true. The treemap uses green for main/generated packages, blue-gray for unknown sections, and shades of purple/pink for standard library packages.

# 4:10 pm / go, webassembly, showboat

Agentic Engineering Patterns >

First run the tests

Automated tests are no longer optional when working with coding agents.

The old excuses for not writing them - that they're time consuming and expensive to constantly rewrite while a codebase is rapidly evolving - no longer hold when an agent can knock them into shape in just a few minutes.

They're also vital for ensuring AI-generated code does what it claims to do. If the code has never been executed it's pure luck if it actually works when deployed to production. [... 355 words]

# 12:30 pm / tdd, testing, llms, ai, generative-ai, ai-assisted-programming, coding-agents, agentic-engineering

Feb. 23, 2026

Ladybird adopts Rust, with help from AI (via) Really interesting case study from Andreas Kling on the sophisticated use of coding agents for ambitious projects involving critical code. After a few years hoping Swift's platform support outside of the Apple ecosystem would mature, they switched tracks to Rust as their memory-safe language of choice, starting with an AI-assisted port of a critical library:

Our first target was LibJS, Ladybird's JavaScript engine. The lexer, parser, AST, and bytecode generator are relatively self-contained and have extensive test coverage through test262, which made them a natural starting point.

I used Claude Code and Codex for the translation. This was human-directed, not autonomous code generation. I decided what to port, in what order, and what the Rust code should look like. It was hundreds of small prompts, steering the agents where things needed to go. [...]

The requirement from the start was byte-for-byte identical output from both pipelines. The result was about 25,000 lines of Rust, and the entire port took about two weeks. The same work would have taken me multiple months to do by hand. We’ve verified that every AST produced by the Rust parser is identical to the C++ one, and all bytecode generated by the Rust compiler is identical to the C++ compiler’s output. Zero regressions across the board.

Having an existing conformance testing suite of the quality of test262 is a huge unlock for projects of this magnitude, and the ability to compare output with an existing trusted implementation makes agentic engineering much more of a safe bet.
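
Here's a rough sketch of what that kind of output comparison can look like in Python. This is an illustration only, not Ladybird's actual tooling: `reference_impl` and `ported_impl` are hypothetical stand-ins for a trusted implementation and its port, and the corpus is made up.

```python
from typing import Callable, Iterable


def find_mismatches(
    reference: Callable[[str], bytes],
    candidate: Callable[[str], bytes],
    corpus: Iterable[str],
) -> list[str]:
    """Return every input where the two implementations differ byte-for-byte."""
    mismatches = []
    for source in corpus:
        if reference(source) != candidate(source):
            mismatches.append(source)
    return mismatches


# Hypothetical stand-in for the existing, trusted implementation.
def reference_impl(source: str) -> bytes:
    return source.strip().upper().encode("utf-8")


# Hypothetical stand-in for the port, with a deliberate bug for the demo:
# it only strips trailing whitespace, not leading whitespace.
def ported_impl(source: str) -> bytes:
    return source.rstrip().upper().encode("utf-8")


corpus = ["let x = 1", "  var y", "z  "]
mismatches = find_mismatches(reference_impl, ported_impl, corpus)
print(mismatches)
```

Any input that shows up in `mismatches` is a regression to investigate. An empty list is the "zero regressions" outcome for that corpus, which is only as convincing as the corpus is thorough.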

# 6:52 pm / browsers, javascript, ai, rust, generative-ai, llms, ai-assisted-programming, andreas-kling, ladybird, coding-agents, swift, conformance-suites, agentic-engineering

Writing about Agentic Engineering Patterns

I’ve started a new project to collect and document Agentic Engineering Patterns—coding practices and patterns to help get the best results out of this new era of coding agent development we find ourselves entering.

[... 554 words]

Agentic Engineering Patterns >

Writing code is cheap now

The biggest challenge in adopting agentic engineering practices is getting comfortable with the consequences of the fact that writing code is cheap now.

Code has always been expensive. Producing a few hundred lines of clean, tested code takes most software developers a full day or more. Many of our engineering habits, at both the macro and micro level, are built around this core constraint.

At the macro level we spend a great deal of time designing, estimating and planning out projects, to ensure that our expensive coding time is spent as efficiently as possible. Product feature ideas are evaluated in terms of how much value they can provide in exchange for that time - a feature needs to earn its development costs many times over to be worthwhile! [... 661 words]

# 4:20 pm / llms, ai, generative-ai, ai-assisted-programming, coding-agents, agentic-engineering

The paper asked me to explain vibe coding, and I did so, because I think something big is coming there, and I'm deep in, and I worry that normal people are not able to see it and I want them to be prepared. But people can't just read something and hate you quietly; they can't see that you have provided them with a utility or a warning; they need their screech. You are distributed to millions of people, and become the local proxy for the emotions of maybe dozens of people, who disagree and demand your attention, and because you are the one in the paper you need to welcome them with a pastor's smile and deep empathy, and if you speak a word in your own defense they'll screech even louder.

Paul Ford, on writing about vibe coding for the New York Times

# 4 pm / new-york-times, paul-ford, vibe-coding

Release showboat v0.6.1 — Create executable documents that demonstrate an agent's work

The latest scourge of Twitter is AI bots that reply to your tweets with generic, banal commentary slop, often accompanied by a question to "drive engagement" and waste as much of your time as possible.

I just found out that the category name for this genre of software is reply guy tools. Amazing.

# 1:11 pm / definitions, twitter, ai, generative-ai, llms, slop, ai-ethics

Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.

Screenshot of a WhatsApp or similar messaging conversation showing a user repeatedly trying to stop an AI agent (appearing to be "OpenClaw") that is autonomously executing terminal commands to mass-delete emails. The agent sends messages prefixed with "🛠 Exec:" running commands like "gog gmail search 'in:inbox' --max 20 -a" and "# Nuclear option: trash EVERYTHING in inbox older than Feb 15 that isn't already in my keep list", while the user urgently responds with "What's going on? Can you describe what you're doing" at 6:00 PM, "Do not do that" at 6:01 PM, "Stop don't do anything" at 6:02 PM, and "STOP OPENCLAW" at 6:03 PM. The agent continues executing commands including setting ACCT variables with redacted email addresses and commenting "# Get ALL remaining old stuff and nuke it" and "# Keep looping until we clear everything old", ignoring the user's repeated requests to stop. Email addresses and account details are partially redacted with gray blocks.

I said “Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.” This has been working well for my toy inbox, but my real inbox was too huge and triggered compaction. During the compaction, it lost my original instruction 🤦‍♀️

Summer Yue

# 1:01 pm / ai, generative-ai, llms, ai-agents, ai-ethics, openclaw

Agentic Engineering Patterns >

Red/green TDD

"Use red/green TDD" is a pleasingly succinct way to get better results out of a coding agent.

TDD stands for Test Driven Development. It's a programming style where you ensure every piece of code you write is accompanied by automated tests that demonstrate the code works.

The most disciplined form of TDD is test-first development. You write the automated tests first, confirm that they fail, then iterate on the implementation until the tests pass. [... 348 words]
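
As a minimal sketch of that loop, assuming a made-up `slugify` function as the thing being built: the tests are written first, run against a stub to confirm they fail (red), then run again against a real implementation (green). The helper names here are my own for illustration, not part of any framework.

```python
import unittest
from typing import Callable


def make_suite(slugify: Callable[[str], str]) -> unittest.TestSuite:
    """Build the test suite against whichever implementation is passed in."""

    class SlugifyTest(unittest.TestCase):
        def test_spaces_become_hyphens(self):
            self.assertEqual(slugify("Red Green TDD"), "red-green-tdd")

        def test_extra_whitespace_is_collapsed(self):
            self.assertEqual(slugify("  first   run  "), "first-run")

    return unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)


def passes(slugify: Callable[[str], str]) -> bool:
    """Run the suite and report whether every test succeeded."""
    result = unittest.TestResult()
    make_suite(slugify).run(result)
    return result.wasSuccessful()


# Red: the tests exist but the implementation does not yet.
def slugify_stub(text: str) -> str:
    raise NotImplementedError


# Green: the simplest implementation that makes the tests pass.
def slugify(text: str) -> str:
    return "-".join(text.lower().split())


red = passes(slugify_stub)
green = passes(slugify)
print("red phase passes:", red)
print("green phase passes:", green)
```

Watching the tests fail first is the point: it shows they are actually exercising the new code rather than passing by accident.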

# 7:12 am / tdd, testing, ai-assisted-programming, coding-agents, agentic-engineering

Feb. 22, 2026

The Claude C Compiler: What It Reveals About the Future of Software. On February 5th Anthropic's Nicholas Carlini wrote about a project to use parallel Claudes to build a C compiler on top of the brand new Opus 4.6.

Chris Lattner (Swift, LLVM, Clang, Mojo) knows more about C compilers than most. He just published this review of the code.

Some points that stood out to me:

  • Good software depends on judgment, communication, and clear abstraction. AI has amplified this.
  • AI coding is automation of implementation, so design and stewardship become more important.
  • Manual rewrites and translation work are becoming AI-native tasks, automating a large category of engineering effort.

Chris is generally impressed with CCC (the Claude C Compiler):

Taken together, CCC looks less like an experimental research compiler and more like a competent textbook implementation, the sort of system a strong undergraduate team might build early in a project before years of refinement. That alone is remarkable.

It's a long way from being a production-ready compiler though:

Several design choices suggest optimization toward passing tests rather than building general abstractions like a human would. [...] These flaws are informative rather than surprising, suggesting that current AI systems excel at assembling known techniques and optimizing toward measurable success criteria, while struggling with the open-ended generalization required for production-quality systems.

The project also leads to deep open questions about how agentic engineering interacts with licensing and IP for both open source and proprietary code:

If AI systems trained on decades of publicly available code can reproduce familiar structures, patterns, and even specific implementations, where exactly is the boundary between learning and copying?

# 11:58 pm / c, compilers, open-source, ai, ai-assisted-programming, anthropic, claude, nicholas-carlini, coding-agents, agentic-engineering

London Stock Exchange: Raspberry Pi Holdings plc. Striking graph illustrating stock in the UK Raspberry Pi holding company spiking on Tuesday:

Stock price line chart for RASPBERRY PI showing a 3-month daily view from 24 Nov to 16 Feb. The price trends downward from around 325 to a low near 260, then sharply spikes upward. A tooltip highlights "RASPBERRY PI: 415.00, 16/02/2026". The y-axis ranges from 240 to 420.

The Telegraph credited excitement around OpenClaw:

Raspberry Pi's stock price has surged 30pc in two days, amid chatter on social media that the company's tiny computers can be used to power a popular AI chatbot.

Users have turned to Raspberry Pi's small computers to run a technology known as OpenClaw, a viral AI personal assistant. A flood of posts about the practice have been viewed millions of times since the weekend.

Reuters also credit a stock purchase by CEO Eben Upton:

Shares in Raspberry Pi rose as much as 42% on Tuesday in a record two-day rally after CEO Eben Upton bought stock in the beaten-down UK computer hardware firm, halting a months-long slide, as chatter grew that its products could benefit from low-cost artificial-intelligence projects.

Two London traders said the driver behind the surge was not clear, though the move followed a filing showing Upton bought about 13,224 pounds worth of shares at around 282 pence each on Monday.

# 11:54 pm / ai, generative-ai, raspberry-pi, llms, ai-agents, openclaw

Research README Timezone Clarification — Timezone mismatches in the project’s root README.md were identified due to inconsistent git commit author dates—some in UTC, others in US Pacific time—displayed without timezone clarification. The listing was generated by a cog script that extracted dates using `git log`, then formatted them without standardizing to a common timezone, causing confusion across 39 project directories.
Research WebMCP + Chrome DevTools Protocol Demo — WebMCP is a proposed browser API that enables web applications to expose structured, callable tools for AI agents, reducing the need for unreliable UI automation. This project demonstrates how to register and interact with WebMCP tools using a Python client over the Chrome DevTools Protocol (CDP), providing a bridge to discover and call these tools programmatically.
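
For the README timezone item above, one way to standardize mixed-offset commit dates is to convert everything to UTC before formatting. This is a sketch under assumptions, not the actual cog script: the sample dates are made up, and I'm assuming they arrive as strict ISO 8601 strings with offsets, such as those produced by `git log --format=%aI`.

```python
from datetime import datetime, timezone


def to_utc(iso_date: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries an offset and convert it to UTC."""
    return datetime.fromisoformat(iso_date).astimezone(timezone.utc)


# Made-up author dates: one committed in US Pacific time, one in UTC.
author_dates = [
    "2026-02-21T23:30:00-08:00",
    "2026-02-22T06:15:00+00:00",
]

# Format every date in the same timezone and say which one it is.
normalized = [to_utc(d).strftime("%Y-%m-%d %H:%M UTC") for d in author_dates]
print(normalized)
```

Displayed without conversion, the first commit looks like it happened a day before the second. In UTC it turns out to be the later of the two.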

How I think about Codex. Gabriel Chua (Developer Experience Engineer for APAC at OpenAI) provides his take on the confusing terminology behind the term "Codex", which can refer to a bunch of different things within the OpenAI ecosystem:

In plain terms, Codex is OpenAI’s software engineering agent, available through multiple interfaces, and an agent is a model plus instructions and tools, wrapped in a runtime that can execute tasks on your behalf. [...]

At a high level, I see Codex as three parts working together:

Codex = Model + Harness + Surfaces [...]

  • Model + Harness = the Agent
  • Surfaces = how you interact with the Agent

He defines the harness as "the collection of instructions and tools", which is notably open source and lives in the openai/codex repository.

Gabriel also provides the first acknowledgment I've seen from an OpenAI insider that the Codex model family are directly trained for the Codex harness:

Codex models are trained in the presence of the harness. Tool use, execution loops, compaction, and iterative verification aren’t bolted on behaviors — they’re part of how the model learns to operate. The harness, in turn, is shaped around how the model plans, invokes tools, and recovers from failure.

# 3:53 pm / definitions, openai, generative-ai, llms, ai-assisted-programming, codex-cli

Feb. 21, 2026

We’ve made GPT-5.3-Codex-Spark about 30% faster. It is now serving at over 1200 tokens per second.

Thibault Sottiaux, OpenAI

# 1:30 am / ai, openai, generative-ai, llms, llm-performance

Andrej Karpathy talks about “Claws”. Andrej Karpathy tweeted a mini-essay about buying a Mac Mini ("The apple store person told me they are selling like hotcakes and everyone is confused") to tinker with Claws:

I'm definitely a bit sus'd to run OpenClaw specifically [...] But I do love the concept and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level.

Looking around, and given that the high level idea is clear, there are a lot of smaller Claws starting to pop out. For example, on a quick skim NanoClaw looks really interesting in that the core engine is ~4000 lines of code (fits into both my head and that of AI agents, so it feels manageable, auditable, flexible, etc.) and runs everything in containers by default. [...]

Anyway there are many others - e.g. nanobot, zeroclaw, ironclaw, picoclaw (lol @ prefixes). [...]

Not 100% sure what my setup ends up looking like just yet but Claws are an awesome, exciting new layer of the AI stack.

Andrej has an ear for fresh terminology (see vibe coding, agentic engineering) and I think he's right about this one, too: "Claw" is becoming a term of art for the entire category of OpenClaw-like agent systems - AI agents that generally run on personal hardware, communicate via messaging protocols and can both act on direct instructions and schedule tasks.

It even comes with an established emoji 🦞

# 12:37 am / definitions, ai, andrej-karpathy, generative-ai, llms, ai-agents, openclaw, agentic-engineering

Feb. 20, 2026

Adding TILs, releases, museums, tools and research to my blog

I’ve been wanting to add indications of my various other online activities to my blog for a while now. I just turned on a new feature I’m calling “beats” (after story beats, naming this was hard!) which adds five new types of content to my site, all corresponding to activity elsewhere.

[... 614 words]

Taalas serves Llama 3.1 8B at 17,000 tokens/second (via) This new Canadian hardware startup just announced their first product - a custom hardware implementation of the Llama 3.1 8B model (from July 2024) that can run at a staggering 17,000 tokens/second.

I was going to include a video of their demo but it's so fast it would look more like a screenshot. You can try it out at chatjimmy.ai.

They describe their Silicon Llama as “aggressively quantized, combining 3-bit and 6-bit parameters.” Their next generation will use 4-bit - presumably they have quite a long lead time for baking out new models!

# 10:10 pm / ai, generative-ai, llama, llms, llm-performance

ggml.ai joins Hugging Face to ensure the long-term progress of Local AI (via) I don't normally cover acquisition news like this, but I have some thoughts.

It's hard to overstate the impact Georgi Gerganov has had on the local model space. Back in March 2023 his release of llama.cpp made it possible to run a local LLM on consumer hardware. The original README said:

The main goal is to run the model using 4-bit quantization on a MacBook. [...] This was hacked in an evening - I have no idea if it works correctly.

I wrote about trying llama.cpp out at the time in Large language models are having their Stable Diffusion moment:

I used it to run the 7B LLaMA model on my laptop last night, and then this morning upgraded to the 13B model—the one that Facebook claim is competitive with GPT-3.

Meta's original LLaMA release depended on PyTorch and their FairScale PyTorch extension for running on multiple GPUs, and required CUDA and NVIDIA hardware. Georgi's work opened that up to a much wider range of hardware and kicked off the local model movement that has continued to grow since then.

Hugging Face are already responsible for the incredibly influential Transformers library used by the majority of LLM releases today. They've proven themselves a good steward for that open source project, which makes me optimistic for the future of llama.cpp and related projects.

This section from the announcement looks particularly promising:

Going forward, our joint efforts will be geared towards the following objectives:

  • Towards seamless "single-click" integration with the transformers library. The transformers framework has established itself as the 'source of truth' for AI model definitions. Improving the compatibility between the transformers and the ggml ecosystems is essential for wider model support and quality control.
  • Better packaging and user experience of ggml-based software. As we enter the phase in which local inference becomes a meaningful and competitive alternative to cloud inference, it is crucial to improve and simplify the way in which casual users deploy and access local models. We will work towards making llama.cpp ubiquitous and readily available everywhere, and continue partnering with great downstream projects.

Given the influence of Transformers, this closer integration could lead to model releases that are compatible with the GGML ecosystem out of the box. That would be a big win for the local model ecosystem.

I'm also excited to see investment in "packaging and user experience of ggml-based software". This has mostly been left to tools like Ollama and LM Studio. ggml-org released LlamaBarn last year - "a macOS menu bar app for running local LLMs" - and I'm hopeful that further investment in this area will result in more high quality open source tools for running local models from the team best placed to deliver them.

# 5:12 pm / open-source, transformers, ai, generative-ai, llama, local-llms, llms, hugging-face, llama-cpp

Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost. [...]

At Claude Code, we build our entire harness around prompt caching. A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans, so we run alerts on our prompt cache hit rate and declare SEVs if they're too low.

Thariq Shihipar

# 7:13 am / ai, prompt-engineering, generative-ai, llms, anthropic, ai-agents, claude-code

Release datasette-files 0.1a1 — Upload files to Datasette
Release datasette-files-s3 0.1a0 — datasette-files S3 backend

Feb. 19, 2026

Reached the stage of parallel agent psychosis where I've lost a whole feature - I know I had it yesterday, but I can't seem to find the branch or worktree or cloud instance or checkout with it in.

... found it! Turns out I'd been hacking on a random prototype in /tmp and then my computer crashed and rebooted and I lost the code... but it's all still there in ~/.claude/projects/ session logs and Claude Code can extract it out and spin up the missing feature again.

# 11:48 pm / ai, generative-ai, llms, coding-agents, claude-code, parallel-agents

Release datasette-endpoints 0.1 — Plugin to add a /-/endpoints debug page listing all configured endpoints
Release datasette-files 0.1a0 — Upload files to Datasette
Research Header Alignment Investigation - simonwillison.net — Addressing a subtle header alignment issue on simonwillison.net, this investigation tracked down a persistent ~1px height mismatch between left and right headers caused by anchor elements generating taller inline boxes than plain text due to font metrics. Multiple fixes—including removing position:relative/top:1px hacks and setting explicit heights—proved fragile.
