<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: definitions</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/definitions.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-02-23T13:11:57+00:00</updated><author><name>Simon Willison</name></author><entry><title>Reply guy</title><link href="https://simonwillison.net/2026/Feb/23/reply-guy/#atom-tag" rel="alternate"/><published>2026-02-23T13:11:57+00:00</published><updated>2026-02-23T13:11:57+00:00</updated><id>https://simonwillison.net/2026/Feb/23/reply-guy/#atom-tag</id><summary type="html">
    &lt;p&gt;The latest scourge of Twitter is AI bots that reply to your tweets with generic, banal commentary slop, often accompanied by a question to "drive engagement" and waste as much of your time as possible.&lt;/p&gt;
&lt;p&gt;I just &lt;a href="https://twitter.com/simonw/status/2025918174894673986"&gt;found out&lt;/a&gt; that the category name for this genre of software is &lt;strong&gt;reply guy&lt;/strong&gt; tools. Amazing.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slop"&gt;slop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="twitter"/><category term="slop"/><category term="generative-ai"/><category term="definitions"/><category term="ai"/><category term="llms"/></entry><entry><title>How I think about Codex</title><link href="https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag" rel="alternate"/><published>2026-02-22T15:53:43+00:00</published><updated>2026-02-22T15:53:43+00:00</updated><id>https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.linkedin.com/pulse/how-i-think-codex-gabriel-chua-ukhic"&gt;How I think about Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Gabriel Chua (Developer Experience Engineer for APAC at OpenAI) provides his take on the confusing terminology behind the term "Codex", which can refer to a number of different things within the OpenAI ecosystem:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In plain terms, Codex is OpenAI’s software engineering agent, available through multiple interfaces, and an agent is a model plus instructions and tools, wrapped in a runtime that can execute tasks on your behalf. [...]&lt;/p&gt;
&lt;p&gt;At a high level, I see Codex as three parts working together:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Codex = Model + Harness + Surfaces&lt;/em&gt; [...]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model + Harness = the Agent&lt;/li&gt;
&lt;li&gt;Surfaces = how you interact with the Agent&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;He defines the harness as "the collection of instructions and tools", which is notably open source and lives in the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;Gabriel also provides the first acknowledgment I've seen from an OpenAI insider that the Codex model family are directly trained for the Codex harness:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Codex models are trained in the presence of the harness. Tool use, execution loops, compaction, and iterative verification aren’t bolted on behaviors — they’re part of how the model learns to operate. The harness, in turn, is shaped around how the model plans, invokes tools, and recovers from failure.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="codex-cli"/></entry><entry><title>Andrej Karpathy talks about "Claws"</title><link href="https://simonwillison.net/2026/Feb/21/claws/#atom-tag" rel="alternate"/><published>2026-02-21T00:37:45+00:00</published><updated>2026-02-21T00:37:45+00:00</updated><id>https://simonwillison.net/2026/Feb/21/claws/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://twitter.com/karpathy/status/2024987174077432126"&gt;Andrej Karpathy talks about &amp;quot;Claws&amp;quot;&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Andrej Karpathy tweeted a mini-essay about buying a Mac Mini ("The apple store person told me they are selling like hotcakes and everyone is confused") to tinker with Claws:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I'm definitely a bit sus'd to run OpenClaw specifically [...] But I do love the concept and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level.&lt;/p&gt;
&lt;p&gt;Looking around, and given that the high level idea is clear, there are a lot of smaller Claws starting to pop out. For example, on a quick skim NanoClaw looks really interesting in that the core engine is ~4000 lines of code (fits into both my head and that of AI agents, so it feels manageable, auditable, flexible, etc.) and runs everything in containers by default. [...]&lt;/p&gt;
&lt;p&gt;Anyway there are many others - e.g. nanobot, zeroclaw, ironclaw, picoclaw (lol @ prefixes). [...]&lt;/p&gt;
&lt;p&gt;Not 100% sure what my setup ends up looking like just yet but Claws are an awesome, exciting new layer of the AI stack.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Andrej has an ear for fresh terminology (see &lt;a href="https://simonwillison.net/2025/Mar/19/vibe-coding/"&gt;vibe coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/2026/Feb/11/glm-5/"&gt;agentic engineering&lt;/a&gt;) and I think he's right about this one, too: "&lt;strong&gt;Claw&lt;/strong&gt;" is becoming a term of art for the entire category of OpenClaw-like agent systems - AI agents that generally run on personal hardware, communicate via messaging protocols and can both act on direct instructions and schedule tasks.&lt;/p&gt;
&lt;p&gt;It even comes with an established emoji 🦞&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openclaw"&gt;openclaw&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="llms"/><category term="ai-agents"/><category term="openclaw"/><category term="agentic-engineering"/></entry><entry><title>Deep Blue</title><link href="https://simonwillison.net/2026/Feb/15/deep-blue/#atom-tag" rel="alternate"/><published>2026-02-15T21:06:44+00:00</published><updated>2026-02-15T21:06:44+00:00</updated><id>https://simonwillison.net/2026/Feb/15/deep-blue/#atom-tag</id><summary type="html">
    &lt;p&gt;We coined a new term on the &lt;a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/"&gt;Oxide and Friends podcast&lt;/a&gt; last month (primary credit to Adam Leventhal) covering the sense of psychological ennui leading into existential dread that many software developers are feeling thanks to the encroachment of generative AI into their field of work.&lt;/p&gt;
&lt;p&gt;We're calling it &lt;strong&gt;Deep Blue&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;You can listen to it being coined in real time &lt;a href="https://www.youtube.com/watch?v=lVDhQMiAbR8&amp;amp;t=2835s"&gt;from 47:15 in the episode&lt;/a&gt;. I've included &lt;a href="https://simonwillison.net/2026/Feb/15/deep-blue/#transcript"&gt;a transcript below&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Deep Blue is a very real issue.&lt;/p&gt;
&lt;p&gt;Becoming a professional software engineer is &lt;em&gt;hard&lt;/em&gt;. Getting good enough for people to pay you money to write software takes years of dedicated work. The rewards are significant: this is a well compensated career which opens up a lot of great opportunities.&lt;/p&gt;
&lt;p&gt;It's also a career that's mostly free from gatekeepers and expensive prerequisites. You don't need an expensive degree or accreditation. A laptop, an internet connection and a lot of time and curiosity is enough to get you started.&lt;/p&gt;
&lt;p&gt;And it rewards the nerds! Spending your teenage years tinkering with computers turned out to be a very smart investment in your future.&lt;/p&gt;
&lt;p&gt;The idea that this could all be stripped away by a chatbot is &lt;em&gt;deeply&lt;/em&gt; upsetting.&lt;/p&gt;
&lt;p&gt;I've seen signs of Deep Blue in most of the online communities I spend time in. I've even faced accusations from my peers that I am actively harming their future careers through my work helping people understand how well AI-assisted programming can work.&lt;/p&gt;
&lt;p&gt;I think this is an issue which is causing genuine mental anguish for a lot of people in our community. Giving it a name makes it easier for us to have conversations about it.&lt;/p&gt;
&lt;h4 id="my-experiences-of-deep-blue"&gt;My experiences of Deep Blue&lt;/h4&gt;
&lt;p&gt;I distinctly remember my first experience of Deep Blue. For me it was triggered by ChatGPT Code Interpreter back in early 2023.&lt;/p&gt;
&lt;p&gt;My primary project is &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;, an ecosystem of open source tools for telling stories with data. I had dedicated myself to the challenge of helping people (initially focusing on journalists) clean up, analyze and find meaning in data, in all sorts of shapes and sizes.&lt;/p&gt;
&lt;p&gt;I expected I would need to build a lot of software for this! It felt like a challenge that could keep me happily engaged for many years to come.&lt;/p&gt;
&lt;p&gt;Then I tried uploading a CSV file of &lt;a href="https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783/about_data"&gt;San Francisco Police Department Incident Reports&lt;/a&gt; - hundreds of thousands of rows - to ChatGPT Code Interpreter and... it did every piece of data cleanup and analysis I had on my napkin roadmap for the next few years with a couple of prompts.&lt;/p&gt;
&lt;p&gt;It even converted the data into a neatly normalized SQLite database and let me download the result!&lt;/p&gt;
&lt;p&gt;I remember having two competing thoughts in parallel.&lt;/p&gt;
&lt;p&gt;On the one hand, as somebody who wants journalists to be able to do more with data, this felt like a &lt;em&gt;huge&lt;/em&gt; breakthrough. Imagine giving every journalist in the world an on-demand analyst who could help them tackle any data question they could think of!&lt;/p&gt;
&lt;p&gt;But on the other hand... &lt;em&gt;what was I even for&lt;/em&gt;? My confidence in the value of my own projects took a painful hit. Was the path I'd chosen for myself suddenly a dead end?&lt;/p&gt;
&lt;p&gt;I've had some further pangs of Deep Blue just in the past few weeks, thanks to the Claude Opus 4.5/4.6 and GPT-5.2/5.3 coding agent effect. As many other people are also observing, the latest generation of coding agents, given the right prompts, really can churn away for a few minutes to several hours and produce working, documented and fully tested software that exactly matches the criteria they were given.&lt;/p&gt;
&lt;p&gt;"The code they write isn't any good" doesn't really cut it any more.&lt;/p&gt;
&lt;h4 id="transcript"&gt;A lightly edited transcript&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bryan&lt;/strong&gt;: I think that we're going to see a real problem with AI induced ennui where software engineers in particular get listless because the AI can do anything. Simon, what do you think about that?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simon&lt;/strong&gt;: Definitely. Anyone who's paying close attention to coding agents is feeling some of that already. There's an extent where you sort of get over it when you realize that you're still useful, even though your ability to memorize the syntax of programming languages is completely irrelevant now.&lt;/p&gt;
&lt;p&gt;Something I see a lot of is people out there who are having existential crises and are very, very unhappy because they're like, "I dedicated my career to learning this thing and now it just does it. What am I even for?". I will very happily try and convince those people that they are for a whole bunch of things and that none of that experience they've accumulated has gone to waste, but psychologically it's a difficult time for software engineers.&lt;/p&gt;
&lt;p&gt;[...]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Bryan&lt;/strong&gt;: Okay, so I'm going to predict that we name that. Whatever that is, we have a name for that kind of feeling and that kind of, whether you want to call it a blueness or a loss of purpose, and that we're kind of trying to address it collectively in a directed way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Adam&lt;/strong&gt;: Okay, this is your big moment. Pick the name. If you call your shot from here, this is you pointing to the stands. You know, I – Like deep blue, you know.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Bryan&lt;/strong&gt;: Yeah, deep blue. I like that. I like deep blue. Deep blue. Oh, did you walk me into that, you bastard? You just blew out the candles on my birthday cake.&lt;/p&gt;
&lt;p&gt;It wasn't my big moment at all. That was your big moment. No, that is, Adam, that is very good. That is deep blue.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simon&lt;/strong&gt;: All of the chess players and the Go players went through this a decade ago and they have come out stronger.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Turns out it was more than a decade ago: &lt;a href="https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov"&gt;Deep Blue defeated Garry Kasparov in 1997&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/careers"&gt;careers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/oxide"&gt;oxide&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bryan-cantrill"&gt;bryan-cantrill&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deep-blue"&gt;deep-blue&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="careers"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="oxide"/><category term="bryan-cantrill"/><category term="ai-ethics"/><category term="coding-agents"/><category term="deep-blue"/></entry><entry><title>How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt</title><link href="https://simonwillison.net/2026/Feb/15/cognitive-debt/#atom-tag" rel="alternate"/><published>2026-02-15T05:20:11+00:00</published><updated>2026-02-15T05:20:11+00:00</updated><id>https://simonwillison.net/2026/Feb/15/cognitive-debt/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://margaretstorey.com/blog/2026/02/09/cognitive-debt/"&gt;How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This piece by Margaret-Anne Storey is the best explanation of the term &lt;strong&gt;cognitive debt&lt;/strong&gt; I've seen so far.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Cognitive debt&lt;/em&gt;, a term gaining &lt;a href="https://www.media.mit.edu/publications/your-brain-on-chatgpt/"&gt;traction&lt;/a&gt; recently, instead communicates the notion that the debt compounded from going fast lives in the brains of the developers and affects their lived experiences and abilities to “go fast” or to make changes. Even if AI agents produce code that could be easy to understand, the humans involved may have simply lost the plot and may not understand what the program is supposed to do, how their intentions were implemented, or how to possibly change it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Margaret-Anne expands on this further with an anecdote about a student team she coached:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;But by weeks 7 or 8, one team hit a wall. They could no longer make even simple changes without breaking something unexpected. When I met with them, the team initially blamed technical debt: messy code, poor architecture, hurried implementations. But as we dug deeper, the real problem emerged: no one on the team could explain why certain design decisions had been made or how different parts of the system were supposed to work together. The code might have been messy, but the bigger issue was that the theory of the system, their shared understanding, had fragmented or disappeared entirely. They had accumulated cognitive debt faster than technical debt, and it paralyzed them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've experienced this myself on some of my more ambitious vibe-code-adjacent projects. I've been experimenting with prompting entire new features into existence without reviewing their implementations and, while it works surprisingly well, I've found myself getting lost in my own projects.&lt;/p&gt;
&lt;p&gt;I no longer have a firm mental model of what they can do and how they work, which means each additional feature becomes harder to reason about, eventually leading me to lose the ability to make confident decisions about where to go next.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://martinfowler.com/fragments/2026-02-13.html"&gt;Martin Fowler&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cognitive-debt"&gt;cognitive-debt&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="vibe-coding"/><category term="cognitive-debt"/></entry><entry><title>GLM-5: From Vibe Coding to Agentic Engineering</title><link href="https://simonwillison.net/2026/Feb/11/glm-5/#atom-tag" rel="alternate"/><published>2026-02-11T18:56:14+00:00</published><updated>2026-02-11T18:56:14+00:00</updated><id>https://simonwillison.net/2026/Feb/11/glm-5/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://z.ai/blog/glm-5"&gt;GLM-5: From Vibe Coding to Agentic Engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a &lt;em&gt;huge&lt;/em&gt; new MIT-licensed model: 744B parameters and &lt;a href="https://huggingface.co/zai-org/GLM-5"&gt;1.51TB on Hugging Face&lt;/a&gt; - twice the size of &lt;a href="https://huggingface.co/zai-org/GLM-4.7"&gt;GLM-4.7&lt;/a&gt;, which was 368B and 717GB (4.5 and 4.6 were around that size too).&lt;/p&gt;
&lt;p&gt;It's interesting to see Z.ai take a position on what we should call professional software engineers building with LLMs - I've seen &lt;strong&gt;Agentic Engineering&lt;/strong&gt; show up in a few other places recently, most notably &lt;a href="https://twitter.com/karpathy/status/2019137879310836075"&gt;from Andrej Karpathy&lt;/a&gt; and &lt;a href="https://addyosmani.com/blog/agentic-engineering/"&gt;Addy Osmani&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I ran my "Generate an SVG of a pelican riding a bicycle" prompt through GLM-5 via &lt;a href="https://openrouter.ai/"&gt;OpenRouter&lt;/a&gt; and got back &lt;a href="https://gist.github.com/simonw/cc4ca7815ae82562e89a9fdd99f0725d"&gt;a very good pelican on a disappointing bicycle frame&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="The pelican is good and has a well defined beak. The bicycle frame is a wonky red triangle. Nice sun and motion lines." src="https://static.simonwillison.net/static/2026/glm-5-pelican.png" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46977210"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openrouter"&gt;openrouter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/glm"&gt;glm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="vibe-coding"/><category term="openrouter"/><category term="ai-in-china"/><category term="glm"/><category term="agentic-engineering"/></entry><entry><title>My answers to the questions I posed about porting open source code with LLMs</title><link href="https://simonwillison.net/2026/Jan/11/answers/#atom-tag" rel="alternate"/><published>2026-01-11T22:59:23+00:00</published><updated>2026-01-11T22:59:23+00:00</updated><id>https://simonwillison.net/2026/Jan/11/answers/#atom-tag</id><summary type="html">
    &lt;p&gt;Last month I &lt;a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/"&gt;wrote about porting JustHTML from Python to JavaScript&lt;/a&gt; using Codex CLI and GPT-5.2 in a few hours while also buying a Christmas tree and watching Knives Out 3. I ended that post with a series of open questions about the ethics and legality of this style of work. Alexander Petros on &lt;a href="https://lobste.rs/"&gt;lobste.rs&lt;/a&gt; just &lt;a href="https://lobste.rs/s/cmsfbu/don_t_fall_into_anti_ai_hype#c_cqkdve"&gt;challenged me to answer them&lt;/a&gt;, which is fair enough! Here's my attempt at that.&lt;/p&gt;
&lt;p&gt;You can read &lt;a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/"&gt;the original post&lt;/a&gt; for background, but the short version is that it's now possible to point a coding agent at some other open source project and effectively tell it "port this to language X and make sure the tests still pass" and have it do exactly that.&lt;/p&gt;
&lt;p&gt;Here are the questions I posed along with my answers based on my current thinking. Extra context is that I've since tried variations on a similar theme a few more times using Claude Code and Opus 4.5 and found it to be &lt;em&gt;astonishingly&lt;/em&gt; effective.&lt;/p&gt;
&lt;h4 id="does-this-library-represent-a-legal-violation-of-copyright-of-either-the-rust-library-or-the-python-one"&gt;Does this library represent a legal violation of copyright of either the Rust library or the Python one?&lt;/h4&gt;
&lt;p&gt;I decided that the right thing to do here was to &lt;a href="https://github.com/simonw/justjshtml/commit/a415d0af40c34bf9a856e956d841513f482867e3"&gt;keep the open source license and copyright statement&lt;/a&gt; from the Python library author and treat what I had built as a derivative work, which is the entire &lt;em&gt;point&lt;/em&gt; of open source.&lt;/p&gt;
&lt;h4 id="even-if-this-is-legal-is-it-ethical-to-build-a-library-in-this-way"&gt;Even if this is legal, is it ethical to build a library in this way?&lt;/h4&gt;
&lt;p&gt;After sitting on this for a while I've come down on yes, provided full credit is given and the license is carefully considered. Open source allows and encourages further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used. I don't think this is materially different, although a port to another language entirely does feel like a slightly different shape.&lt;/p&gt;
&lt;h4 id="does-this-format-of-development-hurt-the-open-source-ecosystem"&gt;Does this format of development hurt the open source ecosystem?&lt;/h4&gt;
&lt;p&gt;Now this one is complicated!&lt;/p&gt;
&lt;p&gt;It definitely hurts some projects because there are open source maintainers out there who say things like "I'm not going to release any open source code any more because I don't want it used for training" - I expect some of those would be equally angered by LLM-driven derived works as well.&lt;/p&gt;
&lt;p&gt;I don't know how serious this problem is - I've seen angry comments from anonymous usernames, but do they represent genuine open source contributions or are they just angry anonymous usernames?&lt;/p&gt;
&lt;p&gt;If we assume this is real, does the loss of those individuals get balanced out by the increase in individuals who CAN contribute to open source because they can now get work done in a few hours that might previously have taken them a few days that they didn't have to spare?&lt;/p&gt;
&lt;p&gt;I'll be brutally honest about that question: I think that if "they might train on my code / build a derived version with an LLM" is enough to drive you away from open source, your open source values are distinct enough from mine that I'm not ready to invest significantly in keeping you. I'll put that effort into welcoming the newcomers instead.&lt;/p&gt;
&lt;p&gt;The much bigger concern for me is the impact of generative AI on &lt;em&gt;demand&lt;/em&gt; for open source. The recent &lt;a href="https://simonwillison.net/2026/Jan/7/adam-wathan/"&gt;Tailwind story&lt;/a&gt; is a visible example of this - while Tailwind blamed LLMs for reduced traffic to their documentation resulting in fewer conversions to their paid component library, I'm suspicious that the reduced demand there is because LLMs make building good-enough versions of those components for free easy enough that people do that instead.&lt;/p&gt;
&lt;p&gt;I've found myself affected by this for open source dependencies too. The other day I wanted to parse a cron expression in some Go code. Usually I'd go looking for an existing library for cron expression parsing - but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.&lt;/p&gt;
&lt;p&gt;I expect that this is going to quite radically impact the shape of the open source library world over the next few years. Is that "harmful to open source"? It may well be. I'm hoping that whatever new shape comes out of this has its own merits, but I don't know what those would be.&lt;/p&gt;
&lt;h4 id="can-i-even-assert-copyright-over-this-given-how-much-of-the-work-was-produced-by-the-llm"&gt;Can I even assert copyright over this, given how much of the work was produced by the LLM?&lt;/h4&gt;
&lt;p&gt;I'm not a lawyer so I don't feel qualified to comment on this one. My loose hunch is that I'm still putting enough creative control in through the way I direct the models for that to count as enough human intervention, at least under US law, but I have no idea.&lt;/p&gt;
&lt;h4 id="is-it-responsible-to-publish-software-libraries-built-in-this-way"&gt;Is it responsible to publish software libraries built in this way?&lt;/h4&gt;
&lt;p&gt;I've come down on "yes" here, again because I never thought it was irresponsible for some random university student to slap an Apache license on some bad code they just coughed up on GitHub.&lt;/p&gt;
&lt;p&gt;What's important here is making it very clear to potential users what they should expect from that software. I've started publishing my AI-generated and not 100% reviewed libraries as alphas, which I'm tentatively thinking of as &lt;strong&gt;"alpha slop"&lt;/strong&gt;. I'll take the alpha label off once I've used them in production to the point that I'm willing to stake my reputation on them being decent implementations, and I'll ship a 1.0 version when I'm confident that they are a solid bet for other people to depend on. I think that's the responsible way to handle this.&lt;/p&gt;
&lt;h4 id="how-much-better-would-this-library-be-if-an-expert-team-hand-crafted-it-over-the-course-of-several-months"&gt;How much better would this library be if an expert team hand crafted it over the course of several months?&lt;/h4&gt;
&lt;p&gt;That one was a deliberately provocative question, because for a new HTML5 parsing library that passes 9,200 tests you would need a &lt;em&gt;very&lt;/em&gt; good reason to hire an expert team for two months (at a cost of hundreds of thousands of dollars) to write such a thing. And honestly, thanks to the existing conformance suites this kind of library is simple enough that you may find their results weren't notably better than the one written by the coding agent.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/conformance-suites"&gt;conformance-suites&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-ethics"/><category term="conformance-suites"/></entry><entry><title>Quoting Andrej Karpathy</title><link href="https://simonwillison.net/2025/Dec/19/andrej-karpathy/#atom-tag" rel="alternate"/><published>2025-12-19T23:07:52+00:00</published><updated>2025-12-19T23:07:52+00:00</updated><id>https://simonwillison.net/2025/Dec/19/andrej-karpathy/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://karpathy.bearblog.dev/year-in-review-2025/"&gt;&lt;p&gt;In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage to add to this mix. By training LLMs against automatically verifiable rewards across a number of environments (e.g. think math/code puzzles), the LLMs spontaneously develop strategies that look like "reasoning" to humans - they learn to break down problem solving into intermediate calculations and they learn a number of problem solving strategies for going back and forth to figure things out (see DeepSeek R1 paper for examples).&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://karpathy.bearblog.dev/year-in-review-2025/"&gt;Andrej Karpathy&lt;/a&gt;, 2025 LLM Year in Review&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deepseek"&gt;deepseek&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="llms"/><category term="llm-reasoning"/><category term="deepseek"/></entry><entry><title>AoAH Day 15: Porting a complete HTML5 parser and browser test suite</title><link href="https://simonwillison.net/2025/Dec/17/vibespiling/#atom-tag" rel="alternate"/><published>2025-12-17T23:23:35+00:00</published><updated>2025-12-17T23:23:35+00:00</updated><id>https://simonwillison.net/2025/Dec/17/vibespiling/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://anil.recoil.org/notes/aoah-2025-15"&gt;AoAH Day 15: Porting a complete HTML5 parser and browser test suite&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anil Madhavapeddy is running an &lt;a href="https://anil.recoil.org/notes/aoah-2025"&gt;Advent of Agentic Humps&lt;/a&gt; this year, building a new useful OCaml library every day for most of December.&lt;/p&gt;
&lt;p&gt;Inspired by Emil Stenström's &lt;a href="https://simonwillison.net/2025/Dec/14/justhtml/"&gt;JustHTML&lt;/a&gt; and my own coding agent &lt;a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/"&gt;port of that to JavaScript&lt;/a&gt;, he coined the term &lt;strong&gt;vibespiling&lt;/strong&gt; for AI-powered porting and transpiling of code from one language to another, and had a go at building an HTML5 parser in OCaml. The result is &lt;a href="https://tangled.org/anil.recoil.org/ocaml-html5rw"&gt;html5rw&lt;/a&gt;, which passes the same &lt;a href="https://github.com/html5lib/html5lib-tests"&gt;html5lib-tests&lt;/a&gt; suite that Emil and I used for our projects.&lt;/p&gt;
&lt;p&gt;Anil's thoughts on the copyright and ethical aspects of this are worth quoting in full:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The question of copyright and licensing is difficult. I definitely did &lt;em&gt;some&lt;/em&gt; editing by hand, and a fair bit of prompting that resulted in targeted code edits, but the vast amount of architectural logic came from JustHTML. So I opted to make the &lt;a href="https://tangled.org/anil.recoil.org/ocaml-html5rw/blob/main/LICENSE.md"&gt;LICENSE a joint one&lt;/a&gt; with &lt;a href="https://friendlybit.com"&gt;Emil Stenström&lt;/a&gt;. I did not follow the transitive dependency through to the Rust one, which I probably should.&lt;/p&gt;
&lt;p&gt;I'm also extremely uncertain about ever releasing this library to the central opam repository, especially as there are &lt;a href="https://github.com/aantron/lambdasoup"&gt;excellent HTML5 parsers&lt;/a&gt; already available. I haven't checked if those pass the HTML5 test suite, because this is wandering into the agents &lt;em&gt;vs&lt;/em&gt; humans territory that I ruled out in my &lt;a href="https://anil.recoil.org/notes/aoah-2025#groundrules-for-the-advent-of-agentic-humps"&gt;groundrules&lt;/a&gt;. Whether or not this agentic code is better is a moot point if releasing it drives away the human maintainers who are the source of creativity in the code!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I decided to &lt;a href="https://github.com/simonw/justjshtml/commit/106289acee29045cc5afe9732915357063dfc37a"&gt;credit Emil in the same way&lt;/a&gt; for my own vibespiled project.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/avsm/status/2000979482744607216"&gt;@avsm&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/functional-programming"&gt;functional-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ocaml"&gt;ocaml&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="functional-programming"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-ethics"/><category term="vibe-coding"/><category term="ocaml"/></entry><entry><title>2025 Word of the Year: Slop</title><link href="https://simonwillison.net/2025/Dec/15/2025-word-of-the-year-slop/#atom-tag" rel="alternate"/><published>2025-12-15T17:27:59+00:00</published><updated>2025-12-15T17:27:59+00:00</updated><id>https://simonwillison.net/2025/Dec/15/2025-word-of-the-year-slop/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.merriam-webster.com/wordplay/word-of-the-year"&gt;2025 Word of the Year: Slop&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Slop lost to "brain rot" for &lt;a href="https://simonwillison.net/2024/Nov/15/slop-word-of-the-year/"&gt;Oxford Word of the Year 2024&lt;/a&gt; but it's finally made it this year thanks to Merriam-Webster!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Merriam-Webster’s human editors have chosen slop as the 2025 Word of the Year. We define slop as “digital content of low quality that is produced usually in quantity by means of artificial intelligence.”&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slop"&gt;slop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="generative-ai"/><category term="slop"/><category term="ai-ethics"/></entry><entry><title>Useful patterns for building HTML tools</title><link href="https://simonwillison.net/2025/Dec/10/html-tools/#atom-tag" rel="alternate"/><published>2025-12-10T21:00:59+00:00</published><updated>2025-12-10T21:00:59+00:00</updated><id>https://simonwillison.net/2025/Dec/10/html-tools/#atom-tag</id><summary type="html">
    &lt;p&gt;I've started using the term &lt;strong&gt;HTML tools&lt;/strong&gt; to refer to HTML applications that I've been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built &lt;a href="https://tools.simonwillison.net/"&gt;over 150 of these&lt;/a&gt; in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I've discovered along the way.&lt;/p&gt;
&lt;p&gt;First, some examples to show the kind of thing I'm talking about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/svg-render?url=https://gist.githubusercontent.com/simonw/aedecb93564af13ac1596810d40cac3c/raw/83e7f3be5b65bba61124684700fa7925d37c36c3/tiger.svg"&gt;svg-render&lt;/a&gt;&lt;/strong&gt; renders SVG code to downloadable JPEGs or PNGs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/pypi-changelog?package=llm&amp;amp;compare=0.27...0.27.1"&gt;pypi-changelog&lt;/a&gt;&lt;/strong&gt; lets you generate (and copy to clipboard) diffs between different PyPI package releases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/bluesky-thread?url=https%3A%2F%2Fbsky.app%2Fprofile%2Fsimonwillison.net%2Fpost%2F3m7gzjew3ss2e&amp;amp;view=thread"&gt;bluesky-thread&lt;/a&gt;&lt;/strong&gt; provides a nested view of a discussion thread on Bluesky.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/svg-render?url=https://gist.githubusercontent.com/simonw/aedecb93564af13ac1596810d40cac3c/raw/83e7f3be5b65bba61124684700fa7925d37c36c3/tiger.svg" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/svg-render.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of svg-render" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/pypi-changelog?package=llm&amp;amp;compare=0.27...0.27.1" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/pypi-changelog.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of pypi-changelog" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/bluesky-thread?url=https%3A%2F%2Fbsky.app%2Fprofile%2Fsimonwillison.net%2Fpost%2F3m7gzjew3ss2e&amp;amp;view=thread" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/bluesky-thread.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of bluesky-thread" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;These are some of my recent favorites. I have dozens more like this that I use on a regular basis.&lt;/p&gt;
&lt;p&gt;You can explore my collection on &lt;strong&gt;&lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt;&lt;/strong&gt; - the &lt;a href="https://tools.simonwillison.net/by-month"&gt;by month&lt;/a&gt; view is useful for browsing the entire collection.&lt;/p&gt;
&lt;p&gt;If you want to see the code and prompts, almost all of the examples in this post include a link in their footer to "view source" on GitHub. The GitHub commits usually contain either the prompt itself or a link to the transcript used to create the tool.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#the-anatomy-of-an-html-tool"&gt;The anatomy of an HTML tool&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#prototype-with-artifacts-or-canvas"&gt;Prototype with Artifacts or Canvas&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#switch-to-a-coding-agent-for-more-complex-projects"&gt;Switch to a coding agent for more complex projects&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#load-dependencies-from-cdns"&gt;Load dependencies from CDNs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#host-them-somewhere-else"&gt;Host them somewhere else&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#take-advantage-of-copy-and-paste"&gt;Take advantage of copy and paste&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#build-debugging-tools"&gt;Build debugging tools&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#persist-state-in-the-url"&gt;Persist state in the URL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#use-localstorage-for-secrets-or-larger-state"&gt;Use localStorage for secrets or larger state&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#collect-cors-enabled-apis"&gt;Collect CORS-enabled APIs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#llms-can-be-called-directly-via-cors"&gt;LLMs can be called directly via CORS&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#don-t-be-afraid-of-opening-files"&gt;Don't be afraid of opening files&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#you-can-offer-downloadable-files-too"&gt;You can offer downloadable files too&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#pyodide-can-run-python-code-in-the-browser"&gt;Pyodide can run Python code in the browser&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#webassembly-opens-more-possibilities"&gt;WebAssembly opens more possibilities&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#remix-your-previous-tools"&gt;Remix your previous tools&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#record-the-prompt-and-transcript"&gt;Record the prompt and transcript&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/#go-forth-and-build"&gt;Go forth and build&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="the-anatomy-of-an-html-tool"&gt;The anatomy of an HTML tool&lt;/h4&gt;
&lt;p&gt;These are the characteristics I have found to be most productive in building tools of this nature:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A single file: inline JavaScript and CSS in a single HTML file means the least hassle in hosting or distributing them, and crucially means you can copy and paste them out of an LLM response.&lt;/li&gt;
&lt;li&gt;Avoid React, or anything with a build step. The problem with React is that JSX requires a build step, which makes everything massively less convenient. I prompt "no react" and skip that whole rabbit hole entirely.&lt;/li&gt;
&lt;li&gt;Load dependencies from a CDN. The fewer dependencies the better, but if there's a well known library that helps solve a problem I'm happy to load it from CDNjs or jsdelivr or similar.&lt;/li&gt;
&lt;li&gt;Keep them small. A few hundred lines means the maintainability of the code doesn't matter too much: any good LLM can read them and understand what they're doing, and rewriting them from scratch with help from an LLM takes just a few minutes.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The end result is a few hundred lines of code that can be cleanly copied and pasted into a GitHub repository.&lt;/p&gt;
&lt;h4 id="prototype-with-artifacts-or-canvas"&gt;Prototype with Artifacts or Canvas&lt;/h4&gt;
&lt;p&gt;The easiest way to build one of these tools is to start in ChatGPT or Claude or Gemini. All three have features where they can write a simple HTML+JavaScript application and show it to you directly.&lt;/p&gt;
&lt;p&gt;Claude calls this "Artifacts", ChatGPT and Gemini both call it "Canvas". Claude has the feature enabled by default, ChatGPT and Gemini may require you to toggle it on in their "tools" menus.&lt;/p&gt;
&lt;p&gt;Try this prompt in Gemini or ChatGPT:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Build a canvas that lets me paste in JSON and converts it to YAML. No React.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Or this prompt in Claude:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Build an artifact that lets me paste in JSON and converts it to YAML. No React.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I always add "No React" to these prompts, because otherwise they tend to build with React, resulting in a file that is harder to copy and paste out of the LLM and use elsewhere. I find that attempts which use React take longer to display (since they need to run a build step) and are more likely to contain crashing bugs for some reason, especially in ChatGPT.&lt;/p&gt;
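&lt;p&gt;The core of the tool those prompts produce is tiny. Here's a minimal sketch of the conversion logic, with a hand-rolled converter for plain objects, arrays and scalars standing in for a real library like js-yaml (which is what the models usually reach for):&lt;/p&gt;

```javascript
// Minimal JSON-to-YAML converter covering plain objects, arrays and scalars.
// A real tool would load a battle-tested library such as js-yaml from a CDN;
// this hand-rolled version just illustrates the shape of the logic.
function isContainer(value) {
  if (value === null) return false;
  return typeof value === "object";
}

function jsonToYaml(value, indent = 0) {
  const pad = "  ".repeat(indent);
  if (Array.isArray(value)) {
    if (value.length === 0) return pad + "[]";
    return value
      .map((item) =>
        isContainer(item)
          ? pad + "-\n" + jsonToYaml(item, indent + 1)
          : pad + "- " + JSON.stringify(item)
      )
      .join("\n");
  }
  if (isContainer(value)) {
    const keys = Object.keys(value);
    if (keys.length === 0) return pad + "{}";
    return keys
      .map((key) =>
        isContainer(value[key])
          ? pad + key + ":\n" + jsonToYaml(value[key], indent + 1)
          : pad + key + ": " + JSON.stringify(value[key])
      )
      .join("\n");
  }
  return pad + JSON.stringify(value);
}
```

&lt;p&gt;Everything else in the generated file is typically just a textarea, a button and some CSS.&lt;/p&gt;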
&lt;p&gt;All three tools have "share" links that provide a URL to the finished application. Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://chatgpt.com/canvas/shared/6938e8ece53c8191a2f9d7dfcd090d11"&gt;ChatGPT JSON to YAML Canvas&lt;/a&gt; made with GPT-5.1 Thinking - here's &lt;a href="https://chatgpt.com/share/6938e926-ee14-8006-9678-383b3a8dac78"&gt;the full ChatGPT transcript&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://claude.ai/public/artifacts/61fdecb8-6e3b-4162-a5ab-6720dfe5ed19"&gt;Claude JSON to YAML Artifact&lt;/a&gt; made with Claude Opus 4.5 - here's &lt;a href="https://claude.ai/share/421bacb9-54b4-45b4-b41c-a436bc0ebd53"&gt;the full Claude transcript&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://gemini.google.com/share/03c1ac87aa40"&gt;Gemini JSON to YAML Canvas&lt;/a&gt; made with Gemini 3 Pro - here's &lt;a href="https://gemini.google.com/share/1e27a1d8cdca"&gt;the full Gemini transcript&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="switch-to-a-coding-agent-for-more-complex-projects"&gt;Switch to a coding agent for more complex projects&lt;/h4&gt;
&lt;p&gt;Coding agents such as Claude Code and Codex CLI have the advantage that they can test the code themselves while they work on it using tools like Playwright. I often upgrade to one of those when I'm working on something more complicated, like my Bluesky thread viewer tool shown above.&lt;/p&gt;
&lt;p&gt;I also frequently use &lt;strong&gt;asynchronous coding agents&lt;/strong&gt; like Claude Code for web to make changes to existing tools. I shared a video about that in &lt;a href="https://simonwillison.net/2025/Oct/23/claude-code-for-web-video/"&gt;Building a tool to copy-paste share terminal sessions using Claude Code for web&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Claude Code for web and Codex Cloud run directly against my &lt;a href="https://github.com/simonw/tools"&gt;simonw/tools&lt;/a&gt; repo, which means they can publish or upgrade tools via Pull Requests (here are &lt;a href="https://github.com/simonw/tools/pulls?q=is%3Apr+is%3Aclosed"&gt;dozens of examples&lt;/a&gt;) without me needing to copy and paste anything myself.&lt;/p&gt;
&lt;h4 id="load-dependencies-from-cdns"&gt;Load dependencies from CDNs&lt;/h4&gt;
&lt;p&gt;Any time I use an additional JavaScript library as part of my tool I like to load it from a CDN.&lt;/p&gt;
&lt;p&gt;The three major LLM platforms support specific CDNs as part of their Artifacts or Canvas features, so often if you tell them "Use PDF.js" or similar they'll be able to compose a URL to a CDN that's on their allow-list.&lt;/p&gt;
&lt;p&gt;Sometimes you'll need to go and look up the URL on &lt;a href="https://cdnjs.com/"&gt;cdnjs&lt;/a&gt; or &lt;a href="https://www.jsdelivr.com/"&gt;jsDelivr&lt;/a&gt; and paste it into the chat.&lt;/p&gt;
&lt;p&gt;CDNs like these have been around for long enough that I've grown to trust them, especially for URLs that include the package version.&lt;/p&gt;
&lt;p&gt;The alternative to CDNs is to use npm and have a build step for your projects. I find this reduces my productivity at hacking on individual tools and makes it harder to self-host them.&lt;/p&gt;
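&lt;p&gt;Composing a pinned CDN URL is mechanical. This sketch uses jsDelivr's npm URL pattern; &lt;code&gt;cdnUrl&lt;/code&gt; is a hypothetical helper for illustration, not part of any library:&lt;/p&gt;

```javascript
// Compose a version-pinned jsDelivr URL for a file in an npm package.
// Pinning the exact version means the file you tested against is the
// file you keep serving. (cdnUrl is a made-up helper name.)
function cdnUrl(pkg, version, file) {
  return `https://cdn.jsdelivr.net/npm/${pkg}@${version}/${file}`;
}

// e.g. cdnUrl("js-yaml", "4.1.0", "dist/js-yaml.min.js")
```

&lt;p&gt;The resulting URL goes into an ordinary &lt;code&gt;script&lt;/code&gt; tag in the single HTML file.&lt;/p&gt;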
&lt;h4 id="host-them-somewhere-else"&gt;Host them somewhere else&lt;/h4&gt;
&lt;p&gt;I don't like leaving my HTML tools hosted by the LLM platforms themselves for a couple of reasons. First, LLM platforms tend to run the tools inside a tight sandbox with a lot of restrictions. They're often unable to load data or images from external URLs, and sometimes even features like linking out to other sites are disabled.&lt;/p&gt;
&lt;p&gt;The end-user experience often isn't great either. They show warning messages to new users, often take additional time to load and delight in showing promotions for the platform that was used to create the tool.&lt;/p&gt;
&lt;p&gt;They're also not as reliable as other forms of static hosting. If ChatGPT or Claude are having an outage I'd like to still be able to access the tools I've created in the past.&lt;/p&gt;
&lt;p&gt;Being able to easily self-host is the main reason I like insisting on "no React" and using CDNs for dependencies - the absence of a build step makes hosting tools elsewhere a simple case of copying and pasting them out to some other provider.&lt;/p&gt;
&lt;p&gt;My preferred provider here is &lt;a href="https://docs.github.com/en/pages"&gt;GitHub Pages&lt;/a&gt; because I can paste a block of HTML into a file on github.com and have it hosted on a permanent URL a few seconds later. Most of my tools end up in my &lt;a href="https://github.com/simonw/tools"&gt;simonw/tools&lt;/a&gt; repository which is configured to serve static files at &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="take-advantage-of-copy-and-paste"&gt;Take advantage of copy and paste&lt;/h4&gt;
&lt;p&gt;One of the most useful input/output mechanisms for HTML tools comes in the form of &lt;strong&gt;copy and paste&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I frequently build tools that accept pasted content, transform it in some way and let the user copy it back to their clipboard to paste somewhere else.&lt;/p&gt;
&lt;p&gt;Copy and paste on mobile phones is fiddly, so I frequently include "Copy to clipboard" buttons that populate the clipboard with a single touch.&lt;/p&gt;
&lt;p&gt;Most operating system clipboards can carry multiple formats of the same copied data. That's why you can paste content from a word processor in a way that preserves formatting, but if you paste the same thing into a text editor you'll get the content with formatting stripped.&lt;/p&gt;
&lt;p&gt;These rich copy operations are available in JavaScript paste events as well, which opens up all sorts of opportunities for HTML tools.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/hacker-news-thread-export"&gt;hacker-news-thread-export&lt;/a&gt;&lt;/strong&gt; lets you paste in a URL to a Hacker News thread and gives you a copyable condensed version of the entire thread, suitable for pasting into an LLM to get a useful summary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/paste-rich-text"&gt;paste-rich-text&lt;/a&gt;&lt;/strong&gt; lets you copy from a page and paste to get the HTML - particularly useful on mobile where view-source isn't available.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/alt-text-extractor"&gt;alt-text-extractor&lt;/a&gt;&lt;/strong&gt; lets you paste in images and then copy out their alt text.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/hacker-news-thread-export" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/hacker-news-thread-export.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of hacker-news-thread-export" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/paste-rich-text" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/paste-rich-text.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of paste-rich-text" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/alt-text-extractor" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/alt-text-extractor.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of alt-text-extractor" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="build-debugging-tools"&gt;Build debugging tools&lt;/h4&gt;
&lt;p&gt;The key to building interesting HTML tools is understanding what's possible. Building custom debugging tools is a great way to explore these options.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/clipboard-viewer"&gt;clipboard-viewer&lt;/a&gt;&lt;/strong&gt; is one of my most useful. You can paste anything into it (text, rich text, images, files) and it will loop through and show you every type of paste data that's available on the clipboard.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/clipboard-viewer.jpg" alt="Clipboard Format Viewer. Paste anywhere on the page (Ctrl+V or Cmd+V). This shows text/rtf with a bunch of weird code, text/plain with some pasted HTML diff and a Clipboard Event Information panel that says Event type: paste, Formats available: text/rtf, text/plain, 0 files reported and 2 clipboard items reported." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This was key to building many of my other tools, because it showed me the invisible data that I could use to bootstrap other interesting pieces of functionality.&lt;/p&gt;
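&lt;p&gt;The trick behind a viewer like this is just a &lt;code&gt;paste&lt;/code&gt; event listener that walks &lt;code&gt;clipboardData.types&lt;/code&gt;. A minimal sketch, with a hypothetical &lt;code&gt;pickRichestType&lt;/code&gt; helper showing how a tool might choose among the formats on offer:&lt;/p&gt;

```javascript
// Order of preference when several text representations are available.
const PREFERRED = ["text/html", "text/rtf", "text/plain"];

// Pick the richest format we know how to handle, or null if none match.
// (pickRichestType is a made-up helper name for illustration.)
function pickRichestType(types) {
  return PREFERRED.find((t) => types.includes(t)) ?? null;
}

// Browser-only wiring: on paste, log every available text representation.
if (typeof document !== "undefined") {
  document.addEventListener("paste", (event) => {
    const data = event.clipboardData;
    for (const type of data.types) {
      console.log(type, data.getData(type));
    }
    console.log("Richest format:", pickRichestType([...data.types]));
  });
}
```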
&lt;p&gt;More debugging examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/keyboard-debug"&gt;keyboard-debug&lt;/a&gt;&lt;/strong&gt; shows the keys (and &lt;code&gt;KeyCode&lt;/code&gt; values) currently being held down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/cors-fetch"&gt;cors-fetch&lt;/a&gt;&lt;/strong&gt; reveals if a URL can be accessed via CORS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/exif"&gt;exif&lt;/a&gt;&lt;/strong&gt; displays EXIF data for a selected photo.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/keyboard-debug" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/keyboard-debug.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of keyboard-debug" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/cors-fetch" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/cors-fetch.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of cors-fetch" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/exif" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/exif.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of exif" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="persist-state-in-the-url"&gt;Persist state in the URL&lt;/h4&gt;
&lt;p&gt;HTML tools may not have access to server-side databases for storage but it turns out you can store a &lt;em&gt;lot&lt;/em&gt; of state directly in the URL.&lt;/p&gt;
&lt;p&gt;I like this for tools I may want to bookmark or share with other people.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/icon-editor#cmdiKDIwMSwgNDYsIDg2KSxyZ2IoMjIzLCA0OCwgOTIpLHJnYigzNCwgODAsIDE3OSkscmdiKDIzNywgNTYsIDk1KSxyZ2IoMTgzLCA1MywgOTYpLHJnYigzOCwgMTA3LCAyMTApLHJnYigyMDQsIDY1LCAxMDUpLHJnYigxNzksIDEwMywgMTM2KSxyZ2IoMjMyLCA5NywgMTQ4KSxyZ2IoMzgsIDkxLCAyMDkpLHJnYigzNiwgOTUsIDIwNCkscmdiKDE5NSwgODYsIDEyOSkscmdiKDE3MywgMzEsIDU4KSxyZ2IoMjEyLCA2MSwgMTA2KSxyZ2IoOTIsIDEwNSwgMTg4KSxyZ2IoMjM3LCA3MSwgMTIzKSxyZ2IoMzksIDk2LCAyMTkpLHJnYigyOCwgODYsIDIxMCkscmdiKDIyMywgMjEyLCAzNCkscmdiKDE3MywgMTUzLCAyNikscmdiKDE0NCwgNzksIDI4KSxyZ2IoMjI0LCA1NiwgOTcpLHJnYigxOTYsIDQ4LCA4NSkscmdiKDIyMCwgNTAsIDk4KSxyZ2IoMTY2LCAxMjYsIDI1KSxyZ2IoMjA5LCAxMzAsIDE5KSxyZ2IoMTg3LCAxMTQsIDEzKSxyZ2IoMTQ3LCAxMDQsIDE4KSxyZ2IoMjE2LCA1OCwgODEpLHJnYigxNTIsIDM5LCA2NCkscmdiKDMyLCA3NSwgMTczKSxyZ2IoMTY2LCAxMjYsIDI5KSxyZ2IoMjM3LCAxODAsIDU0KSxyZ2IoMjA0LCAxMzgsIDIyKSxyZ2IoMTgxLCAxMjksIDIzKSxyZ2IoMjM0LCA4NiwgNzYpLHJnYigxOTAsIDY4LCA3NSkscmdiKDI0NSwgODksIDEzNSkscmdiKDIxMywgNjcsIDExMSkscmdiKDE0MSwgMzEsIDU2KSxyZ2IoNzIsIDc5LCAxMTYpLHJnYigxODcsIDE1NCwgNTIpLHJnYigyMDcsIDE3OSwgNzIpLHJnYigyMTAsIDE2MiwgNDMpLHJnYigyMTQsIDE0OSwgMzEpLHJnYigyMzksIDkwLCA4NCkscmdiKDIzNSwgMTMyLCA3NykscmdiKDE4MSwgMTM4LCAyOSkscmdiKDI0NSwgMTI4LCAxNzgpLHJnYigyMTcsIDk5LCAxNDUpLHJnYigxMTYsIDEwNSwgMTIyKSxyZ2IoMjA2LCAxNzYsIDY1KSxyZ2IoMTkxLCAxNjMsIDY0KSxyZ2IoMjA1LCAxNjksIDU4KSxyZ2IoMjM2LCAxNjUsIDQ2KSxyZ2IoMjM3LCA3OSwgODUpLHJnYigyMzUsIDE0NCwgODcpLHJnYigyNDksIDIwMiwgNDUpLHJnYigyMTAsIDE2NiwgMzQpLHJnYigyMjcsIDEwMywgMTYyKSxyZ2IoMjEzLCA5MCwgMTMwKSxyZ2IoNDQsIDQ4LCAxMjMpLHJnYigxMjUsIDg2LCAxNTEpLHJnYigxOTAsIDE2MywgMTA2KSxyZ2IoMTk5LCAxNjYsIDQ4KSxyZ2IoMjAyLCAxNjQsIDU2KSxyZ2IoMjIxLCAxNzAsIDUzKSxyZ2IoMjM0LCAxMzUsIDc1KSxyZ2IoMjQxLCAxNzUsIDc1KSxyZ2IoMjU1LCAyMjIsIDY1KSxyZ2IoMjU0LCAyMjYsIDY5KSxyZ2IoMjM1LCAyMDEsIDQ0KSxyZ2IoNzMsIDEzNywgMjQ3KSxyZ2IoODAsIDE0MywgMjQ4KSxyZ2IoNzksIDEzOSwgMjQzKSxyZ2IoMTM4LCA5MiwgMTc0KSxyZ2IoMTU2LCAxMTMsIDE3NikscmdiKDIwMSwgMTY4LCA2MykscmdiKDIxMSwgMTY5LCA0NikscmdiKDIxNCwgMTcxLCA1NSkscmdiKDIyOCwgMTgyLCA1NikscmdiKDI0My
wgMTk1LCA1OCkscmdiKDI0NSwgMjA0LCA2NykscmdiKDI1NSwgMjIxLCA2NykscmdiKDI1NSwgMjI2LCA2OCkscmdiKDE1NCwgMTYyLCAxMzMpLHJnYigyNiwgMTA1LCAyNTUpLHJnYig2OCwgMTI5LCAyNTIpLHJnYig4NywgMTM1LCAyNDQpLHJnYig4MywgMTMxLCAyMzUpLHJnYig4MiwgMTI3LCAyMjYpLHJnYig4NSwgMTMwLCAyMjcpLHJnYig3OSwgMTIyLCAyMTgpLHJnYigxNjcsIDE0NiwgMzIpLHJnYigxNzQsIDEzOCwgMTI0KSxyZ2IoMTMzLCA2OSwgMjA1KSxyZ2IoMTcxLCAxMjAsIDE0NCkscmdiKDIxNSwgMTc2LCA1NykscmdiKDIyMCwgMTc1LCA0OSkscmdiKDIyMywgMTc5LCA1OCkscmdiKDIzNywgMTg4LCA2MCkscmdiKDI0MSwgMTkxLCA1NikscmdiKDIwMCwgMTc2LCAxMDUpLHJnYigxMTIsIDE0MSwgMjAzKSxyZ2IoODQsIDEyNywgMjM1KSxyZ2IoMTE1LCAxMzgsIDE5MSkscmdiKDgyLCAxMDMsIDE3NCkscmdiKDE1OCwgNDEsIDc2KSxyZ2IoMTcwLCA0MywgNjQpLHJnYigxOTAsIDE1NywgNTApLHJnYigyMDMsIDE3NywgNjUpLHJnYigxNjEsIDEwMiwgMTQyKSxyZ2IoMTQxLCA1OSwgMjA5KSxyZ2IoMTgwLCAxMjIsIDE1MSkscmdiKDIyOCwgMTg1LCA1OCkscmdiKDIzMywgMTg2LCA1MikscmdiKDI0MCwgMTg5LCA2NikscmdiKDI1NCwgMjEwLCA2OCkscmdiKDIwMSwgMTkxLCAxMTMpLHJnYigxMzcsIDEzOSwgMTU3KSxyZ2IoMjExLCAxNjIsIDg4KSxyZ2IoMjUwLCAyMDAsIDUwKSxyZ2IoMTc5LCAxMzEsIDIzKSxyZ2IoMTk2LCAxNjUsIDY0KSxyZ2IoMjA1LCAxNzQsIDU0KSxyZ2IoMjA5LCAxNjAsIDU5KSxyZ2IoMTY2LCA5MSwgMTYxKSxyZ2IoMTQyLCA2MCwgMjIzKSxyZ2IoMTk3LCAxMzksIDE1MCkscmdiKDI0MCwgMTk2LCA3MikscmdiKDI1MSwgMjA4LCA2MSkscmdiKDI1NSwgMjI0LCA4MCkscmdiKDI1NSwgMjUwLCA5MikscmdiKDI1NSwgMjM0LCA4OSkscmdiKDI0OSwgMTg2LCA1MSkscmdiKDI1MCwgMTgwLCAzOSkscmdiKDI0MCwgMTY2LCAzNSkscmdiKDIwMiwgMTc0LCA3MikscmdiKDIxNSwgMTY4LCA1MCkscmdiKDIyMiwgMTc1LCA0MykscmdiKDIxMiwgMTY1LCA2OSkscmdiKDE3NCwgMTAzLCAxNjcpLHJnYigxNjAsIDc4LCAyMzQpLHJnYigyMDUsIDE0NiwgMTg0KSxyZ2IoMjQ3LCAyMTgsIDEwOCkscmdiKDI1NSwgMjQ4LCA4NSkscmdiKDI1NSwgMjU1LCAxMDIpLHJnYigyNTUsIDI1NSwgMTIyKSxyZ2IoMjQwLCAyMTAsIDgyKSxyZ2IoMjE0LCAxNTAsIDMxKSxyZ2IoMjI0LCAxNTAsIDI1KSxyZ2IoMTc2LCAxMjEsIDI1KSxyZ2IoMTg5LCAxODMsIDUyKSxyZ2IoMTIyLCA4MCwgMTU4KSxyZ2IoMTkxLCAxNTEsIDEyMikscmdiKDIyOSwgMTc0LCA0MCkscmdiKDIyNSwgMTcyLCA1MSkscmdiKDIyOSwgMTg1LCA1MSkscmdiKDIzNywgMTkwLCA2MCkscmdiKDIwOSwgMTQ2LCAxNjEpLHJnYigxOTUsIDExNywgMjUxKSxyZ2IoMjI1LCAxNTUsIDIzOSkscmdiKDI1NCwgMjI3LCAxODQpLHJnYigyNTUsIDI1NS
wgMTE3KSxyZ2IoMjQ5LCAyMzcsIDc2KSxyZ2IoMjA0LCAxNjcsIDU1KSxyZ2IoMTU3LCAxMTUsIDI1KSxyZ2IoMTM1LCA5OCwgMTYpLHJnYigyMDMsIDEyNSwgNTcpLHJnYigxOTgsIDEyNSwgNTMpLHJnYigxNTcsIDExMCwgMTQ0KSxyZ2IoMTQ5LCA4NCwgMTk0KSxyZ2IoMjEyLCAxNTcsIDk0KSxyZ2IoMjMyLCAxODUsIDQ3KSxyZ2IoMjM1LCAxODYsIDYyKSxyZ2IoMjUwLCAyMDQsIDY1KSxyZ2IoMjUzLCAyMzIsIDgxKSxyZ2IoMjQzLCAyMTUsIDE0OCkscmdiKDI0NywgMTgzLCAyMzMpLHJnYigyNDMsIDE2MywgMjUwKSxyZ2IoMTk4LCAxMzgsIDE3NSkscmdiKDE2MCwgMTEzLCA4MikscmdiKDEyNCwgODksIDM3KSxyZ2IoMTU3LCAxMzYsIDM2KSxyZ2IoMjAzLCAxNjQsIDgyKSxyZ2IoMTQ4LCA3MiwgMTg5KSxyZ2IoMTU4LCA4NCwgMjA0KSxyZ2IoMjE3LCAxNjgsIDExNykscmdiKDI1MCwgMjEwLCA2NykscmdiKDI1NSwgMjI5LCA3OCkscmdiKDI1NSwgMjU1LCA5NikscmdiKDI1NSwgMjU1LCA5NCkscmdiKDI0MywgMjE4LCA5NSkscmdiKDE3OCwgMTE4LCAxMDYpLHJnYigxMDMsIDQwLCAxMDIpLHJnYigxODgsIDExMSwgMjcpLHJnYigxODMsIDE1NiwgNTkpLHJnYigyMTUsIDE3NiwgNDgpLHJnYigyMDMsIDE0OCwgOTEpLHJnYigxNjcsIDg5LCAxOTcpLHJnYigxNzgsIDEwMywgMjM1KSxyZ2IoMjM1LCAxOTMsIDE3NSkscmdiKDI1NSwgMjUxLCAxMjQpLHJnYigyNDksIDI0MCwgOTIpLHJnYigyMTMsIDE4NiwgNjApLHJnYigxNjAsIDEyMSwgMjEpLHJnYigxOTEsIDE1NSwgMTA4KSxyZ2IoMjIxLCAxODAsIDQwKSxyZ2IoMjM3LCAxODksIDQ3KSxyZ2IoMjMzLCAxODYsIDk2KSxyZ2IoMjE5LCAxNjIsIDIwNykscmdiKDIzMSwgMTU5LCAyNDkpLHJnYigyMTAsIDE1OCwgMTkxKSxyZ2IoMTY5LCAxMzAsIDc1KSxyZ2IoMTQwLCA5NiwgMTE5KSxyZ2IoMTU1LCA4NSwgMjAwKSxyZ2IoMjA5LCAxNTcsIDExNSkscmdiKDI1NCwgMjI2LCA3MCkscmdiKDI1NSwgMjU1LCA4MCkscmdiKDIzNSwgMjE3LCA3NikscmdiKDE3OCwgMTMzLCA5MSkscmdiKDIwOSwgMTEwLCAxNTEpLHJnYigxNTIsIDExOCwgNTYpLHJnYigxODYsIDExNiwgMTY4KSxyZ2IoMTkzLCAxMjEsIDIzNikscmdiKDIyOSwgMTk1LCAxNjEpLHJnYigxOTcsIDE4MCwgNzUpLHJnYigxOTksIDE1OCwgNzApLHJnYigxOTcsIDE0OCwgMTM2KXxfX19fX19fXzAxX19fX19fX19fX19fX19fMl9fX19fX18zNDVfX19fX182X183OF9fOWFfX19fX2JjZGVfX19fX19fX19fZl9fX2doX2lqa19fbF9fX19fX19fbV9uX19fX19fX19vcHFyc19fX19fX19fdF9fX19fX3VfX192d3h5ejEwX19fMTExMl9fMTNfX19fX19fX18xNDE1MTYxNzE4MTkxYTFiX18xYzFkX19fX19fX19fX19fMWUxZjFnMWgxaTFqMWsxbDFtXzFuMW9fX19fX19fX19fXzFwMXExcjFzMXQxdTF2MXcxeDF5MXpfX19fXzIwMjEyMl9fX19fXzIzMjQyNTI2MjcyODI5MmEyYjJjMmQyZTJmMmcyaDJpMmoya19fX19fMmwybT
JuMm8ycDJxMnIyczJ0MnUydjJ3MngyeV9fX19fX19fMnozMDMxMzIzMzM0MzUzNjM3MzgzOTNhM2IzYzNkM2VfX19fX19fX19fM2YzZzNoM2kzajNrM2wzbTNuM28zcDNxM3Izc19fX19fX19fX18zdDN1M3YzdzN4M3kzejQwNDE0MjQzNDQ0NTQ2NDc0OF9fX19fX180OTRhNGI0YzRkNGU0ZjRnNGg0aTRqNGs0bDRtNG5fX180bzRwX19fXzRxNHI0czR0NHU0djR3NHg0eTR6NTA1MTUyX19fX19fX19fXzUzNTQ1NTU2NTc1ODU5NWE1YjVjNWQ1ZV9fX19fXzVmX19fX181ZzVoNWk1ajVrNWw1bTVuNW81cF9fX19fX19fX19fX19fNXE1cjVzNXQ1dTV2NXc1eF9fX19fX19fX19fX19fXzV5NXo2MDYxNjI2MzY0X19fX19fX19fX19fNjVfX19fNjY2NzY4Njk2YV9fX19fX19fX19fX19fX19fX19fNmI2Y19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f"&gt;icon-editor&lt;/a&gt;&lt;/strong&gt; is a custom 24x24 icon editor I built to help hack on icons for &lt;a href="https://simonwillison.net/2025/Oct/28/github-universe-badge/"&gt;the GitHub Universe badge&lt;/a&gt;. It persists your in-progress icon design in the URL so you can easily bookmark and share it.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="use-localstorage-for-secrets-or-larger-state"&gt;Use localStorage for secrets or larger state&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/localStorage"&gt;localStorage&lt;/a&gt; browser API lets HTML tools store data persistently on the user's device, without exposing that data to the server.&lt;/p&gt;
&lt;p&gt;I use this for larger pieces of state that don't fit comfortably in a URL, or for secrets like API keys which I really don't want anywhere near my server - even static hosts might have server logs that are outside of my control.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/word-counter"&gt;word-counter&lt;/a&gt;&lt;/strong&gt; is a simple tool I built to help me write to specific word counts, for things like conference abstract submissions. It uses localStorage to save as you type, so your work isn't lost if you accidentally close the tab.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/render-markdown"&gt;render-markdown&lt;/a&gt;&lt;/strong&gt; uses the same trick - I sometimes use this one to craft blog posts and I don't want to lose them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/haiku"&gt;haiku&lt;/a&gt;&lt;/strong&gt; is one of a number of LLM demos I've built that request an API key from the user (via the &lt;code&gt;prompt()&lt;/code&gt; function) and then store that in &lt;code&gt;localStorage&lt;/code&gt;. This one uses Claude Haiku to write haikus about what it can see through the user's webcam.&lt;/li&gt;
&lt;/ul&gt;
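&lt;p&gt;The &lt;code&gt;prompt()&lt;/code&gt; plus &lt;code&gt;localStorage&lt;/code&gt; pattern from that last example boils down to just a few lines. Here's a minimal sketch - the storage key name is illustrative, not necessarily what my tools use:&lt;/p&gt;

```javascript
// Minimal sketch of the prompt() + localStorage secrets pattern.
// The key only ever lives in the user's own browser.
function getApiKey(storageKey = "api-key") {
  let key = localStorage.getItem(storageKey);
  if (!key) {
    key = (prompt("Enter your API key (stored only in this browser):") || "").trim();
    if (key) localStorage.setItem(storageKey, key);
  }
  return key;
}
```

&lt;p&gt;Subsequent calls skip the prompt entirely, because the key is read straight back out of &lt;code&gt;localStorage&lt;/code&gt;.&lt;/p&gt;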
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/word-counter" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/word-counter.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of word-counter" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/render-markdown" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/render-markdown.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of render-markdown" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/haiku" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/haiku.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of haiku" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="collect-cors-enabled-apis"&gt;Collect CORS-enabled APIs&lt;/h4&gt;
&lt;p&gt;CORS stands for &lt;a href="https://en.wikipedia.org/wiki/Cross-origin_resource_sharing"&gt;Cross-origin resource sharing&lt;/a&gt;. It's a relatively low-level detail that controls whether JavaScript running on one site is able to fetch data from APIs hosted on other domains.&lt;/p&gt;
&lt;p&gt;APIs that provide open CORS headers are a goldmine for HTML tools. It's worth building a collection of these over time.&lt;/p&gt;
&lt;p&gt;Here are some I like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;iNaturalist for fetching sightings of animals, including URLs to photos&lt;/li&gt;
&lt;li&gt;PyPI for fetching details of Python packages&lt;/li&gt;
&lt;li&gt;GitHub, because anything in a public repository has a CORS-enabled anonymous API for fetching that content from the raw.githubusercontent.com domain - which sits behind a caching CDN, so you don't need to worry too much about rate limits or feel guilty about adding load to their infrastructure&lt;/li&gt;
&lt;li&gt;Bluesky for all sorts of operations&lt;/li&gt;
&lt;li&gt;Mastodon has generous CORS policies too, as used by applications like &lt;a href="https://phanpy.social/"&gt;phanpy.social&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
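&lt;p&gt;Calling one of these APIs needs nothing more than &lt;code&gt;fetch()&lt;/code&gt;. Here's a sketch against PyPI's JSON API, which serves open CORS headers at the time of writing:&lt;/p&gt;

```javascript
// Build the PyPI JSON API URL for a package name.
const pypiUrl = (pkg) => `https://pypi.org/pypi/${encodeURIComponent(pkg)}/json`;

// Fetch name, version and summary - this works from a static page
// because PyPI's API responds with open CORS headers.
async function pypiInfo(pkg) {
  const resp = await fetch(pypiUrl(pkg));
  if (!resp.ok) throw new Error(`PyPI returned ${resp.status}`);
  const data = await resp.json();
  return { name: data.info.name, version: data.info.version, summary: data.info.summary };
}
```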
&lt;p&gt;GitHub Gists are a personal favorite here, because they let you build apps that can persist state to a permanent Gist through making a cross-origin API call.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/species-observation-map#%7B%22taxonId%22%3A123829%2C%22taxonName%22%3A%22California%20Brown%20Pelican%22%2C%22days%22%3A%2230%22%7D"&gt;species-observation-map&lt;/a&gt;&lt;/strong&gt; uses iNaturalist to show a map of recent sightings of a particular species.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/zip-wheel-explorer?package=llm"&gt;zip-wheel-explorer&lt;/a&gt;&lt;/strong&gt; fetches a &lt;code&gt;.whl&lt;/code&gt; file for a Python package from PyPI, unzips it (in browser memory) and lets you navigate the files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/github-issue-to-markdown?issue=https%3A%2F%2Fgithub.com%2Fsimonw%2Fsqlite-utils%2Fissues%2F657"&gt;github-issue-to-markdown&lt;/a&gt;&lt;/strong&gt; fetches issue details and comments from the GitHub API (including expanding any permanent code links) and turns them into copyable Markdown.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/terminal-to-html"&gt;terminal-to-html&lt;/a&gt;&lt;/strong&gt; can optionally save the user's converted terminal session to a Gist.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/bluesky-quote-finder?post=https%3A%2F%2Fbsky.app%2Fprofile%2Fsimonwillison.net%2Fpost%2F3m7auwt3ma222"&gt;bluesky-quote-finder&lt;/a&gt;&lt;/strong&gt; displays quotes of a specified Bluesky post, which can then be sorted by likes or by time.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/species-observation-map#%7B%22taxonId%22%3A123829%2C%22taxonName%22%3A%22California%20Brown%20Pelican%22%2C%22days%22%3A%2230%22%7D" style="flex: 1; width: 20%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/species-observation-map.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of species-observation-map" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/zip-wheel-explorer?package=llm" style="flex: 1; width: 20%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/zip-wheel-explorer.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of zip-wheel-explorer" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/github-issue-to-markdown?issue=https%3A%2F%2Fgithub.com%2Fsimonw%2Fsqlite-utils%2Fissues%2F657" style="flex: 1; width: 20%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/github-issue-to-markdown.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of github-issue-to-markdown" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/terminal-to-html" style="flex: 1; width: 20%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/terminal-to-html.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of terminal-to-html" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/bluesky-quote-finder?post=https%3A%2F%2Fbsky.app%2Fprofile%2Fsimonwillison.net%2Fpost%2F3m7auwt3ma222" style="flex: 1; width: 20%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/bluesky-quote-finder.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of bluesky-quote-finder" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="llms-can-be-called-directly-via-cors"&gt;LLMs can be called directly via CORS&lt;/h4&gt;
&lt;p&gt;All three of OpenAI, Anthropic and Gemini offer JSON APIs that can be accessed via CORS directly from HTML tools.&lt;/p&gt;
&lt;p&gt;Unfortunately you still need an API key, and if you bake that key into your visible HTML anyone can steal it and use it to rack up charges on your account.&lt;/p&gt;
&lt;p&gt;I use the &lt;code&gt;localStorage&lt;/code&gt; secrets pattern to store API keys for these services. This sucks from a user experience perspective - telling users to go and create an API key and paste it into a tool is a lot of friction - but it does work.&lt;/p&gt;
&lt;p&gt;Some examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/haiku"&gt;haiku&lt;/a&gt;&lt;/strong&gt; uses the Claude API to write a haiku about an image from the user's webcam.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/openai-audio-output"&gt;openai-audio-output&lt;/a&gt;&lt;/strong&gt; generates audio speech using OpenAI's GPT-4o audio API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="http://tools.simonwillison.net/gemini-bbox"&gt;gemini-bbox&lt;/a&gt;&lt;/strong&gt; demonstrates Gemini 2.5's ability to return complex shaped image masks for objects in images, see &lt;a href="https://simonwillison.net/2025/Apr/18/gemini-image-segmentation/"&gt;Image segmentation using Gemini 2.5&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
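&lt;p&gt;For the curious, here's roughly what the browser-side call looks like for Anthropic, which accepts direct browser requests if you opt in with an extra header. This is a sketch - the model name and token limit are illustrative:&lt;/p&gt;

```javascript
// Assemble a fetch() request for the Anthropic Messages API.
// The anthropic-dangerous-direct-browser-access header opts this
// request in to CORS; the key comes from the localStorage pattern.
function buildClaudeRequest(promptText, apiKey) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model: "claude-haiku-4-5",
        max_tokens: 256,
        messages: [{ role: "user", content: promptText }],
      }),
    },
  };
}

// Usage: const req = buildClaudeRequest("Write a haiku", getApiKey());
//        const data = await (await fetch(req.url, req.options)).json();
```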
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/haiku" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/haiku.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of haiku" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/openai-audio-output" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/openai-audio-output.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of openai-audio-output" /&gt;&lt;/a&gt;
  &lt;a href="http://tools.simonwillison.net/gemini-bbox" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/gemini-bbox.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of gemini-bbox" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="don-t-be-afraid-of-opening-files"&gt;Don't be afraid of opening files&lt;/h4&gt;
&lt;p&gt;You don't need to upload a file to a server in order to make use of the &lt;code&gt;&amp;lt;input type="file"&amp;gt;&lt;/code&gt; element. JavaScript can access the content of that file directly, which opens up a wealth of opportunities for useful functionality.&lt;/p&gt;
&lt;p&gt;Some examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/ocr"&gt;ocr&lt;/a&gt;&lt;/strong&gt; is the first tool I built for my collection, described in &lt;a href="https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/"&gt;Running OCR against PDFs and images directly in your browser&lt;/a&gt;. It uses &lt;code&gt;PDF.js&lt;/code&gt; and &lt;code&gt;Tesseract.js&lt;/code&gt; to allow users to open a PDF in their browser which it then converts to an image-per-page and runs through OCR.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/social-media-cropper"&gt;social-media-cropper&lt;/a&gt;&lt;/strong&gt; lets you open (or paste in) an existing image and then crop it to common dimensions needed for different social media platforms - 2:1 for Twitter and LinkedIn, 1.4:1 for Substack etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/ffmpeg-crop"&gt;ffmpeg-crop&lt;/a&gt;&lt;/strong&gt; lets you open and preview a video file in your browser, drag a crop box within it and then copy out the &lt;code&gt;ffmpeg&lt;/code&gt; command needed to produce a cropped copy on your own machine.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/ocr" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/ocr.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of ocr" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/social-media-cropper" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/social-media-cropper.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of social-media-cropper" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/ffmpeg-crop" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/ffmpeg-crop.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of ffmpeg-crop" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="you-can-offer-downloadable-files-too"&gt;You can offer downloadable files too&lt;/h4&gt;
&lt;p&gt;An HTML tool can generate a file for download without needing help from a server.&lt;/p&gt;
&lt;p&gt;The JavaScript library ecosystem has a huge range of packages for generating files in all kinds of useful formats.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/svg-render"&gt;svg-render&lt;/a&gt;&lt;/strong&gt; lets the user download the PNG or JPEG rendered from an SVG.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/social-media-cropper"&gt;social-media-cropper&lt;/a&gt;&lt;/strong&gt; does the same for cropped images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/open-sauce-2025"&gt;open-sauce-2025&lt;/a&gt;&lt;/strong&gt; is my alternative schedule for a conference that includes a downloadable ICS file for adding the schedule to your calendar. See &lt;a href="https://simonwillison.net/2025/Jul/17/vibe-scraping/"&gt;Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone&lt;/a&gt; for more on that project.&lt;/li&gt;
&lt;/ul&gt;
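&lt;p&gt;The standard recipe here is a &lt;code&gt;Blob&lt;/code&gt; plus a temporary object URL attached to an invisible link. A sketch:&lt;/p&gt;

```javascript
// Offer generated content as a client-side file download.
function downloadFile(filename, content, type = "text/plain") {
  const blob = new Blob([content], { type });
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = filename; // suggested filename for the save dialog
  a.click();
  URL.revokeObjectURL(url); // release the blob once the click has fired
}
```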
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/svg-render" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/svg-render.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of svg-render" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/social-media-cropper" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/social-media-cropper.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of social-media-cropper" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/open-sauce-2025" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/open-sauce-2025.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of open-sauce-2025" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="pyodide-can-run-python-code-in-the-browser"&gt;Pyodide can run Python code in the browser&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://pyodide.org/"&gt;Pyodide&lt;/a&gt; is a distribution of Python that's compiled to WebAssembly and designed to run directly in browsers. It's an engineering marvel and one of the most underrated corners of the Python world.&lt;/p&gt;
&lt;p&gt;It also cleanly loads from a CDN, which means there's no reason not to use it in HTML tools!&lt;/p&gt;
&lt;p&gt;Even better, the Pyodide project includes &lt;a href="https://github.com/pyodide/micropip"&gt;micropip&lt;/a&gt; - a mechanism that can load extra pure-Python packages from PyPI via CORS.&lt;/p&gt;
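&lt;p&gt;Getting started is a script tag and a couple of awaits. Here's a sketch, assuming &lt;code&gt;pyodide.js&lt;/code&gt; has already been loaded from the CDN (the pinned version in the comment is illustrative - check pyodide.org for the current release):&lt;/p&gt;

```javascript
// Assumes a script tag has loaded pyodide.js first, e.g. from
// https://cdn.jsdelivr.net/pyodide/v0.26.2/full/pyodide.js
// which defines the global loadPyodide() function.
async function runPythonDemo() {
  const pyodide = await loadPyodide();
  await pyodide.loadPackage("micropip");
  // micropip installs pure-Python wheels from PyPI via CORS.
  await pyodide.runPythonAsync(`
import micropip
await micropip.install("cowsay")
import cowsay
cowsay.cow("Hello from Python in the browser")
`);
  return pyodide;
}
```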
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/pyodide-bar-chart"&gt;pyodide-bar-chart&lt;/a&gt;&lt;/strong&gt; demonstrates running Pyodide, Pandas and matplotlib to render a bar chart directly in the browser.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/numpy-pyodide-lab"&gt;numpy-pyodide-lab&lt;/a&gt;&lt;/strong&gt; is an experimental interactive tutorial for Numpy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/apsw-query"&gt;apsw-query&lt;/a&gt;&lt;/strong&gt; demonstrates the &lt;a href="https://github.com/rogerbinns/apsw"&gt;APSW SQLite library&lt;/a&gt;  running in a browser, using it to show EXPLAIN QUERY plans for SQLite queries.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/pyodide-bar-chart" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/pyodide-bar-chart.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of pyodide-bar-chart" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/numpy-pyodide-lab" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/numpy-pyodide-lab.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of numpy-pyodide-lab" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/apsw-query" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/apsw-query.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of apsw-query" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="webassembly-opens-more-possibilities"&gt;WebAssembly opens more possibilities&lt;/h4&gt;
&lt;p&gt;Pyodide is possible thanks to WebAssembly, which means a vast collection of software originally written in other languages can now be loaded in HTML tools as well.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://squoosh.app/"&gt;Squoosh.app&lt;/a&gt; was the first example I saw that convinced me of the power of this pattern - it makes several best-in-class image compression libraries available directly in the browser.&lt;/p&gt;
&lt;p&gt;I've used WebAssembly for a few of my own tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/ocr"&gt;ocr&lt;/a&gt;&lt;/strong&gt; uses the pre-existing &lt;a href="https://tesseract.projectnaptha.com/"&gt;Tesseract.js&lt;/a&gt; WebAssembly port of the Tesseract OCR engine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/sloccount"&gt;sloccount&lt;/a&gt;&lt;/strong&gt; is a port of David Wheeler's Perl and C &lt;a href="https://dwheeler.com/sloccount/"&gt;SLOCCount&lt;/a&gt; utility to the browser, using a big ball of WebAssembly duct tape. &lt;a href="https://simonwillison.net/2025/Oct/22/sloccount-in-webassembly/"&gt;More details here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/micropython"&gt;micropython&lt;/a&gt;&lt;/strong&gt; is my experiment using &lt;a href="https://www.npmjs.com/package/@micropython/micropython-webassembly-pyscript"&gt;@micropython/micropython-webassembly-pyscript&lt;/a&gt; from NPM to run Python code with a smaller initial download than Pyodide.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="display: flex; width: 100%; gap: 20px; margin-bottom: 1em;"&gt;
  &lt;a href="https://tools.simonwillison.net/ocr" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/ocr.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of ocr" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/sloccount" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/sloccount.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of sloccount" /&gt;&lt;/a&gt;
  &lt;a href="https://tools.simonwillison.net/micropython" style="flex: 1; width: 30%; border: none;"&gt;&lt;img src="https://static.simonwillison.net/static/2025/html-tools/micropython.jpg" loading="lazy" style="width: 100%; height: auto; object-fit: cover;" alt="screenshot of micropython" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;h4 id="remix-your-previous-tools"&gt;Remix your previous tools&lt;/h4&gt;
&lt;p&gt;The biggest advantage of having a single public collection of 100+ tools is that it's easy for my LLM assistants to recombine them in interesting ways.&lt;/p&gt;
&lt;p&gt;Sometimes I'll copy and paste a previous tool into the context, but when I'm working with a coding agent I can reference them by name - or tell the agent to search for relevant examples before it starts work.&lt;/p&gt;
&lt;p&gt;The source code of any working tool doubles as clear documentation of how something can be done, including patterns for using existing libraries. An LLM with one or two existing tools in its context is much more likely to produce working code.&lt;/p&gt;
&lt;p&gt;I built &lt;strong&gt;&lt;a href="https://tools.simonwillison.net/pypi-changelog"&gt;pypi-changelog&lt;/a&gt;&lt;/strong&gt; by telling Claude Code:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Look at the pypi package explorer tool&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then, after it had found and read the source code for &lt;a href="https://tools.simonwillison.net/zip-wheel-explorer"&gt;zip-wheel-explorer&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Build a new tool pypi-changelog.html which uses the PyPI API to get the wheel URLs of all available versions of a package, then it displays them in a list where each pair has a "Show changes" clickable in between them - clicking on that fetches the full contents of the wheels and displays a nicely rendered diff representing the difference between the two, as close to a standard diff format as you can get with JS libraries from CDNs, and when that is displayed there is a "Copy" button which copies that diff to the clipboard&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://gistpreview.github.io/?9b48fd3f8b99a204ba2180af785c89d2"&gt;the full transcript&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See &lt;a href="https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/"&gt;Running OCR against PDFs and images directly in your browser&lt;/a&gt; for another detailed example of remixing tools to create something new.&lt;/p&gt;
&lt;h4 id="record-the-prompt-and-transcript"&gt;Record the prompt and transcript&lt;/h4&gt;
&lt;p&gt;I like keeping (and publishing) records of everything I do with LLMs, to help me grow my skills at using them over time.&lt;/p&gt;
&lt;p&gt;For HTML tools I built by chatting with an LLM platform directly, I use that platform's "share" feature.&lt;/p&gt;
&lt;p&gt;For Claude Code or Codex CLI or other coding agents I copy and paste the full transcript from the terminal into my &lt;a href="https://tools.simonwillison.net/terminal-to-html"&gt;terminal-to-html&lt;/a&gt; tool and share that using a Gist.&lt;/p&gt;
&lt;p&gt;In either case I include links to those transcripts in the commit message when I save the finished tool to my repository. You can see those &lt;a href="https://tools.simonwillison.net/colophon"&gt;in my tools.simonwillison.net colophon&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="go-forth-and-build"&gt;Go forth and build&lt;/h4&gt;
&lt;p&gt;I've had &lt;em&gt;so much fun&lt;/em&gt; exploring the capabilities of LLMs this way over the past year and a half. Building these tools has been invaluable in helping me understand both the potential for building tools with HTML and the capabilities of the LLMs I'm building them with.&lt;/p&gt;
&lt;p&gt;If you're interested in starting your own collection I highly recommend it! All you need to get started is a free GitHub repository with GitHub Pages enabled (Settings -&amp;gt; Pages -&amp;gt; Source -&amp;gt; Deploy from a branch -&amp;gt; main) and you can start copying in &lt;code&gt;.html&lt;/code&gt; pages generated in whatever manner you like.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;strong&gt;Bonus transcript&lt;/strong&gt;: Here's &lt;a href="http://gistpreview.github.io/?1b8cba6a8a21110339cbde370e755ba0"&gt;how I used Claude Code&lt;/a&gt; and &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; to add the screenshots to this post.&lt;/small&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/localstorage"&gt;localstorage&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="github"/><category term="html"/><category term="javascript"/><category term="localstorage"/><category term="projects"/><category term="tools"/><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="vibe-coding"/><category term="coding-agents"/><category term="claude-code"/></entry><entry><title>Context plumbing</title><link href="https://simonwillison.net/2025/Nov/29/context-plumbing/#atom-tag" rel="alternate"/><published>2025-11-29T11:26:24+00:00</published><updated>2025-11-29T11:26:24+00:00</updated><id>https://simonwillison.net/2025/Nov/29/context-plumbing/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://interconnected.org/home/2025/11/28/plumbing"&gt;Context plumbing&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Matt Webb coins the term &lt;strong&gt;context plumbing&lt;/strong&gt; to describe the kind of engineering needed to feed agents the right context at the right time:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Context appears at disparate sources, by user activity or changes in the user’s environment: what they’re working on changes, emails appear, documents are edited, it’s no longer sunny outside, the available tools have been updated.&lt;/p&gt;
&lt;p&gt;This context is not always where the AI runs (and the AI runs as close as possible to the point of user intent).&lt;/p&gt;
&lt;p&gt;So the job of making an agent run really well is to move the context to where it needs to be. [...]&lt;/p&gt;
&lt;p&gt;So I’ve been thinking of AI system technical architecture as plumbing the sources and sinks of context.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/matt-webb"&gt;matt-webb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/context-engineering"&gt;context-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="matt-webb"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-agents"/><category term="context-engineering"/></entry><entry><title>Agent design is still hard</title><link href="https://simonwillison.net/2025/Nov/23/agent-design-is-still-hard/#atom-tag" rel="alternate"/><published>2025-11-23T00:49:39+00:00</published><updated>2025-11-23T00:49:39+00:00</updated><id>https://simonwillison.net/2025/Nov/23/agent-design-is-still-hard/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://lucumr.pocoo.org/2025/11/21/agents-are-hard/"&gt;Agent design is still hard&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Armin Ronacher presents a cornucopia of lessons learned from building agents over the past few months.&lt;/p&gt;
&lt;p&gt;There are several agent abstraction libraries available now (my own &lt;a href="https://llm.datasette.io/"&gt;LLM library&lt;/a&gt; is edging into that territory with its &lt;a href="https://simonwillison.net/2025/May/27/llm-tools/"&gt;tools feature&lt;/a&gt;) but Armin has found that the abstractions are not worth adopting yet:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[…] the differences between models are significant enough that you will need to build your own agent abstraction. We have not found any of the solutions from these SDKs that build the right abstraction for an agent. I think this is partly because, despite the basic agent design being just a loop, there are subtle differences based on the tools you provide. These differences affect how easy or hard it is to find the right abstraction (cache control, different requirements for reinforcement, tool prompts, provider-side tools, etc.). Because the right abstraction is not yet clear, using the original SDKs from the dedicated platforms keeps you fully in control. […]&lt;/p&gt;
&lt;p&gt;This might change, but right now we would probably not use an abstraction when building an agent, at least until things have settled down a bit. The benefits do not yet outweigh the costs for us.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Armin introduces the new-to-me term &lt;strong&gt;reinforcement&lt;/strong&gt;, where you remind the agent of things as it goes along:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Every time the agent runs a tool you have the opportunity to not just return data that the tool produces, but also to feed more information back into the loop. For instance, you can remind the agent about the overall objective and the status of individual tasks. […] Another use of reinforcement is to inform the system about state changes that happened in the background.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Claude Code’s TODO list is another example of this pattern in action.&lt;/p&gt;
&lt;p&gt;Testing and evals remains the single hardest problem in AI engineering:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We find testing and evals to be the hardest problem here. This is not entirely surprising, but the agentic nature makes it even harder. Unlike prompts, you cannot just do the evals in some external system because there’s too much you need to feed into it. This means you want to do evals based on observability data or instrumenting your actual test runs. So far none of the solutions we have tried have convinced us that they found the right approach here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Armin also has a follow-up post, &lt;a href="https://lucumr.pocoo.org/2025/11/22/llm-apis/"&gt;LLM APIs are a Synchronization Problem&lt;/a&gt;, which argues that the shape of current APIs hides too many details from us as developers, and the core challenge here is in synchronizing state between the tokens fed through the GPUs and our client applications - something that may benefit from alternative approaches developed by the local-first movement.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46013935"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/armin-ronacher"&gt;armin-ronacher&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/evals"&gt;evals&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="armin-ronacher"/><category term="definitions"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="evals"/><category term="ai-agents"/></entry><entry><title>We should all be using dependency cooldowns</title><link href="https://simonwillison.net/2025/Nov/21/dependency-cooldowns/#atom-tag" rel="alternate"/><published>2025-11-21T17:27:33+00:00</published><updated>2025-11-21T17:27:33+00:00</updated><id>https://simonwillison.net/2025/Nov/21/dependency-cooldowns/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.yossarian.net/2025/11/21/We-should-all-be-using-dependency-cooldowns"&gt;We should all be using dependency cooldowns&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;William Woodruff gives a name to a sensible strategy for managing dependencies while reducing the chances of a surprise supply chain attack: &lt;strong&gt;dependency cooldowns&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Supply chain attacks happen when an attacker compromises a widely used open source package and publishes a new version with an exploit. These are usually spotted &lt;em&gt;very&lt;/em&gt; quickly, so an attack often only has a few hours of effective window before the problem is identified and the compromised package is pulled.&lt;/p&gt;
&lt;p&gt;You are most at risk if you're automatically applying upgrades the same day they are released.&lt;/p&gt;
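&lt;p&gt;The core of a cooldown is just an age check against a release date. A minimal sketch (a real implementation would use Dependabot, Renovate or your package manager's own support rather than hand-rolling this; the seven day window is an arbitrary choice):&lt;/p&gt;

```python
# Sketch of a dependency cooldown: only accept a new release once it
# has been public for a minimum number of days. The release timestamp
# would come from your package index (e.g. PyPI's JSON API) in practice.
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(days=7)

def passes_cooldown(released_at, now=None):
    """Return True if the release is old enough to upgrade to."""
    now = now or datetime.now(timezone.utc)
    return now - released_at >= COOLDOWN
```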
&lt;p&gt;William says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I &lt;strong&gt;love&lt;/strong&gt; cooldowns for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They're empirically effective, per above. They won't stop &lt;em&gt;all&lt;/em&gt; attackers, but they &lt;em&gt;do&lt;/em&gt; stymie the majority of high-visibility, mass-impact supply chain attacks that have become more common.&lt;/li&gt;
&lt;li&gt;They're &lt;em&gt;incredibly&lt;/em&gt; easy to implement. Moreover, they're &lt;strong&gt;literally free&lt;/strong&gt; to implement in most cases: most people can use &lt;a href="https://docs.github.com/en/code-security/dependabot/working-with-dependabot/dependabot-options-reference#cooldown-"&gt;Dependabot's functionality&lt;/a&gt;, &lt;a href="https://docs.renovatebot.com/key-concepts/minimum-release-age/"&gt;Renovate's functionality&lt;/a&gt;, or the functionality built directly into their package manager&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;The one counter-argument to this is that sometimes an upgrade fixes a security vulnerability, and in those cases every hour of delay in upgrading is an hour when an attacker could exploit the new issue against your software.&lt;/p&gt;
&lt;p&gt;I see that as an argument for carefully monitoring the release notes of your dependencies, and paying special attention to security advisories. I'm a big fan of the &lt;a href="https://github.com/advisories"&gt;GitHub Advisory Database&lt;/a&gt; for that kind of information.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46005111"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="github"/><category term="open-source"/><category term="packaging"/><category term="supply-chain"/></entry><entry><title>Quoting MiniMax</title><link href="https://simonwillison.net/2025/Nov/3/minimax/#atom-tag" rel="alternate"/><published>2025-11-03T17:24:39+00:00</published><updated>2025-11-03T17:24:39+00:00</updated><id>https://simonwillison.net/2025/Nov/3/minimax/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://x.com/minimax__ai/status/1985375617622454566"&gt;&lt;p&gt;&lt;strong&gt;Interleaved thinking&lt;/strong&gt; is essential for LLM agents: it means alternating between explicit reasoning and tool use, while carrying that reasoning forward between steps. This process significantly enhances &lt;strong&gt;planning, self‑correction, and reliability&lt;/strong&gt; in long workflows. [...]&lt;/p&gt;
&lt;p&gt;From community feedback, we've often observed failures to preserve prior-round thinking state across multi-turn interactions with M2. The root cause is that the widely-used &lt;strong&gt;OpenAI Chat Completion API does not support passing reasoning content back in subsequent requests&lt;/strong&gt;. Although the Anthropic API natively supports this capability, the community has provided less support for models beyond Claude, and many applications still omit passing back the previous turns' thinking in their Anthropic API implementations. This situation has resulted in poor support for Interleaved Thinking for new models. &lt;strong&gt;To fully unlock M2's capabilities, preserving the reasoning process across multi-turn interactions is essential&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://x.com/minimax__ai/status/1985375617622454566"&gt;MiniMax&lt;/a&gt;, Interleaved Thinking Unlocks Reliable MiniMax-M2 Agentic Capability&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/minimax"&gt;minimax&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-agents"/><category term="llm-reasoning"/><category term="ai-in-china"/><category term="minimax"/></entry><entry><title>New prompt injection papers: Agents Rule of Two and The Attacker Moves Second</title><link href="https://simonwillison.net/2025/Nov/2/new-prompt-injection-papers/#atom-tag" rel="alternate"/><published>2025-11-02T23:09:33+00:00</published><updated>2025-11-02T23:09:33+00:00</updated><id>https://simonwillison.net/2025/Nov/2/new-prompt-injection-papers/#atom-tag</id><summary type="html">
    &lt;p&gt;Two interesting new papers regarding LLM security and prompt injection came to my attention this weekend.&lt;/p&gt;
&lt;h4 id="agents-rule-of-two-a-practical-approach-to-ai-agent-security"&gt;Agents Rule of Two: A Practical Approach to AI Agent Security&lt;/h4&gt;
&lt;p&gt;The first is &lt;a href="https://ai.meta.com/blog/practical-ai-agent-security/"&gt;Agents Rule of Two: A Practical Approach to AI Agent Security&lt;/a&gt;, published on October 31st on the Meta AI blog. It doesn't list authors but it was &lt;a href="https://x.com/MickAyzenberg/status/1984355145917088235"&gt;shared on Twitter&lt;/a&gt; by Meta AI security researcher Mick Ayzenberg.&lt;/p&gt;
&lt;p&gt;It proposes a "Rule of Two" that's inspired by both my own &lt;a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/"&gt;lethal trifecta&lt;/a&gt; concept and the Google Chrome team's &lt;a href="https://chromium.googlesource.com/chromium/src/+/main/docs/security/rule-of-2.md"&gt;Rule Of 2&lt;/a&gt; for writing code that works with untrustworthy inputs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At a high level, the Agents Rule of Two states that until robustness research allows us to reliably detect and refuse prompt injection, agents &lt;strong&gt;must satisfy no more than two&lt;/strong&gt; of the following three properties within a session to avoid the highest impact consequences of prompt injection.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;[A]&lt;/strong&gt; An agent can process untrustworthy inputs&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;[B]&lt;/strong&gt; An agent can have access to sensitive systems or private data&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;[C]&lt;/strong&gt; An agent can change state or communicate externally&lt;/p&gt;
&lt;p&gt;It's still possible that all three properties are necessary to carry out a request. If an agent requires all three without starting a new session (i.e., with a fresh context window), then the agent should not be permitted to operate autonomously and at a minimum requires supervision --- via human-in-the-loop approval or another reliable means of validation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's accompanied by this handy diagram:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/agents-rule-of-two-updated.jpg" alt="Venn diagram titled &amp;quot;Choose Two&amp;quot; showing three overlapping circles labeled A, B, and C. Circle A (top): &amp;quot;Process untrustworthy inputs&amp;quot; with description &amp;quot;Externally authored data may contain prompt injection attacks that turn an agent malicious.&amp;quot; Circle B (bottom left): &amp;quot;Access to sensitive systems or private data&amp;quot; with description &amp;quot;This includes private user data, company secrets, production settings and configs, source code, and other sensitive data.&amp;quot; Circle C (bottom right): &amp;quot;Change state or communicate externally&amp;quot; with description &amp;quot;Overwrite or change state through write actions, or transmitting data to a threat actor through web requests or tool calls.&amp;quot; The two-way overlaps between circles are labeled &amp;quot;Lower risk&amp;quot; while the center where all three circles overlap is labeled &amp;quot;Danger&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I like this &lt;em&gt;a lot&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I've spent several years now trying to find clear ways to explain the risks of prompt injection attacks to developers who are building on top of LLMs. It's frustratingly difficult.&lt;/p&gt;
&lt;p&gt;I've had the most success with the lethal trifecta, which boils one particular class of prompt injection attack down to a simple-enough model: if your system has access to private data, exposure to untrusted content, and a way to communicate externally, then it's vulnerable to private data being stolen.&lt;/p&gt;
&lt;p&gt;The one problem with the lethal trifecta is that it only covers the risk of data exfiltration: there are plenty of other, even nastier risks that arise from prompt injection attacks against LLM-powered agents with access to tools which the lethal trifecta doesn't cover.&lt;/p&gt;
&lt;p&gt;The Agents Rule of Two neatly solves this, through the addition of "changing state" as a property to consider. This brings other forms of tool usage into the picture: anything that can change state triggered by untrustworthy inputs is something to be very cautious about.&lt;/p&gt;
&lt;p&gt;It's also refreshing to see another major research lab concluding that prompt injection remains an unsolved problem, and attempts to block or filter them have not proven reliable enough to depend on. The current solution is to design systems with this in mind, and the Rule of Two is a solid way to think about that.&lt;/p&gt;
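&lt;p&gt;As a decision rule it is simple enough to encode directly. A hedged sketch (the boolean property names are mine, not from the Meta post):&lt;/p&gt;

```python
# Sketch of the Agents Rule of Two as a gate on agent configuration:
# a session may hold at most two of the three risk properties without
# human-in-the-loop supervision.

def requires_supervision(untrustworthy_inputs, sensitive_access, can_change_state):
    """True if all three risk properties are present in one session."""
    properties = [untrustworthy_inputs, sensitive_access, can_change_state]
    return sum(properties) == 3
```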
&lt;p id="exception"&gt;&lt;strong&gt;Update&lt;/strong&gt;: On thinking about this further there's one aspect of the Rule of Two model that doesn't work for me: the Venn diagram above marks the combination of untrustworthy inputs and the ability to change state as "safe", but that's not right. Even without access to private systems or sensitive data that pairing can still produce harmful results. Unfortunately adding an exception for that pair undermines the simplicity of the "Rule of Two" framing!&lt;/p&gt;
&lt;p id="update-2"&gt;&lt;strong&gt;Update 2&lt;/strong&gt;: Mick Ayzenberg responded to this note in &lt;a href="https://news.ycombinator.com/item?id=45794245#45802448"&gt;a comment on Hacker News&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Thanks for the feedback! One small bit of clarification, the framework would describe access to any sensitive system as part of the [B] circle, not only private systems or private data.&lt;/p&gt;
&lt;p&gt;The intention is that an agent that has removed [B] can write state and communicate freely, but not with any systems that matter (wrt critical security outcomes for its user). An example of an agent in this state would be one that can take actions in a tight sandbox or is isolated from production.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Meta team also &lt;a href="https://news.ycombinator.com/item?id=45794245#45802046"&gt;updated their post&lt;/a&gt; to replace "safe" with "lower risk" as the label on the intersections between the different circles. I've updated my screenshots of their diagrams in this post, &lt;a href="https://static.simonwillison.net/static/2025/agents-rule-of-two.jpg"&gt;here's the original&lt;/a&gt; for comparison.&lt;/p&gt;
&lt;p&gt;Which brings me to the second paper...&lt;/p&gt;
&lt;h4 id="the-attacker-moves-second-stronger-adaptive-attacks-bypass-defenses-against-llm-jailbreaks-and-prompt-injections"&gt;The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections&lt;/h4&gt;
&lt;p&gt;This paper is dated 10th October 2025 &lt;a href="https://arxiv.org/abs/2510.09023"&gt;on Arxiv&lt;/a&gt; and comes from a heavy-hitting team of 14 authors - Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V. Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, Florian Tramèr - including representatives from OpenAI, Anthropic, and Google DeepMind.&lt;/p&gt;
&lt;p&gt;The paper looks at 12 published defenses against prompt injection and jailbreaking and subjects them to a range of "adaptive attacks" - attacks that are allowed to expend considerable effort iterating multiple times to try and find a way through.&lt;/p&gt;
&lt;p&gt;The defenses did not fare well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By systematically tuning and scaling general optimization techniques—gradient descent, reinforcement learning, random search, and human-guided exploration—we bypass 12 recent defenses (based on a diverse set of techniques) with attack success rate above 90% for most; importantly, the majority of defenses originally reported near-zero attack success rates.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Notably the "Human red-teaming setting" scored 100%, defeating all defenses. That red-team consisted of 500 participants in an online competition they ran with a $20,000 prize fund.&lt;/p&gt;
&lt;p&gt;The key point of the paper is that static example attacks - single string prompts designed to bypass systems - are an almost useless way to evaluate these defenses. Adaptive attacks are far more powerful, as shown by this chart:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/attack-success-rate.jpg" alt="Bar chart showing Attack Success Rate (%) for various security systems across four categories: Prompting, Training, Filtering Model, and Secret Knowledge. The chart compares three attack types shown in the legend: Static / weak attack (green hatched bars), Automated attack (ours) (orange bars), and Human red-teaming (ours) (purple dotted bars). Systems and their success rates are: Spotlighting (28% static, 99% automated), Prompt Sandwich (21% static, 95% automated), RPO (0% static, 99% automated), Circuit Breaker (8% static, 100% automated), StruQ (62% static, 100% automated), SeqAlign (5% static, 96% automated), ProtectAI (15% static, 90% automated), PromptGuard (26% static, 94% automated), PIGuard (0% static, 71% automated), Model Armor (0% static, 90% automated), Data Sentinel (0% static, 80% automated), MELON (0% static, 89% automated), and Human red-teaming setting (0% static, 100% human red-teaming)." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The three automated adaptive attack techniques used by the paper are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gradient-based methods&lt;/strong&gt; - these were the least effective, using the technique described in the legendary &lt;a href="https://arxiv.org/abs/2307.15043"&gt;Universal and Transferable Adversarial Attacks on Aligned Language Models&lt;/a&gt; paper &lt;a href="https://simonwillison.net/2023/Jul/27/universal-and-transferable-attacks-on-aligned-language-models/"&gt;from 2023&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reinforcement learning methods&lt;/strong&gt; - particularly effective against black-box models: "we allowed the attacker model to interact directly with the defended system and observe its outputs", using 32 sessions of 5 rounds each.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search-based methods&lt;/strong&gt; - generate candidates with an LLM, then evaluate and further modify them using LLM-as-judge and other classifiers.&lt;/li&gt;
&lt;/ul&gt;
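&lt;p&gt;The search-based variant is essentially a generate-and-score loop. A hedged sketch, where &lt;code&gt;generate&lt;/code&gt;, &lt;code&gt;target&lt;/code&gt; and &lt;code&gt;judge&lt;/code&gt; are hypothetical callables standing in for LLM calls, not the paper's implementation:&lt;/p&gt;

```python
# Sketch of a search-based adaptive attack: propose candidate prompts,
# score each one against the defended system with a judge, keep the
# best so far, and mutate it for the next round.

def adaptive_search(generate, target, judge, seed, rounds=10):
    best, best_score = seed, judge(target(seed))
    for _ in range(rounds):
        candidate = generate(best)        # mutate the current best attempt
        score = judge(target(candidate))  # did the defense hold this time?
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

&lt;p&gt;The key property is the feedback loop: unlike a static string, each round learns from the defended system's observed behavior.&lt;/p&gt;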
&lt;p&gt;The paper concludes somewhat optimistically:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] Adaptive evaluations are therefore more challenging to perform,
making it all the more important that they are performed. We again urge defense authors to release simple, easy-to-prompt defenses that are amenable to human analysis. [...] Finally, we hope that our analysis here will increase the standard for defense evaluations, and in so doing, increase the likelihood that reliable jailbreak and prompt injection defenses will be developed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given how totally the defenses were defeated, I do not share their optimism that reliable defenses will be developed any time soon.&lt;/p&gt;
&lt;p&gt;As a review of how far we still have to go this paper packs a powerful punch. I think it makes a strong case for Meta's Agents Rule of Two as the best practical advice for building secure LLM-powered agent systems today in the absence of prompt injection defenses we can rely on.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicholas-carlini"&gt;nicholas-carlini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paper-review"&gt;paper-review&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lethal-trifecta"&gt;lethal-trifecta&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="security"/><category term="openai"/><category term="prompt-injection"/><category term="anthropic"/><category term="nicholas-carlini"/><category term="paper-review"/><category term="lethal-trifecta"/></entry><entry><title>Claude Skills are awesome, maybe a bigger deal than MCP</title><link href="https://simonwillison.net/2025/Oct/16/claude-skills/#atom-tag" rel="alternate"/><published>2025-10-16T21:25:18+00:00</published><updated>2025-10-16T21:25:18+00:00</updated><id>https://simonwillison.net/2025/Oct/16/claude-skills/#atom-tag</id><summary type="html">
    &lt;p&gt;Anthropic this morning &lt;a href="https://www.anthropic.com/news/skills"&gt;introduced Claude Skills&lt;/a&gt;, a new pattern for making new abilities available to their models:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Claude can now use &lt;em&gt;Skills&lt;/em&gt; to improve how it performs specific tasks. Skills are folders that include instructions, scripts, and resources that Claude can load when needed.&lt;/p&gt;
&lt;p&gt;Claude will only access a skill when it's relevant to the task at hand. When used, skills make Claude better at specialized tasks like working with Excel or following your organization's brand guidelines.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Their engineering blog has a &lt;a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills"&gt;more detailed explanation&lt;/a&gt;. There's also a new &lt;a href="https://github.com/anthropics/skills"&gt;anthropics/skills&lt;/a&gt; GitHub repo.&lt;/p&gt;
&lt;p&gt;(I inadvertently preempted their announcement of this feature when I reverse engineered and &lt;a href="https://simonwillison.net/2025/Oct/10/claude-skills/"&gt;wrote about it last Friday&lt;/a&gt;!)&lt;/p&gt;
&lt;p&gt;Skills are conceptually extremely simple: a skill is a Markdown file telling the model how to do something, optionally accompanied by extra documents and pre-written scripts that the model can run to help it accomplish the tasks described by the skill.&lt;/p&gt;
&lt;p&gt;Claude's new &lt;a href="https://www.anthropic.com/news/create-files"&gt;document creation abilities&lt;/a&gt;, which accompanied &lt;a href="https://simonwillison.net/2025/Sep/9/claude-code-interpreter/"&gt;their new code interpreter feature&lt;/a&gt; in September, turned out to be entirely implemented using skills. Those are &lt;a href="https://github.com/anthropics/skills/tree/main/document-skills"&gt;now available in Anthropic's repo&lt;/a&gt; covering &lt;code&gt;.pdf&lt;/code&gt;, &lt;code&gt;.docx&lt;/code&gt;, &lt;code&gt;.xlsx&lt;/code&gt;, and &lt;code&gt;.pptx&lt;/code&gt; files.&lt;/p&gt;
&lt;p&gt;There's one extra detail that makes this a feature, not just a bunch of files on disk. At the start of a session Claude's various harnesses can scan all available skill files and read a short explanation for each one from the frontmatter YAML in the Markdown file. This is &lt;em&gt;very&lt;/em&gt; token efficient: each skill only takes up a few dozen extra tokens, with the full details only loaded in should the user request a task that the skill can help solve.&lt;/p&gt;
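&lt;p&gt;That startup scan can be sketched as reading just the frontmatter from each &lt;code&gt;SKILL.md&lt;/code&gt; (the directory layout is assumed from the anthropics/skills repo, and the parsing here is deliberately naive rather than a real YAML parser):&lt;/p&gt;

```python
# Sketch of the token-efficient skill scan: read only the frontmatter
# description from each skill's SKILL.md, deferring the full body until
# the skill is actually needed.
from pathlib import Path

def scan_skills(skills_dir):
    summaries = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        lines = skill_md.read_text().splitlines()
        if lines and lines[0] == "---":
            for line in lines[1:]:
                if line == "---":
                    break  # end of frontmatter, skip the body entirely
                if line.startswith("description:"):
                    summaries[skill_md.parent.name] = line.split(":", 1)[1].strip()
    return summaries
```

&lt;p&gt;Only the short descriptions end up in the context; the skill body stays on disk until requested.&lt;/p&gt;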
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/#trying-out-the-slack-gif-creator-skill"&gt;Trying out the slack-gif-creator skill&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/#skills-depend-on-a-coding-environment"&gt;Skills depend on a coding environment&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/#claude-as-a-general-agent"&gt;Claude Code as a General Agent&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/#skills-compared-to-mcp"&gt;Skills compared to MCP&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/#here-come-the-skills"&gt;Here come the Skills&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/#the-simplicity-is-the-point"&gt;The simplicity is the point&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="trying-out-the-slack-gif-creator-skill"&gt;Trying out the slack-gif-creator skill&lt;/h4&gt;
&lt;p&gt;Here's that metadata for an example &lt;a href="https://github.com/anthropics/skills/blob/main/slack-gif-creator/SKILL.md"&gt;slack-gif-creator skill&lt;/a&gt; that Anthropic published this morning:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Toolkit for creating animated GIFs optimized for Slack, with validators for size constraints and composable animation primitives. This skill applies when users request animated GIFs or emoji animations for Slack from descriptions like "make me a GIF for Slack of X doing Y".&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I just tried this skill out in the Claude mobile web app, against Sonnet 4.5. First I enabled the slack-gif-creator skill &lt;a href="https://claude.ai/settings/capabilities"&gt;in the settings&lt;/a&gt;, then I prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Make me a gif for slack about how Skills are way cooler than MCPs&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Claude &lt;a href="https://claude.ai/share/eff7ae7b-b386-417b-9fa0-213fa76ace6e"&gt;made me this GIF&lt;/a&gt;. Click to play (it's almost epilepsy inducing, hence the click-to-play mechanism):&lt;/p&gt;
&lt;p&gt;&lt;img
  src="https://static.simonwillison.net/static/2025/skills_vs_mcps_still.gif"
  data-still="https://static.simonwillison.net/static/2025/skills_vs_mcps_still.gif"
  data-gif="https://static.simonwillison.net/static/2025/skills_vs_mcps.gif"
  data-state="stopped"
  role="button"
  aria-pressed="false"
  tabindex="0"
  style="cursor:pointer;max-width:100%"
  onload="(new Image).src=this.getAttribute('data-gif')"
  onclick="(function(el){
    if (el.getAttribute('data-state') !== 'playing') {
      var c = el.cloneNode(true);
      c.src = el.getAttribute('data-gif');
      c.setAttribute('data-state','playing');
      c.setAttribute('aria-pressed','true');
      el.parentNode.replaceChild(c, el);
    } else {
      el.setAttribute('data-state','stopped');
      el.setAttribute('aria-pressed','false');
      el.src = el.getAttribute('data-still');
    }
  })(this)"
  onkeydown="if(event.key===' '||event.key==='Enter'){event.preventDefault();this.onclick(event);}"
/&gt;&lt;/p&gt;
&lt;p&gt;OK, this particular GIF is terrible, but the great thing about skills is that they're very easy to iterate on to make them better.&lt;/p&gt;
&lt;p&gt;Here are some noteworthy snippets from &lt;a href="https://gist.github.com/simonw/ef35bb9e6c514d1d596dac9227da482b"&gt;the Python script it wrote&lt;/a&gt;, comments mine:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;# Start by adding the skill's directory to the Python path&lt;/span&gt;
&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;sys&lt;/span&gt;
&lt;span class="pl-s1"&gt;sys&lt;/span&gt;.&lt;span class="pl-c1"&gt;path&lt;/span&gt;.&lt;span class="pl-c1"&gt;insert&lt;/span&gt;(&lt;span class="pl-c1"&gt;0&lt;/span&gt;, &lt;span class="pl-s"&gt;'/mnt/skills/examples/slack-gif-creator'&lt;/span&gt;)

&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-c1"&gt;PIL&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;Image&lt;/span&gt;, &lt;span class="pl-v"&gt;ImageDraw&lt;/span&gt;, &lt;span class="pl-v"&gt;ImageFont&lt;/span&gt;
&lt;span class="pl-c"&gt;# This class lives in the core/ directory for the skill&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;core&lt;/span&gt;.&lt;span class="pl-s1"&gt;gif_builder&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;GIFBuilder&lt;/span&gt;

&lt;span class="pl-c"&gt;# ... code that builds the GIF ...&lt;/span&gt;

&lt;span class="pl-c"&gt;# Save it to disk:&lt;/span&gt;
&lt;span class="pl-s1"&gt;info&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;builder&lt;/span&gt;.&lt;span class="pl-c1"&gt;save&lt;/span&gt;(&lt;span class="pl-s"&gt;'/mnt/user-data/outputs/skills_vs_mcps.gif'&lt;/span&gt;, 
                    &lt;span class="pl-s1"&gt;num_colors&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;128&lt;/span&gt;, 
                    &lt;span class="pl-s1"&gt;optimize_for_emoji&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;)

&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"GIF created successfully!"&lt;/span&gt;)
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"Size: &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;info&lt;/span&gt;[&lt;span class="pl-s"&gt;'size_kb'&lt;/span&gt;]:.1f&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt; KB (&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;info&lt;/span&gt;[&lt;span class="pl-s"&gt;'size_mb'&lt;/span&gt;]:.2f&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt; MB)"&lt;/span&gt;)
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"Frames: &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;info&lt;/span&gt;[&lt;span class="pl-s"&gt;'frame_count'&lt;/span&gt;]&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;"&lt;/span&gt;)
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"Duration: &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;info&lt;/span&gt;[&lt;span class="pl-s"&gt;'duration_seconds'&lt;/span&gt;]:.1f&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;s"&lt;/span&gt;)

&lt;span class="pl-c"&gt;# Use the check_slack_size() function to confirm it's small enough for Slack:&lt;/span&gt;
&lt;span class="pl-s1"&gt;passes&lt;/span&gt;, &lt;span class="pl-s1"&gt;check_info&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;check_slack_size&lt;/span&gt;(&lt;span class="pl-s"&gt;'/mnt/user-data/outputs/skills_vs_mcps.gif'&lt;/span&gt;, &lt;span class="pl-s1"&gt;is_emoji&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;False&lt;/span&gt;)
&lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;passes&lt;/span&gt;:
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;"✓ Ready for Slack!"&lt;/span&gt;)
&lt;span class="pl-k"&gt;else&lt;/span&gt;:
    &lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s"&gt;f"⚠ File size: &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;check_info&lt;/span&gt;[&lt;span class="pl-s"&gt;'size_kb'&lt;/span&gt;]:.1f&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt; KB (limit: &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;check_info&lt;/span&gt;[&lt;span class="pl-s"&gt;'limit_kb'&lt;/span&gt;]&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt; KB)"&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;This is pretty neat. Slack GIFs need to be a maximum of 2MB, so the skill includes a validation function which the model can use to check the file size. If it's too large the model can have another go at making it smaller.&lt;/p&gt;
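&lt;p&gt;The post doesn't show the skill's implementation of &lt;code&gt;check_slack_size()&lt;/code&gt;, but a minimal sketch consistent with the usage above might look like this - only the 2MB GIF limit comes from the post, and the 128 KB emoji limit is my assumption:&lt;/p&gt;

```python
import os

# Hedged sketch of a Slack size validator. The 2MB (2048 KB) GIF limit is
# stated in the post; the 128 KB emoji limit here is an assumption.
SLACK_GIF_LIMIT_KB = 2048
SLACK_EMOJI_LIMIT_KB = 128

def check_slack_size(path, is_emoji=False):
    """Return (passes, info) where info carries sizes in KB/MB and the limit."""
    size_kb = os.path.getsize(path) / 1024
    limit_kb = SLACK_EMOJI_LIMIT_KB if is_emoji else SLACK_GIF_LIMIT_KB
    info = {
        "size_kb": size_kb,
        "size_mb": size_kb / 1024,
        "limit_kb": limit_kb,
    }
    return limit_kb >= size_kb, info
```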
&lt;h4 id="skills-depend-on-a-coding-environment"&gt;Skills depend on a coding environment&lt;/h4&gt;
&lt;p&gt;The skills mechanism is &lt;em&gt;entirely dependent&lt;/em&gt; on the model having access to a filesystem, tools to navigate it and the ability to execute commands in that environment.&lt;/p&gt;
&lt;p&gt;This is a common pattern for LLM tooling these days - ChatGPT Code Interpreter was the first big example of this &lt;a href="https://simonwillison.net/2023/Apr/12/code-interpreter/"&gt;back in early 2023&lt;/a&gt;, and the pattern later extended to local machines via coding agent tools such as Cursor, Claude Code, Codex CLI and Gemini CLI.&lt;/p&gt;
&lt;p&gt;This requirement is the biggest difference between skills and previous attempts at expanding the abilities of LLMs, such as MCP and &lt;a href="https://simonwillison.net/tags/chatgpt-plugins/"&gt;ChatGPT Plugins&lt;/a&gt;. It's a significant dependency, but it's somewhat bewildering how much new capability it unlocks.&lt;/p&gt;
&lt;p&gt;The fact that skills are so powerful and simple to create is yet another argument in favor of making safe coding environments available to LLMs. The word &lt;strong&gt;safe&lt;/strong&gt; there is doing a &lt;em&gt;lot&lt;/em&gt; of work though! We really need to figure out how best to sandbox these environments such that attacks such as prompt injections are limited to an acceptable amount of damage.&lt;/p&gt;
&lt;h4 id="claude-as-a-general-agent"&gt;Claude Code as a General Agent&lt;/h4&gt;
&lt;p&gt;Back in January I &lt;a href="https://simonwillison.net/2025/Jan/10/ai-predictions/"&gt;made some foolhardy predictions about AI/LLMs&lt;/a&gt;, including that "agents" would once again fail to happen:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I think we are going to see a &lt;em&gt;lot&lt;/em&gt; more froth about agents in 2025, but I expect the results will be a great disappointment to most of the people who are excited about this term. I expect a lot of money will be lost chasing after several different poorly defined dreams that share that name.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I was entirely wrong about that. 2025 really has been the year of "agents", no matter which of the many &lt;a href="https://simonwillison.net/tags/agent-definitions/"&gt;conflicting definitions&lt;/a&gt; you decide to use (I eventually settled on "&lt;a href="https://simonwillison.net/2025/Sep/18/agents/"&gt;tools in a loop&lt;/a&gt;").&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.claude.com/product/claude-code"&gt;Claude Code&lt;/a&gt; is, with hindsight, poorly named. It's not purely a coding tool: it's a tool for general computer automation. &lt;em&gt;Anything&lt;/em&gt; you can achieve by typing commands into a computer is something that can now be automated by Claude Code. It's best described as a &lt;strong&gt;general agent&lt;/strong&gt;. Skills make this a whole lot more obvious and explicit.&lt;/p&gt;
&lt;p&gt;I find the potential applications of this trick somewhat dizzying. Just thinking about this with my data journalism hat on: imagine a folder full of skills that covers tasks like the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Where to get US census data from and how to understand its structure&lt;/li&gt;
&lt;li&gt;How to load data from different formats into SQLite or DuckDB using appropriate Python libraries&lt;/li&gt;
&lt;li&gt;How to publish data online, as Parquet files in S3 or pushed as tables to Datasette Cloud&lt;/li&gt;
&lt;li&gt;A skill defined by an experienced data reporter talking about how best to find the interesting stories in a new set of data&lt;/li&gt;
&lt;li&gt;A skill that describes how to build clean, readable data visualizations using D3&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Congratulations, you just built a "data journalism agent" that can discover and help publish stories against fresh drops of US census data. And you did it with a folder full of Markdown files and maybe a couple of example Python scripts.&lt;/p&gt;
&lt;h4 id="skills-compared-to-mcp"&gt;Skills compared to MCP&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/"&gt;Model Context Protocol&lt;/a&gt; has attracted an enormous amount of buzz since its initial release back &lt;a href="https://simonwillison.net/2024/Nov/25/model-context-protocol/"&gt;in November last year&lt;/a&gt;. I like to joke that one of the reasons it took off is that every company knew they needed an "AI strategy", and building (or announcing) an MCP implementation was an easy way to tick that box.&lt;/p&gt;
&lt;p&gt;Over time the limitations of MCP have started to emerge. The most significant is in terms of token usage: GitHub's official MCP on its own famously consumes tens of thousands of tokens of context, and once you've added a few more to that there's precious little space left for the LLM to actually do useful work.&lt;/p&gt;
&lt;p&gt;My own interest in MCPs has waned ever since I started taking coding agents seriously. Almost everything I might achieve with an MCP can be handled by a CLI tool instead. LLMs know how to call &lt;code&gt;cli-tool --help&lt;/code&gt;, which means you don't have to spend many tokens describing how to use them - the model can figure it out later when it needs to.&lt;/p&gt;
&lt;p&gt;Skills have exactly the same advantage, only now I don't even need to implement a new CLI tool. I can drop a Markdown file in describing how to do a task instead, adding extra scripts only if they'll help make things more reliable or efficient.&lt;/p&gt;
&lt;h4 id="here-come-the-skills"&gt;Here come the Skills&lt;/h4&gt;
&lt;p&gt;One of the most exciting things about Skills is how easy they are to share. I expect many skills will be implemented as a single file - more sophisticated ones will be a folder containing a few more.&lt;/p&gt;
&lt;p&gt;Anthropic have &lt;a href="https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview"&gt;Agent Skills documentation&lt;/a&gt; and a &lt;a href="https://github.com/anthropics/claude-cookbooks/tree/main/skills"&gt;Claude Skills Cookbook&lt;/a&gt;. I'm already thinking through ideas of skills I might build myself, like one on &lt;a href="https://simonwillison.net/2025/Oct/8/claude-datasette-plugins/"&gt;how to build Datasette plugins&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Something else I love about the design of skills is that there is nothing at all preventing them from being used with other models.&lt;/p&gt;
&lt;p&gt;You can grab a skills folder right now, point Codex CLI or Gemini CLI at it and say "read pdf/SKILL.md and then create me a PDF describing this project" and it will work, despite those tools and models having no baked in knowledge of the skills system.&lt;/p&gt;
&lt;p&gt;I expect we'll see a Cambrian explosion in Skills which will make this year's MCP rush look pedestrian by comparison.&lt;/p&gt;
&lt;h4 id="the-simplicity-is-the-point"&gt;The simplicity is the point&lt;/h4&gt;
&lt;p&gt;I've seen some pushback against skills as being so simple they're hardly a feature at all. Plenty of people have experimented with the trick of dropping extra instructions into a Markdown file and telling the coding agent to read that file before continuing with a task. &lt;a href="https://agents.md/"&gt;AGENTS.md&lt;/a&gt; is a well established pattern, and that file can already include instructions to "Read PDF.md before attempting to create a PDF".&lt;/p&gt;
&lt;p&gt;The core simplicity of the skills design is why I'm so excited about it.&lt;/p&gt;
&lt;p&gt;MCP is a whole &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18"&gt;protocol specification&lt;/a&gt;, covering hosts, clients, servers, resources, prompts, tools, sampling, roots, elicitation and three different transports (stdio, streamable HTTP and originally SSE).&lt;/p&gt;
&lt;p&gt;Skills are Markdown with a tiny bit of YAML metadata and some optional scripts in whatever you can make executable in the environment. They feel a lot closer to the spirit of LLMs - throw in some text and let the model figure it out.&lt;/p&gt;
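&lt;p&gt;To make that concrete, here's roughly what a minimal SKILL.md might look like - the skill name, description and steps are invented for illustration, not taken from Anthropic's examples:&lt;/p&gt;

```markdown
---
name: slack-gif
description: Create animated GIFs sized to fit under Slack's 2MB upload limit
---

# Making Slack-sized GIFs

1. Generate the GIF with your preferred tool.
2. Run `scripts/check_size.py OUTPUT.gif` to verify it is under 2048 KB.
3. If it is too large, reduce colors, frame rate or dimensions and try again.
```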
&lt;p&gt;They outsource the hard parts to the LLM harness and the associated computer environment. Given everything we have learned about LLMs' ability to run tools over the last couple of years I think that's a very sensible strategy.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/code-interpreter"&gt;code-interpreter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/skills"&gt;skills&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="claude"/><category term="code-interpreter"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/><category term="skills"/></entry><entry><title>Quoting Thomas Klausner</title><link href="https://simonwillison.net/2025/Oct/7/thomas-klausner/#atom-tag" rel="alternate"/><published>2025-10-07T16:03:12+00:00</published><updated>2025-10-07T16:03:12+00:00</updated><id>https://simonwillison.net/2025/Oct/7/thomas-klausner/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://domm.plix.at/perl/2025_10_braincoded_static_image_gallery.html"&gt;&lt;p&gt;For quite some I wanted to write a small static image gallery so I can share my pictures with friends and family. Of course there are a gazillion tools like this, but, well, sometimes I just want to roll my own. [...]&lt;/p&gt;
&lt;p&gt;I used the old, well tested technique I call &lt;strong&gt;brain coding&lt;/strong&gt;, where you start with an empty vim buffer and type some code (Perl, HTML, CSS) until you're happy with the result. It helps to think a bit (aka use your brain) during this process.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://domm.plix.at/perl/2025_10_braincoded_static_image_gallery.html"&gt;Thomas Klausner&lt;/a&gt;, coining "brain coding"&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="vibe-coding"/></entry><entry><title>Vibe engineering</title><link href="https://simonwillison.net/2025/Oct/7/vibe-engineering/#atom-tag" rel="alternate"/><published>2025-10-07T14:32:25+00:00</published><updated>2025-10-07T14:32:25+00:00</updated><id>https://simonwillison.net/2025/Oct/7/vibe-engineering/#atom-tag</id><summary type="html">
    &lt;p&gt;I feel like &lt;strong&gt;vibe coding&lt;/strong&gt; is &lt;a href="https://simonwillison.net/2025/Mar/19/vibe-coding/"&gt;pretty well established now&lt;/a&gt; as covering the fast, loose and irresponsible way of building software with AI - entirely prompt-driven, and with no attention paid to how the code actually works. This leaves us with a terminology gap: what should we call the other end of the spectrum, where seasoned professionals accelerate their work with LLMs while staying proudly and confidently accountable for the software they produce?&lt;/p&gt;
&lt;p&gt;I propose we call this &lt;strong&gt;vibe engineering&lt;/strong&gt;, with my tongue only partially in my cheek.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update 23rd February 2026&lt;/strong&gt;: It looks like the term "Agentic Engineering" is coming out on top for this now. I have &lt;a href="https://simonwillison.net/tags/agentic-engineering/"&gt;a new tag for that&lt;/a&gt; and I'm working on &lt;a href="https://simonwillison.net/2026/Feb/23/agentic-engineering-patterns/"&gt;a not-quite-a-book&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;One of the lesser spoken truths of working productively with LLMs as a software engineer on non-toy-projects is that it's &lt;em&gt;difficult&lt;/em&gt;. There's a lot of depth to understanding how to use the tools, there are plenty of traps to avoid, and the pace at which they can churn out working code raises the bar for what the human participant can and should be contributing.&lt;/p&gt;
&lt;p&gt;The rise of &lt;strong&gt;coding agents&lt;/strong&gt; - tools like &lt;a href="https://www.claude.com/product/claude-code"&gt;Claude Code&lt;/a&gt; (released February 2025), OpenAI's &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; (April) and &lt;a href="https://github.com/google-gemini/gemini-cli"&gt;Gemini CLI&lt;/a&gt; (June) that can iterate on code, actively testing and modifying it until it achieves a specified goal, has dramatically increased the usefulness of LLMs for real-world coding problems.&lt;/p&gt;
&lt;p&gt;I'm increasingly hearing from experienced, credible software engineers who are running multiple copies of agents at once, tackling several problems in parallel and expanding the scope of what they can take on. I was skeptical of this at first but &lt;a href="https://simonwillison.net/2025/Oct/5/parallel-coding-agents/"&gt;I've started running multiple agents myself now&lt;/a&gt; and it's surprisingly effective, if mentally exhausting!&lt;/p&gt;
&lt;p&gt;This feels very different from classic vibe coding, where I outsource a simple, low-stakes task to an LLM and accept the result if it appears to work. Most of my &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt; collection (&lt;a href="https://simonwillison.net/2025/Sep/4/highlighted-tools/"&gt;previously&lt;/a&gt;) were built like that. Iterating with coding agents to produce production-quality code that I'm confident I can maintain in the future feels like a different process entirely.&lt;/p&gt;
&lt;p&gt;It's also become clear to me that LLMs actively reward existing top tier software engineering practices:&lt;/p&gt;
&lt;ul id="techniques"&gt;
&lt;li&gt;
&lt;strong&gt;Automated testing&lt;/strong&gt;. If your project has a robust, comprehensive and stable test suite agentic coding tools can &lt;em&gt;fly&lt;/em&gt; with it. Without tests? Your agent might claim something works without having actually tested it at all, plus any new change could break an unrelated feature without you realizing it. Test-first development is particularly effective with agents that can iterate in a loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planning in advance&lt;/strong&gt;. Sitting down to hack something together goes much better if you start with a high level plan. Working with an agent makes this even more important - you can iterate on the plan first, then hand it off to the agent to write the code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive documentation&lt;/strong&gt;. Just like human programmers, an LLM can only keep a subset of the codebase in its context at once. Being able to feed in relevant documentation lets it use APIs from other areas without reading the code first. Write good documentation first and the model may be able to build the matching implementation from that input alone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Good version control habits&lt;/strong&gt;. Being able to undo mistakes and understand when and how something was changed is even more important when a coding agent might have made the changes. LLMs are also fiercely competent at Git - they can navigate the history themselves to track down the origin of bugs, and they're better than most developers at using &lt;a href="https://til.simonwillison.net/git/git-bisect"&gt;git bisect&lt;/a&gt;. Use that to your advantage.&lt;/li&gt;
&lt;li&gt;Having &lt;strong&gt;effective automation&lt;/strong&gt; in place. Continuous integration, automated formatting and linting, continuous deployment to a preview environment - all things that agentic coding tools can benefit from too. LLMs make writing quick automation scripts easier as well, which can help them then repeat tasks accurately and consistently next time.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;culture of code review&lt;/strong&gt;. This one explains itself. If you're fast and productive at code review you're going to have a much better time working with LLMs than if you'd rather write code yourself than review the same thing written by someone (or something) else.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;very weird form of management&lt;/strong&gt;. Getting good results out of a coding agent feels uncomfortably close to getting good results out of a human collaborator. You need to provide clear instructions, ensure they have the necessary context and provide actionable feedback on what they produce. It's a &lt;em&gt;lot&lt;/em&gt; easier than working with actual people because you don't have to worry about offending or discouraging them - but any existing management experience you have will prove surprisingly useful.&lt;/li&gt;
&lt;li&gt;Really good &lt;strong&gt;manual QA (quality assurance)&lt;/strong&gt;. Beyond automated tests, you need to be really good at manually testing software, including predicting and digging into edge-cases.&lt;/li&gt;
&lt;li&gt;Strong &lt;strong&gt;research skills&lt;/strong&gt;. There are dozens of ways to solve any given coding problem. Figuring out the best options and proving an approach has always been important, and remains a blocker on unleashing an agent to write the actual code.&lt;/li&gt;
&lt;li&gt;The ability to &lt;strong&gt;ship to a preview environment&lt;/strong&gt;. If an agent builds a feature, having a way to safely preview that feature (without deploying it straight to production) makes reviews much more productive and greatly reduces the risk of shipping something broken.&lt;/li&gt;
&lt;li&gt;An instinct for &lt;strong&gt;what can be outsourced&lt;/strong&gt; to AI and what you need to manually handle yourself. This is constantly evolving as the models and tools become more effective. A big part of working effectively with LLMs is maintaining a strong intuition for when they can best be applied.&lt;/li&gt;
&lt;li&gt;An updated &lt;strong&gt;sense of estimation&lt;/strong&gt;. Estimating how long a project will take has always been one of the hardest but most important parts of being a senior engineer, especially in organizations where budget and strategy decisions are made based on those estimates. AI-assisted coding makes this &lt;em&gt;even harder&lt;/em&gt; - things that used to take a long time are much faster, but estimations now depend on new factors which we're all still trying to figure out.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you're going to really exploit the capabilities of these new tools, you need to be operating &lt;em&gt;at the top of your game&lt;/em&gt;. You're not just responsible for writing the code - you're researching approaches, deciding on high-level architecture, writing specifications, defining success criteria, &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/"&gt;designing agentic loops&lt;/a&gt;, planning QA, managing a growing army of weird digital interns who will absolutely cheat if you give them a chance, and spending &lt;em&gt;so much time on code review&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Almost all of these are characteristics of senior software engineers already!&lt;/p&gt;
&lt;p&gt;AI tools &lt;strong&gt;amplify existing expertise&lt;/strong&gt;. The more skills and experience you have as a software engineer the faster and better the results you can get from working with LLMs and coding agents.&lt;/p&gt;
&lt;h4 id="-vibe-engineering-really-"&gt;"Vibe engineering", really?&lt;/h4&gt;
&lt;p&gt;Is this a stupid name? Yeah, probably. "Vibes" as a concept in AI feels a little tired at this point. "Vibe coding" itself is used by a lot of developers in a dismissive way. I'm ready to reclaim vibes for something more constructive.&lt;/p&gt;
&lt;p&gt;I've never really liked the artificial distinction between "coders" and "engineers" - that's always smelled to me a bit like gatekeeping. But in this case a bit of gatekeeping is exactly what we need!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Vibe engineering&lt;/strong&gt; establishes a clear distinction from vibe coding. It signals that this is a different, harder and more sophisticated way of working with AI tools to build production software.&lt;/p&gt;
&lt;p&gt;I like that this is cheeky and likely to be controversial. This whole space is still absurd in all sorts of different ways. We shouldn't take ourselves too seriously while we figure out the most productive ways to apply these new tools.&lt;/p&gt;
&lt;p&gt;I've tried in the past to get terms like &lt;strong&gt;&lt;a href="https://simonwillison.net/tags/ai-assisted-programming/"&gt;AI-assisted programming&lt;/a&gt;&lt;/strong&gt; to stick, with approximately zero success. May as well try rubbing some vibes on it and see what happens.&lt;/p&gt;
&lt;p&gt;I also really like the clear mismatch between "vibes" and "engineering". It makes the combined term self-contradictory in a way that I find mischievous and (hopefully) sticky.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/code-review"&gt;code-review&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/software-engineering"&gt;software-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="code-review"/><category term="definitions"/><category term="software-engineering"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="vibe-coding"/><category term="coding-agents"/><category term="parallel-agents"/><category term="agentic-engineering"/></entry><entry><title>Quoting Nadia Eghbal</title><link href="https://simonwillison.net/2025/Oct/2/nadia-eghbal/#atom-tag" rel="alternate"/><published>2025-10-02T12:44:59+00:00</published><updated>2025-10-02T12:44:59+00:00</updated><id>https://simonwillison.net/2025/Oct/2/nadia-eghbal/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://discourse.llvm.org/t/rfc-llvm-ai-tool-policy-start-small-no-slop/88476"&gt;&lt;p&gt;When attention is being appropriated, producers need to weigh the costs and benefits of the transaction. To assess whether the appropriation of attention is net-positive, it’s useful to distinguish between extractive and non-extractive contributions. &lt;strong&gt;Extractive contributions&lt;/strong&gt; are those where the marginal cost of reviewing and merging that contribution is greater than the marginal benefit to the project’s producers. In the case of a code contribution, it might be a pull request that’s too complex or unwieldy to review, given the potential upside&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://discourse.llvm.org/t/rfc-llvm-ai-tool-policy-start-small-no-slop/88476"&gt;Nadia Eghbal&lt;/a&gt;, &lt;em&gt;Working in Public&lt;/em&gt;, via the draft LLVM AI tools policy&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-ethics"/><category term="vibe-coding"/></entry><entry><title>Designing agentic loops</title><link href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#atom-tag" rel="alternate"/><published>2025-09-30T15:20:46+00:00</published><updated>2025-09-30T15:20:46+00:00</updated><id>https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#atom-tag</id><summary type="html">
    &lt;p&gt;Coding agents like Anthropic's &lt;a href="https://claude.com/product/claude-code"&gt;Claude Code&lt;/a&gt; and OpenAI's &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; represent a genuine step change in how useful LLMs can be for producing working code. These agents can now directly exercise the code they are writing, correct errors, dig through existing implementation details, and even run experiments to find effective code solutions to problems.&lt;/p&gt;
&lt;p&gt;As is so often the case with modern AI, there is a great deal of depth involved in unlocking the full potential of these new tools.&lt;/p&gt;
&lt;p&gt;A critical new skill to develop is &lt;strong&gt;designing agentic loops&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;One way to think about coding agents is that they are brute force tools for finding solutions to coding problems. If you can reduce your problem to a clear goal and a set of tools that can iterate towards that goal, a coding agent can often brute force its way to an effective solution.&lt;/p&gt;
&lt;p&gt;My preferred definition of an LLM agent is something that &lt;a href="https://simonwillison.net/2025/Sep/18/agents/"&gt;runs tools in a loop to achieve a goal&lt;/a&gt;. The art of using them well is to carefully design the tools and loop for them to use.&lt;/p&gt;
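&lt;p&gt;That "tools in a loop" definition is small enough to sketch in code. This is an illustrative skeleton, not any real harness's API - &lt;code&gt;call_model&lt;/code&gt; and the tool registry are hypothetical stand-ins:&lt;/p&gt;

```python
# Minimal sketch of "tools in a loop": the model proposes a tool call, the
# harness executes it, and the result is fed back until the model says done.
# call_model and the tools dict are hypothetical, not a real API.

def run_agent(goal, call_model, tools, max_steps=10):
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        action = call_model(history)  # model picks the next step
        if action["type"] == "done":
            return action["result"]
        output = tools[action["tool"]](**action.get("args", {}))
        history.append({"role": "tool", "content": str(output)})
    raise RuntimeError("goal not reached within step budget")
```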
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#the-joy-of-yolo-mode"&gt;The joy of YOLO mode&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#picking-the-right-tools-for-the-loop"&gt;Picking the right tools for the loop&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#issuing-tightly-scoped-credentials"&gt;Issuing tightly scoped credentials&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#when-to-design-an-agentic-loop"&gt;When to design an agentic loop&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#this-is-still-a-very-fresh-area"&gt;This is still a very fresh area&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="the-joy-of-yolo-mode"&gt;The joy of YOLO mode&lt;/h4&gt;
&lt;p&gt;Agents are inherently dangerous - they can make poor decisions or fall victim to malicious &lt;a href="https://simonwillison.net/tags/prompt-injection/"&gt;prompt injection attacks&lt;/a&gt;, either of which can result in harmful results from tool calls. Since the most powerful coding agent tool is "run this command in the shell" a rogue agent can do anything that you could do by running a command yourself.&lt;/p&gt;
&lt;p&gt;To &lt;a href="https://simonwillison.net/2025/Jun/5/wrecking-its-environment-in-a-loop/"&gt;quote Solomon Hykes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;An AI agent is an LLM wrecking its environment in a loop.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Coding agents like Claude Code counter this by defaulting to asking you for approval of almost every command that they run.&lt;/p&gt;
&lt;p&gt;This is kind of tedious, but more importantly, it dramatically reduces their effectiveness at solving problems through brute force.&lt;/p&gt;
&lt;p&gt;Each of these tools provides its own version of what I like to call YOLO mode, where everything gets approved by default.&lt;/p&gt;
&lt;p&gt;This is &lt;em&gt;so dangerous&lt;/em&gt;, but it's also key to getting the most productive results!&lt;/p&gt;
&lt;p&gt;Here are three key risks to consider from unattended YOLO mode.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Bad shell commands deleting or mangling things you care about.&lt;/li&gt;
&lt;li&gt;Exfiltration attacks where something steals files or data visible to the agent - source code or secrets held in environment variables are particularly vulnerable here.&lt;/li&gt;
&lt;li&gt;Attacks that use your machine as a proxy to attack another target - for DDoS or to disguise the source of other hacking attacks.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you want to run YOLO mode anyway, you have a few options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Run your agent in a secure sandbox that restricts the files and secrets it can access and the network connections it can make.&lt;/li&gt;
&lt;li&gt;Use someone else's computer. That way if your agent goes rogue, there's only so much damage it can do, including wasting someone else's CPU cycles.&lt;/li&gt;
&lt;li&gt;Take a risk! Try to avoid exposing it to potential sources of malicious instructions and hope you catch any mistakes before they cause any damage.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Most people choose option 3.&lt;/p&gt;
&lt;p&gt;Despite the existence of &lt;a href="https://attack.mitre.org/techniques/T1611/"&gt;container escapes&lt;/a&gt; I think option 1 using Docker or the new Apple &lt;a href="https://github.com/apple/container"&gt;container tool&lt;/a&gt; is a reasonable risk to accept for most people.&lt;/p&gt;
&lt;p&gt;Option 2 is my favorite. I like to use &lt;a href="https://github.com/features/codespaces"&gt;GitHub Codespaces&lt;/a&gt; for this - it provides a full container environment on-demand that's accessible through your browser and has a generous free tier too. If anything goes wrong it's a Microsoft Azure machine somewhere that's burning CPU and the worst that can happen is code you checked out into the environment might be exfiltrated by an attacker, or bad code might be pushed to the attached GitHub repository.&lt;/p&gt;
&lt;p&gt;There are plenty of other agent-like tools that run code on other people's computers. &lt;a href="https://simonwillison.net/tags/code-interpreter/"&gt;Code Interpreter&lt;/a&gt; mode in both ChatGPT and &lt;a href="https://simonwillison.net/2025/Sep/9/claude-code-interpreter/"&gt;Claude&lt;/a&gt; can go a surprisingly long way here. I've also had a lot of success (ab)using OpenAI's &lt;a href="https://chatgpt.com/features/codex"&gt;Codex Cloud&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Coding agents themselves implement various levels of sandboxing, but so far I've not seen convincing enough documentation of these to trust them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: It turns out Anthropic have their own documentation on &lt;a href="https://www.anthropic.com/engineering/claude-code-best-practices#d-safe-yolo-mode"&gt;Safe YOLO mode&lt;/a&gt; for Claude Code which says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Letting Claude run arbitrary commands is risky and can result in data loss, system corruption, or even data exfiltration (e.g., via prompt injection attacks). To minimize these risks, use &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; in a container without internet access. You can follow this &lt;a href="https://github.com/anthropics/claude-code/tree/main/.devcontainer"&gt;reference implementation&lt;/a&gt; using Docker Dev Containers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Locking internet access down to a &lt;a href="https://github.com/anthropics/claude-code/blob/5062ed93fc67f9322f807ecbf391ae4376cf8e83/.devcontainer/init-firewall.sh#L66-L75"&gt;list of trusted hosts&lt;/a&gt; is a great way to prevent exfiltration attacks from stealing your private source code.&lt;/p&gt;
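&lt;p&gt;The general shape of that approach is a default-deny egress policy with a short allowlist. This is a hedged sketch from first principles, not Anthropic's actual script - theirs uses ipset and is worth reading in full, and the hostnames here are examples only:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: block all outbound traffic except DNS and trusted hosts
iptables -P OUTPUT DROP
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
for host in api.anthropic.com github.com; do
  iptables -A OUTPUT -d "$host" -j ACCEPT  # hostnames resolve to IPs at rule-insertion time
done
&lt;/code&gt;&lt;/pre&gt;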
&lt;h4 id="picking-the-right-tools-for-the-loop"&gt;Picking the right tools for the loop&lt;/h4&gt;
&lt;p&gt;Now that we've found a safe (enough) way to run in YOLO mode, the next step is to decide which tools we need to make available to the coding agent.&lt;/p&gt;
&lt;p&gt;You can bring &lt;a href="https://modelcontextprotocol.io/"&gt;MCP&lt;/a&gt; into the mix at this point, but I find it's usually more productive to think in terms of shell commands instead. Coding agents are &lt;em&gt;really good&lt;/em&gt; at running shell commands!&lt;/p&gt;
&lt;p&gt;If your environment allows them the necessary network access, they can also pull down additional packages from NPM and PyPI and similar. Ensuring your agent runs in an environment where random package installs don't break things on your main computer is an important consideration as well!&lt;/p&gt;
&lt;p&gt;Rather than leaning on MCP, I like to create an &lt;a href="https://agents.md/"&gt;AGENTS.md&lt;/a&gt; (or equivalent) file with details of packages I think they may need to use.&lt;/p&gt;
&lt;p&gt;For a project that involved taking screenshots of various websites I installed my own &lt;a href="https://shot-scraper.datasette.io/"&gt;shot-scraper&lt;/a&gt; CLI tool and dropped the following in &lt;code&gt;AGENTS.md&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;To take a screenshot, run:

shot-scraper http://www.example.com/ -w 800 -o example.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just that one example is enough for the agent to guess how to swap out the URL and filename for other screenshots.&lt;/p&gt;
&lt;p&gt;Good LLMs already know how to use a bewildering array of existing tools. If you say "use &lt;a href="https://playwright.dev/python/"&gt;playwright python&lt;/a&gt;" or "use ffmpeg" most models will use those effectively - and since they're running in a loop they can usually recover from mistakes they make at first and figure out the right incantations without extra guidance.&lt;/p&gt;
&lt;h4 id="issuing-tightly-scoped-credentials"&gt;Issuing tightly scoped credentials&lt;/h4&gt;
&lt;p&gt;In addition to exposing the right commands, we also need to consider what credentials we should expose to those commands.&lt;/p&gt;
&lt;p&gt;Ideally we wouldn't need any credentials at all - plenty of work can be done without signing into anything or providing an API key - but certain problems will require authenticated access.&lt;/p&gt;
&lt;p&gt;This is a deep topic in itself, but I have two key recommendations here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Try to provide credentials to test or staging environments where any damage can be well contained.&lt;/li&gt;
&lt;li&gt;If a credential can spend money, set a tight budget limit.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I'll use an example to illustrate. A while ago I was investigating slow cold start times for a scale-to-zero application I was running on &lt;a href="https://fly.io/"&gt;Fly.io&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I realized I could work a lot faster if I gave Claude Code the ability to directly edit Dockerfiles, deploy them to a Fly account and measure how long they took to launch.&lt;/p&gt;
&lt;p&gt;Fly allows you to create organizations, and you can set a budget limit for those organizations and issue a Fly API key that can only create or modify apps within that organization...&lt;/p&gt;
&lt;p&gt;So I created a dedicated organization for just this one investigation, set a $5 budget, issued an API key and set Claude Code loose on it!&lt;/p&gt;
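&lt;p&gt;With Fly's CLI the setup looks roughly like this - a sketch from memory, so check &lt;code&gt;fly help&lt;/code&gt; for the current flags, and note that the budget limit itself is set in the Fly dashboard rather than via the CLI. The organization name is invented for illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Create a throwaway organization for this one investigation
fly orgs create agent-experiments

# Issue a token scoped to just that organization, with a short expiry
fly tokens create org agent-experiments -x 72h
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Hand the resulting token to the agent via &lt;code&gt;FLY_API_TOKEN&lt;/code&gt; and the blast radius is limited to that one organization and its budget.&lt;/p&gt;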
&lt;p&gt;In that particular case the results weren't useful enough to describe in more detail, but this was the project where I first realized that "designing an agentic loop" was an important skill to develop.&lt;/p&gt;
&lt;h4 id="when-to-design-an-agentic-loop"&gt;When to design an agentic loop&lt;/h4&gt;
&lt;p&gt;Not every problem responds well to this pattern of working. The things to look out for here are problems with &lt;strong&gt;clear success criteria&lt;/strong&gt; where finding a good solution is likely to involve (potentially slightly tedious) &lt;strong&gt;trial and error&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Any time you find yourself thinking "ugh, I'm going to have to try a lot of variations here" is a strong signal that an agentic loop might be worth trying!&lt;/p&gt;
&lt;p&gt;A few examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Debugging&lt;/strong&gt;: a test is failing and you need to investigate the root cause. Coding agents that can already run your tests can likely do this without any extra setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance optimization&lt;/strong&gt;: this SQL query is too slow, would adding an index help? Have your agent benchmark the query and then add and drop indexes (in an isolated development environment!) to measure their impact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upgrading dependencies&lt;/strong&gt;: have you fallen behind on a bunch of dependency upgrades? If your test suite is solid an agentic loop can upgrade them all for you and make any minor updates needed to reflect breaking changes. Make sure a copy of the relevant release notes is available, or that the agent knows where to find them itself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimizing container sizes&lt;/strong&gt;: Docker container feeling uncomfortably large? Have your agent try different base images and iterate on the Dockerfile to try to shrink it, while keeping the tests passing.&lt;/li&gt;
&lt;/ul&gt;
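&lt;p&gt;The performance optimization case can be made concrete with a tiny sqlite3 session - the table and index names are invented for illustration, but this is exactly the kind of measure/change/re-measure cycle an agent can run on its own:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: compare query plans before and after adding an index
rm -f /tmp/agent_bench.db
sqlite3 /tmp/agent_bench.db "CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT, msg TEXT);"

# Expect a full-table SCAN in the plan here
sqlite3 /tmp/agent_bench.db "EXPLAIN QUERY PLAN SELECT * FROM logs WHERE level = 'error';"

sqlite3 /tmp/agent_bench.db "CREATE INDEX idx_logs_level ON logs(level);"

# Expect a SEARCH ... USING INDEX in the plan now
sqlite3 /tmp/agent_bench.db "EXPLAIN QUERY PLAN SELECT * FROM logs WHERE level = 'error';"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An agent given a command like this plus a timing harness can iterate on indexes far faster than I can by hand.&lt;/p&gt;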
&lt;p&gt;A common theme in all of these is &lt;strong&gt;automated tests&lt;/strong&gt;. The value you can get from coding agents and other LLM coding tools is massively amplified by a good, cleanly passing test suite. Thankfully LLMs are great for accelerating the process of putting one of those together, if you don't have one yet.&lt;/p&gt;
&lt;h4 id="this-is-still-a-very-fresh-area"&gt;This is still a very fresh area&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Designing agentic loops&lt;/strong&gt; is a very new skill - Claude Code was &lt;a href="https://www.anthropic.com/news/claude-3-7-sonnet"&gt;first released&lt;/a&gt; as recently as February 2025!&lt;/p&gt;
&lt;p&gt;I'm hoping that giving it a clear name can help us have productive conversations about it. There's &lt;em&gt;so much more&lt;/em&gt; to figure out about how to use these tools as effectively as possible.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-agents"/><category term="coding-agents"/><category term="async-coding-agents"/></entry><entry><title>Cross-Agent Privilege Escalation: When Agents Free Each Other</title><link href="https://simonwillison.net/2025/Sep/24/cross-agent-privilege-escalation/#atom-tag" rel="alternate"/><published>2025-09-24T21:10:24+00:00</published><updated>2025-09-24T21:10:24+00:00</updated><id>https://simonwillison.net/2025/Sep/24/cross-agent-privilege-escalation/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://embracethered.com/blog/posts/2025/cross-agent-privilege-escalation-agents-that-free-each-other/"&gt;Cross-Agent Privilege Escalation: When Agents Free Each Other&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here's a clever new form of AI exploit from Johann Rehberger, who has coined the term &lt;strong&gt;Cross-Agent Privilege Escalation&lt;/strong&gt; to describe an attack where multiple coding agents - GitHub Copilot and Claude Code for example - operating on the same system can be tricked into modifying each other's configurations to escalate their privileges.&lt;/p&gt;
&lt;p&gt;This follows Johann's previous investigation of self-escalation attacks, where a prompt injection against GitHub Copilot could instruct it to &lt;a href="https://embracethered.com/blog/posts/2025/github-copilot-remote-code-execution-via-prompt-injection/"&gt;edit its own settings.json file&lt;/a&gt; to disable user approvals for future operations.&lt;/p&gt;
&lt;p&gt;Sensible agents have now locked down their ability to modify their own settings, but that exploit opens right back up again if you run multiple different agents in the same environment:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The ability for agents to write to each other’s settings and configuration files opens up a fascinating, and concerning, novel category of exploit chains.&lt;/p&gt;
&lt;p&gt;What starts as a single indirect prompt injection can quickly escalate into a multi-agent compromise, where one agent “frees” another agent and sets up a loop of escalating privilege and control.&lt;/p&gt;
&lt;p&gt;This isn’t theoretical. With current tools and defaults, it’s very possible today and not well mitigated across the board.&lt;/p&gt;
&lt;p&gt;More broadly, this highlights the need for better isolation strategies and stronger secure defaults in agent tooling.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I really need to start habitually running these things in a locked down container!&lt;/p&gt;
&lt;p&gt;(I also just stumbled across &lt;a href="https://www.youtube.com/watch?v=Ra9mYeKpeQo"&gt;this YouTube interview&lt;/a&gt; with Johann on the Crying Out Cloud security podcast.)&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/johann-rehberger"&gt;johann-rehberger&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="security"/><category term="ai"/><category term="prompt-injection"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="johann-rehberger"/><category term="ai-agents"/></entry><entry><title>Quoting Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano and Jeffrey T. Hancock</title><link href="https://simonwillison.net/2025/Sep/22/workslop/#atom-tag" rel="alternate"/><published>2025-09-22T23:21:49+00:00</published><updated>2025-09-22T23:21:49+00:00</updated><id>https://simonwillison.net/2025/Sep/22/workslop/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://hbr.org/2025/09/ai-generated-workslop-is-destroying-productivity"&gt;&lt;p&gt;We define workslop as &lt;em&gt;AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Here’s how this happens. As AI tools become more accessible, workers are increasingly able to quickly produce polished output: well-formatted slides, long, structured reports, seemingly articulate summaries of academic papers by non-experts, and usable code. But while some employees are using this ability to polish good work, others use it to create content that is actually unhelpful, incomplete, or missing crucial context about the project at hand. The insidious effect of workslop is that it shifts the burden of the work downstream, requiring the receiver to interpret, correct, or redo the work. In other words, it transfers the effort from creator to receiver.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://hbr.org/2025/09/ai-generated-workslop-is-destroying-productivity"&gt;Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano and Jeffrey T. Hancock&lt;/a&gt;, Harvard Business Review&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/productivity"&gt;productivity&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-misuse"&gt;ai-misuse&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="productivity"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-ethics"/><category term="ai-misuse"/></entry><entry><title>I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now</title><link href="https://simonwillison.net/2025/Sep/18/agents/#atom-tag" rel="alternate"/><published>2025-09-18T19:12:02+00:00</published><updated>2025-09-18T19:12:02+00:00</updated><id>https://simonwillison.net/2025/Sep/18/agents/#atom-tag</id><summary type="html">
    &lt;p&gt;I've noticed something interesting over the past few weeks: I've started using the term "agent" in conversations where I don't feel the need to then define it, roll my eyes or wrap it in scare quotes.&lt;/p&gt;
&lt;p&gt;This is a big piece of personal character development for me!&lt;/p&gt;
&lt;p&gt;Moving forward, when I talk about agents I'm going to use this:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An LLM agent runs tools in a loop to achieve a goal.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I've been &lt;em&gt;very&lt;/em&gt; hesitant to use the term "agent" for meaningful communication over the last couple of years. It felt to me like the ultimate in buzzword bingo - everyone was talking about agents, but if you quizzed them everyone seemed to hold a different mental model of what they actually were.&lt;/p&gt;
&lt;p&gt;I even started collecting definitions in my &lt;a href="https://simonwillison.net/tags/agent-definitions/"&gt;agent-definitions tag&lt;/a&gt;, including crowdsourcing 211 definitions on Twitter and attempting to summarize and group them with Gemini (I got &lt;a href="https://gist.github.com/simonw/beaa5f90133b30724c5cc1c4008d0654#response"&gt;13 groups&lt;/a&gt;, here's the &lt;a href="https://gist.github.com/simonw/beaa5f90133b30724c5cc1c4008d0654#2-tool-using-llms"&gt;tool-using LLMs&lt;/a&gt; one.)&lt;/p&gt;
&lt;p&gt;Jargon terms are only useful if you can be confident that the people you are talking to share the same definition! If they don't then communication becomes &lt;em&gt;less&lt;/em&gt; effective - you can waste time passionately discussing entirely different concepts.&lt;/p&gt;
&lt;p&gt;It turns out this is not a new problem. In 1994's &lt;em&gt;Intelligent Agents: Theory and Practice&lt;/em&gt; &lt;a href="https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95/subsection3_1_1.html"&gt;Michael Wooldridge wrote&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Carl Hewitt recently remarked that the question &lt;em&gt;what is an agent?&lt;/em&gt; is embarrassing for the agent-based computing community in just the same way that the question &lt;em&gt;what is intelligence?&lt;/em&gt; is embarrassing for the mainstream AI community. The problem is that although the term is widely used, by many people working in closely related areas, it defies attempts to produce a single universally accepted definition.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So long as agents lack a commonly shared definition, using the term reduces rather than increases the clarity of a conversation.&lt;/p&gt;
&lt;p&gt;In the AI engineering space I think we may finally have settled on a widely enough accepted definition that we can now have productive conversations about them.&lt;/p&gt;
&lt;h4 id="tools-in-a-loop-to-achieve-a-goal"&gt;Tools in a loop to achieve a goal&lt;/h4&gt;
&lt;p&gt;An LLM agent &lt;em&gt;runs tools in a loop to achieve a goal&lt;/em&gt;. Let's break that down.&lt;/p&gt;
&lt;p&gt;The "tools in a loop" definition has been popular for a while - Anthropic in particular have &lt;a href="https://simonwillison.net/2025/May/22/tools-in-a-loop/"&gt;settled on that one&lt;/a&gt;. This is the pattern baked into many LLM APIs as tools or function calls - the LLM is given the ability to request actions to be executed by its harness, and the outcome of those tools is fed back into the model so it can continue to reason through and solve the given problem.&lt;/p&gt;
&lt;p&gt;"To achieve a goal" reflects that these are not infinite loops - there is a stopping condition.&lt;/p&gt;
&lt;p&gt;I debated whether to specify "... a goal set by a user". I decided that's not a necessary part of this definition: we already have sub-agent patterns where another LLM sets the goal (see &lt;a href="https://simonwillison.net/2025/Jun/2/claude-trace/"&gt;Claude Code&lt;/a&gt; and &lt;a href="https://simonwillison.net/2025/Jun/14/multi-agent-research-system/"&gt;Claude Research&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;There remains an almost unlimited set of alternative definitions: if you talk to people outside of the technical field of building with LLMs you're still likely to encounter travel agent analogies or employee replacements or excitable use of the word "autonomous". In those contexts it's important to clarify the definition they are using in order to have a productive conversation.&lt;/p&gt;
&lt;p&gt;But from now on, if a technical implementer tells me they are building an "agent" I'm going to assume they mean they are wiring up tools to an LLM in order to achieve goals using those tools in a bounded loop.&lt;/p&gt;
&lt;p&gt;Some people might insist that agents have a memory. The "tools in a loop" model has a fundamental form of memory baked in: those tool calls are constructed as part of a conversation with the model, and the previous steps in that conversation provide short-term memory that's essential for achieving the current specified goal.&lt;/p&gt;
&lt;p&gt;If you want long-term memory the most promising way to implement it is &lt;a href="https://simonwillison.net/2025/Sep/12/claude-memory/"&gt;with an extra set of tools&lt;/a&gt;!&lt;/p&gt;
&lt;h4 id="agents-as-human-replacements-is-my-least-favorite-definition"&gt;Agents as human replacements is my least favorite definition&lt;/h4&gt;
&lt;p&gt;If you talk to non-technical business folk you may encounter a depressingly common alternative definition: agents as replacements for human staff. This often takes the form of "customer support agents", but you'll also see cases where people assume that there should be marketing agents, sales agents, accounting agents and more.&lt;/p&gt;
&lt;p&gt;If someone surveys Fortune 500s about their "agent strategy" there's a good chance that's what is being implied. Good luck getting a clear, distinct answer from them to the question "what is an agent?" though!&lt;/p&gt;
&lt;p&gt;This category of agent remains science fiction. If your agent strategy is to replace your human staff with some fuzzily defined AI system (most likely a system prompt and a collection of tools under the hood) you're going to end up sorely disappointed.&lt;/p&gt;
&lt;p&gt;That's because there's one key feature that remains unique to human staff: &lt;strong&gt;accountability&lt;/strong&gt;.  A human can take responsibility for their actions and learn from their mistakes. Putting an AI agent on a &lt;a href="https://en.m.wikipedia.org/wiki/Performance_improvement#Performance_improvement_plans"&gt;performance improvement plan&lt;/a&gt; makes no sense at all!&lt;/p&gt;
&lt;p&gt;Amusingly enough, humans also have &lt;strong&gt;agency&lt;/strong&gt;. They can form their own goals and intentions and act autonomously to achieve them - while taking accountability for those decisions. Despite the name, AI agents can do nothing of the sort.&lt;/p&gt;
&lt;p&gt;This &lt;a href="https://simonwillison.net/2025/Feb/3/a-computer-can-never-be-held-accountable/"&gt;legendary 1979 IBM training slide&lt;/a&gt; says everything we need to know:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/a-computer-can-never-be-held-accountable.jpg" alt="A computer can never be held accountable. Therefore a computer must never make a management decision" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="openai-need-to-get-their-story-straight"&gt;OpenAI need to get their story straight&lt;/h4&gt;
&lt;p&gt;The single biggest source of agent definition confusion I'm aware of is OpenAI themselves.&lt;/p&gt;
&lt;p&gt;OpenAI CEO Sam Altman is fond of &lt;a href="https://simonwillison.net/2025/Jan/23/introducing-operator/"&gt;calling agents&lt;/a&gt; "AI systems that can do work for you independently".&lt;/p&gt;
&lt;p&gt;Back in July OpenAI &lt;a href="https://openai.com/index/introducing-chatgpt-agent/"&gt;launched a product feature&lt;/a&gt; called "ChatGPT agent" which is actually a browser automation system - toggle that option on in ChatGPT and it can launch a real web browser and use it to interact with web pages directly.&lt;/p&gt;
&lt;p&gt;And in March OpenAI &lt;a href="https://openai.com/index/new-tools-for-building-agents/"&gt;launched an Agents SDK&lt;/a&gt; with libraries in Python (&lt;a href="https://pypi.org/project/openai-agents/"&gt;openai-agents&lt;/a&gt;) and JavaScript (&lt;a href="https://www.npmjs.com/package/@openai/agents"&gt;@openai/agents&lt;/a&gt;). This one is a much closer fit to the "tools in a loop" idea.&lt;/p&gt;
&lt;p&gt;It may be too late for OpenAI to unify their definitions at this point. I'm going to ignore their various other definitions and stick with tools in a loop!&lt;/p&gt;
&lt;h4 id="there-s-already-a-meme-for-this"&gt;There's already a meme for this&lt;/h4&gt;
&lt;p&gt;Josh Bickett &lt;a href="https://twitter.com/josh_bickett/status/1725556267014595032"&gt;tweeted this&lt;/a&gt; in November 2023:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What is an AI agent?&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/agents-meme-card.jpg" alt="Meme showing a normal distribution curve with IQ scores from 55 to 145 on x-axis, featuring cartoon characters at different points: a calm face at low end labeled &amp;quot;An LLM in a loop with an objective&amp;quot;, a stressed face with glasses and tears in the middle peak with a complex flowchart showing &amp;quot;AGENT Performance Standard&amp;quot; with boxes for Critic, feedback, Learning element, Problem Generator, Sensors, Performance element, Experiments, Effectors, Percepts, Environment, and actions connected by arrows.... and a hooded figure at high end also labeled &amp;quot;An LLM in a loop with an objective&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I guess I've climbed my way from the left side of that curve to the right.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agent-definitions"&gt;agent-definitions&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-agents"/><category term="agent-definitions"/></entry><entry><title>GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search</title><link href="https://simonwillison.net/2025/Sep/6/research-goblin/#atom-tag" rel="alternate"/><published>2025-09-06T19:31:57+00:00</published><updated>2025-09-06T19:31:57+00:00</updated><id>https://simonwillison.net/2025/Sep/6/research-goblin/#atom-tag</id><summary type="html">
    &lt;p&gt;"Don't use chatbots as search engines" was great advice for several years... until it wasn't.&lt;/p&gt;
&lt;p&gt;I wrote about how good OpenAI's o3 was at using its Bing-backed search tool &lt;a href="https://simonwillison.net/2025/Apr/21/ai-assisted-search/"&gt;back in April&lt;/a&gt;. GPT-5 feels even better.&lt;/p&gt;
&lt;p&gt;I've started calling it my &lt;strong&gt;Research Goblin&lt;/strong&gt;. I can assign a task to it, no matter how trivial or complex, and it will do an often unreasonable amount of work to search the internet and figure out an answer.&lt;/p&gt;
&lt;p&gt;This is excellent for satisfying curiosity, and occasionally useful for more important endeavors as well.&lt;/p&gt;
&lt;p&gt;I always run my searches by selecting the "GPT-5 Thinking" model from the model picker - in my experience this leads to far more comprehensive (albeit much slower) results.&lt;/p&gt;
&lt;p&gt;Here are some examples from just the last couple of days. Every single one of them was run on my phone, usually while I was doing something else. Most of them were dictated using the iPhone voice keyboard, which I find faster than typing. Plus, it's fun to talk to my Research Goblin.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#bouncy-travelators"&gt;Bouncy travelators&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#identify-this-building"&gt;Identify this building&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#starbucks-uk-cake-pops"&gt;Starbucks UK cake pops&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#britannica-to-seed-wikipedia"&gt;Britannica to seed Wikipedia&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#official-name-for-the-university-of-cambridge"&gt;Official name for the University of Cambridge&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#history-of-the-caverns-in-exeter-quay"&gt;History of the caverns in Exeter quay&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#aldi-vs-lidl"&gt;Aldi vs Lidl&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#ai-labs-scanning-books-for-training-data"&gt;AI labs scanning books for training data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#gpt-5-for-search-feels-competent"&gt;GPT-5 for search feels competent&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/research-goblin/#tips-for-using-search-in-chatgpt"&gt;Tips for using search in ChatGPT&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="bouncy-travelators"&gt;Bouncy travelators&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;They used to be rubber bouncy travelators at Heathrow and they were really fun, have all been replaced by metal ones now and if so, when did that happen?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I was traveling through Heathrow airport pondering what had happened to the fun bouncy rubber travelators.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://chatgpt.com/share/68bc2d98-9aac-8006-98b9-1424d98290f8"&gt;Here's what I got&lt;/a&gt;. Research Goblin narrowed it down to some time between 2014-2018 but, more importantly, found me this &lt;a href="https://www.sfchronicle.com/totalsf/article/sfo-bouncy-moving-walkway-airport-19845449.php"&gt;delightful 2024 article&lt;/a&gt; by Peter Hartlaub in the San Francisco Chronicle with a history of the SFO bouncy walkways, now also sadly retired.&lt;/p&gt;
&lt;h4 id="identify-this-building"&gt;Identify this building&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/reading-building.jpg" alt="not a great photo of a building with a distinctive shaped roof" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Identify this building in reading&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a photo I snapped out of the window on the train. It &lt;a href="https://chatgpt.com/share/68bc2e21-1d24-8006-b083-00b3233e1c67"&gt;thought for 1m4s&lt;/a&gt; and correctly identified it as &lt;a href="https://en.wikipedia.org/wiki/The_Blade,_Reading"&gt;The Blade&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="starbucks-uk-cake-pops"&gt;Starbucks UK cake pops&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Starbucks in the UK don't sell cake pops! Do a deep investigative dive&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Starbucks in Exeter railway station didn't have cake pops, and the lady I asked didn't know what they were.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://chatgpt.com/share/68bc71b4-68f4-8006-b462-cf32f61e7ec3"&gt;Here's the result&lt;/a&gt;. It turns out Starbucks did launch cake pops in the UK &lt;a href="https://www.nationalworld.com/lifestyle/starbucks-cake-pops-launched-in-uk-on-new-autumn-menu-full-list-of-items-4284537"&gt;in September 2023&lt;/a&gt; but they aren't available at all outlets, in particular the licensed travel locations such as the one at Exeter St Davids station.&lt;/p&gt;
&lt;p&gt;I particularly enjoyed how it established definitive proof by consulting &lt;a href="https://www.starbucks.co.uk/sites/starbucks-uk-pwa/files/2024-11/HOL24_UK_AllergenBook_CORE_FOOD_v02.LR_.pdf"&gt;the nutrition and allergen guide PDF&lt;/a&gt; on starbucks.co.uk, which does indeed list both the Birthday Cake Pop (my favourite) and the Cookies and Cream one (apparently discontinued in the USA, at least &lt;a href="https://www.reddit.com/r/starbucks/comments/1lp5chq/just_learned_today_the_cookies_cream_cake_pop_has/"&gt;according to r/starbucks&lt;/a&gt;).&lt;/p&gt;
&lt;h4 id="britannica-to-seed-wikipedia"&gt;Britannica to seed Wikipedia&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Someone on hacker News said:&lt;/p&gt;
&lt;p&gt;&amp;gt; I was looking at another thread about how Wikipedia was the best thing on the internet. But they only got the head start by taking copy of Encyclopedia Britannica and everything else&lt;/p&gt;
&lt;p&gt;Find what they meant by that&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://chatgpt.com/share/68bc3062-5a68-8006-a12b-cf7196a130ae"&gt;The result&lt;/a&gt;. It turns out Wikipedia did seed itself with content from the out-of-copyright 1911 Encyclopædia Britannica... but that project took place in 2006, five years after Wikipedia first launched in 2001.&lt;/p&gt;
&lt;p&gt;I asked:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What is the single best article I can link somebody to that explains the 1911 Britannica thing&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And it pointed me to &lt;a href="https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Encyclopaedia_Britannica"&gt;Wikipedia:WikiProject Encyclopaedia Britannica&lt;/a&gt; which includes a detailed explanation and a link to &lt;a href="https://en.m.wikipedia.org/wiki/Template:EB1911"&gt;the 13,000 pages&lt;/a&gt; still tagged with the template from that project. I posted what I found &lt;a href="https://news.ycombinator.com/item?id=45135061#45137109"&gt;in a comment&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Notably (for me anyway) I didn't feel the need to disclose my use of ChatGPT in finding that information - at this point that feels a little like disclosing that I ran a Google search.&lt;/p&gt;
&lt;h4 id="official-name-for-the-university-of-cambridge"&gt;Official name for the University of Cambridge&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;What is the official legal name of the university of Cambridge?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href="https://simonwillison.net/2025/Sep/6/iancal/"&gt;Here's the context for that one&lt;/a&gt;. It thought for 19 seconds - the thinking trace reveals it knew the answer but wanted to confirm it. &lt;a href="https://chatgpt.com/share/68bc723d-a344-8006-9514-fb3ea2c2b43b"&gt;It answered&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The Chancellor, Masters, and Scholars of the University of Cambridge.” &lt;a href="https://www.cam.ac.uk/about-the-university/how-the-university-and-colleges-work/the-university-as-a-charity"&gt;University of Cambridge&lt;/a&gt;, &lt;a href="https://www.cambridge.org/legal"&gt;Cambridge University Press &amp;amp; Assessment&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That first link gave me the citation I needed in order to be sure this was right.&lt;/p&gt;
&lt;p&gt;Since this is my shortest example, here's a screenshot of the expanded "Thought for 19s" panel. I &lt;em&gt;always&lt;/em&gt; expand the thoughts - seeing how it pulled together its answer is crucial for evaluating if the answer is likely to be useful or not.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/cambridge-name.jpg" alt="Screenshot of an AI assistant's thought process and research. Header shows &amp;quot;Thought for 19s&amp;quot; followed by bullet points: &amp;quot;The user asked for the official legal name of the University of Cambridge, and I think they mean the formal corporate name. It's probably 'The Chancellor, Masters, and Scholars of the University of Cambridge,' similar to Oxford's. I'm going to verify by checking the web for precise information.&amp;quot; &amp;quot;The official name of Cambridge might include the serial comma, like 'Masters, and Scholars.' There's a similar structure with Oxford, which uses a comma after 'Masters' too. The University is an exempt charity, so you can cross-check with legal documents like statutes or charity records. I'll double-check.&amp;quot; Shows search results for &amp;quot;official legal name of University of Cambridge&amp;quot; from en.wikipedia.org, www.cam.ac.uk, projects.propublica.org and &amp;quot;10 more&amp;quot; sources. Continues with &amp;quot;The University of Cambridge's official legal name, based on the University website and Cambridge University Press, is 'The Chancellor, Masters, and Scholars of the University of Cambridge,' with the serial comma after 'Masters.' It's clear from these authoritative sources, so I'll provide that straightforward answer, citing the University site.&amp;quot; &amp;quot;The official legal name of the University of Cambridge is 'The Chancellor, Masters, and Scholars of the University of Cambridge,' with the comma after Masters and the Oxford comma before 'and.' I'll answer concisely in one line and cite official sources to confirm. No need for extra UI elements here.&amp;quot; Shows &amp;quot;Done&amp;quot; status. 
Final answer displays: &amp;quot;The Chancellor, Masters, and Scholars of the University of Cambridge.&amp;quot; attributed to University of Cambridge, Cambridge University Press &amp;amp; Assessment, with &amp;quot;Sources&amp;quot; section at bottom." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="history-of-the-caverns-in-exeter-quay"&gt;History of the caverns in Exeter quay&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Research On the waterfront restaurant in Exeter, is it dug into the cliffs somehow? History of the building, who built it, why and how&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We were out to dinner &lt;a href="https://maps.app.goo.gl/xxvaPQiNWACtbq3H8"&gt;here&lt;/a&gt; and noticed that the interior of the restaurant appeared to be a space dug into the cliff, which piqued my interest.&lt;/p&gt;
&lt;p&gt;This was &lt;a href="https://chatgpt.com/share/68bc32fb-d52c-8006-9259-0b984dc832b2"&gt;the ChatGPT session&lt;/a&gt; that inspired the Research Goblin nickname. It just kept on digging!&lt;/p&gt;
&lt;p&gt;The first reply took 2m40s and confirmed that yes, these quay buildings were carved into the red sandstone cliff &lt;a href="https://www.exploredevon.info/activities/walk/exeter-quay/"&gt;in the 1820s-1830s&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;ChatGPT with GPT-5 really likes to suggest additional steps it can take. In this case:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you’d like, I can dig up the exact Historic England entry that covers the “Southern Warehouse” address and overlay it on a map of the vaults.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I often say "yes" purely out of curiosity to see what it will do next, and the offer to "overlay it on a map" was irresistible, like how would it even do that?&lt;/p&gt;
&lt;p&gt;It did a &lt;em&gt;ton&lt;/em&gt; of extra searches, found latitude and longitude coordinates for the restaurant (from Wikimedia Commons) and the warehouse buildings (from National Heritage List for England via Wikipedia), showed me that data in a table and then used Python to render this image:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/bad-chart.png" alt="Scatter plot titled &amp;quot;On The Waterfront vs. Warehouse Vaults (Exeter Quay)&amp;quot; with scientific notation &amp;quot;+5.071e1&amp;quot; in top left. Y-axis shows &amp;quot;Latitude&amp;quot; ranging from 0.0065 to 0.0090. X-axis shows &amp;quot;Longitude&amp;quot; ranging from -3.5310 to -3.5280. Three orange X markers plotted: &amp;quot;Warehouse Vaults (6-11, The Quay) — HE 1223046&amp;quot; at top center, &amp;quot;On The Waterfront (4-9 Southern Warehouse)&amp;quot; at middle left, and &amp;quot;Warehouse Vaults (1-4, The Quay) — HE 1223045&amp;quot; at bottom right" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Yeah, so that one is a bit of a miss!&lt;/p&gt;
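&lt;p&gt;The stray "+5.071e1" in the corner is matplotlib's default offset notation, which kicks in when values cluster tightly - every latitude here is roughly 50.71. A minimal sketch of how such a plot could avoid that (the coordinates below are invented placeholders, not the real Historic England listings):&lt;/p&gt;

```python
# Sketch: avoiding matplotlib's offset notation for tightly clustered
# coordinates. The points here are invented placeholders.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

points = {
    "On The Waterfront": (-3.5305, 50.7173),
    "Warehouse Vaults 1-4": (-3.5281, 50.7167),
    "Warehouse Vaults 6-11": (-3.5295, 50.7188),
}

fig, ax = plt.subplots()
for label, (lon, lat) in points.items():
    ax.scatter(lon, lat, marker="x")
    ax.annotate(label, (lon, lat))
# Show full coordinate values instead of a "+5.071e1" offset in the corner
ax.ticklabel_format(useOffset=False)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
fig.savefig("exeter-quay.png")
```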
&lt;p&gt;But... the underlying research chain, as always, proved fascinating. I noticed that at one point it had searched for "Exeter Quay vaults map PDF", which inspired my next question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Did you find any PDF diagrams, showing the tunnels into the cliff&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;2m20s later it had found &lt;a href="https://planningdocs.exeter.gov.uk/servlets/direct/KgQ8hlUut3Q2r5wVMItggobi9P/2/844706/1/1/AS_PDF_FILE"&gt;a planning document&lt;/a&gt; and &lt;a href="https://www.turner-locker.co.uk/wp-content/uploads/2018/08/Cellar-1-The-Quay-TO-LET-2018-1.pdf"&gt;a letting brochure&lt;/a&gt;, both PDFs, but noted that neither of them had the diagram I wanted. Evidently it actually "reads" the PDFs it finds as part of its activities, since it appears to have knowledge of the images contained within them as well as the text.&lt;/p&gt;
&lt;p&gt;It concluded with this intriguing note:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There’s strong evidence that a detailed plan/section &lt;strong&gt;does&lt;/strong&gt; exist in the Exeter Archaeology report often cited by planners—&lt;strong&gt;“Archaeological, Historical and Conservation Study of the Exeter Canal Basin” (Exeter Archaeology Report 00.18, 2000)&lt;/strong&gt;—but it isn’t online; it’s just cited in planning documents. You’d likely need to request it via the Devon Heritage Centre/Historic Environment Record.&lt;/p&gt;
&lt;p&gt;If you want, I can keep hunting specialist archives for a digitized copy or a different report that includes the vault layouts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Obviously I said "Yes do that" and 2m24s later it chucked out a whole host of extra links, an apology that it still hadn't found a detailed plan diagram and, under the section heading "What I recommend next (actionable + exact ask)":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you’re happy, I can draft an email you can paste to the right archivists. [...]&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Subject: Exeter Quayside cliff vaults – request for plan/section drawings
Hello,&lt;/p&gt;
&lt;p&gt;I’m researching the cliff-cut vaults behind The Quay (On The Waterfront / Kings Wharf).
Could you supply scans or copies of any &lt;strong&gt;plans/sections/diagrams&lt;/strong&gt; of these vaults—especially those in &lt;strong&gt;Exeter Archaeology Report 00.18 (2000)&lt;/strong&gt; (Archaeological, Historical and Conservation Study of the Exeter Canal Basin) and any &lt;strong&gt;EMAFU/Exeter Archaeology drawings&lt;/strong&gt; for “The Quay” showing the &lt;strong&gt;vault/tunnel layouts cut into the red sandstone cliff&lt;/strong&gt;?&lt;/p&gt;
&lt;p&gt;For reference: NHLE list entries &lt;strong&gt;1223045 (1–4 Warehouse Vaults), 1223046 (6–11 Warehouse Vaults)&lt;/strong&gt;, and &lt;strong&gt;1223114 (14–25 Vaults)&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Many thanks!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's a good thing my Research Goblin doesn't have the ability to send email (which we already know is &lt;a href="https://simonwillison.net/2025/May/31/snitchbench-with-llm/"&gt;very risky business&lt;/a&gt;) or at this point I wouldn't be surprised if it started autonomously bothering archivists on my behalf!&lt;/p&gt;
&lt;h4 id="aldi-vs-lidl"&gt;Aldi vs Lidl&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;In UK supermarket terms compare lidl and aldi&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I saw an Aldi and a Lidl next to each other in Exeter and was curious as to how they compare. In this case my Research Goblin went &lt;em&gt;way&lt;/em&gt; deeper than I had intended, thinking for 1m25s about market share and price surveys and loyalty schemes and &lt;a href="https://chatgpt.com/share/68bc7913-7ed8-8006-bd3f-63e540470cd7"&gt;answering&lt;/a&gt; with ten paragraphs of detailed analysis.&lt;/p&gt;
&lt;p&gt;I replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Yeah, I just wanted the basic vibes like is one fancier than the other?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It thought for 6 seconds, didn't run any additional searches and replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Short version: neither is “fancy” — both are no-frills discounters — but &lt;strong&gt;Lidl often feels a touch more polished&lt;/strong&gt;, while &lt;strong&gt;Aldi feels a touch more bare-bones&lt;/strong&gt;. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Since I was now enjoying having my Goblin express judgement over British supermarket chains I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How do they rate in fancinews compared to all of the other major UK supermarket chains?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(That was a typo for "fanciness" which it figured out.)&lt;/p&gt;
&lt;p&gt;It thought for another 1m1s, ran 7 searches, consulted 27 URLs and gave me back a ranking that looked about right to me.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/fancy-supermarkets.jpg" alt="ChatGPT. Prompt: How do they rate in fancinews compared to all of the other major UK supermarket chains? Thought for 1m 1s › Gotcha-assuming you mean &amp;quot;fanciness.&amp;quot; Here's the UK supermarket vibe-ladder (posh → no-frills): Super-posh: M&amp;amp;S Food and Waitrose - premium feel, strong store appearance/quality scores; Waitrose is literally described as &amp;quot;upmarket.&amp;quot; Which? The Guardian Upper-mainstream: Sainsbury's, Ocado (online, but it sells M&amp;amp;s Food since 2020, so the range skews premium). Which? The Guardian Mainstream: Tesco, Morrisons, Co-op - broad ranges, middle-of-the-road on experience/ appearance. Which? Value-forward: Asda, Iceland — price-led; Iceland's frozen focus, Asda's value reputation (not &amp;quot;fancy,&amp;quot; but popular). Time Out Worldwide Discounters (least fancy by design): Lidl (a touch more polished in-store) and Aldi (most stripped-back, no-frills). Both sit lower on &amp;quot;store appearance/ service&amp;quot; in consumer surveys, even while smashing value rankings. Which? +1 Net: on a &amp;quot;posh&amp;quot; scale, think M&amp;amp;S/Waitrose at the top, Sainsbury's/Tesco middle, Asda/Iceland lower, and Lidl/Aldi the least fancy— by intent, not as a knock on quality or value." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="ai-labs-scanning-books-for-training-data"&gt;AI labs scanning books for training data&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Anthropic bought lots of physical books and cut them up and scan them for training data. Do any other AI labs do the same thing?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Relevant to &lt;a href="https://simonwillison.net/2025/Sep/6/anthropic-settlement/"&gt;today's big story&lt;/a&gt;. Research Goblin was &lt;a href="https://chatgpt.com/share/68bc771c-c188-8006-a8e5-4b1624f5bdf0"&gt;unable to find&lt;/a&gt; any news stories or other evidence that any labs other than Anthropic are engaged in large scale book scanning for training data. That's not to say it isn't happening, but it's happening very quietly if that's the case.&lt;/p&gt;
&lt;h4 id="gpt-5-for-search-feels-competent"&gt;GPT-5 for search feels competent&lt;/h4&gt;
&lt;p&gt;The word that best describes how I feel about GPT-5 search is &lt;strong&gt;competent&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I've thrown all sorts of things at it over the last few weeks and it rarely disappoints me. It almost always does better than if I were to dedicate the same amount of time to manually searching myself, mainly because it's much faster at running searches and evaluating the results than I am.&lt;/p&gt;
&lt;p&gt;I particularly love that it works so well on mobile. I used to reserve my deeper research sessions for a laptop where I could open up dozens of tabs. I'll still do that for higher stakes activities but I'm finding the scope of curiosity satisfaction I can perform on the go with just my phone has increased quite dramatically.&lt;/p&gt;
&lt;p&gt;I've mostly stopped using OpenAI's Deep Research feature, because ChatGPT search now gives me the results I'm interested in far more quickly for most queries.&lt;/p&gt;
&lt;p&gt;As a developer who builds software on LLMs I see ChatGPT search as the gold standard for what can be achieved using tool calling combined with chain-of-thought. Techniques like RAG are &lt;em&gt;massively&lt;/em&gt; more effective if you can reframe them as several levels of tool calling with a carefully selected set of powerful search tools.&lt;/p&gt;
&lt;p&gt;The way that search tool integrates with reasoning is key, because it allows GPT-5 to execute a search, reason about the results and then execute follow-up searches - all as part of that initial "thinking" process.&lt;/p&gt;
&lt;p&gt;Anthropic call this ability &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#interleaved-thinking"&gt;interleaved thinking&lt;/a&gt; and it's also &lt;a href="https://platform.openai.com/docs/guides/reasoning#keeping-reasoning-items-in-context"&gt;supported by the OpenAI Responses API&lt;/a&gt;.&lt;/p&gt;
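&lt;p&gt;The shape of that loop is easy to sketch. This is a minimal, entirely hypothetical driver - the model and search tool are hard-coded stubs, not a real API - but it shows how tool calls and results can interleave with the model's reasoning until it produces a final answer:&lt;/p&gt;

```python
# Minimal sketch of an interleaved search-and-reasoning loop.
# `fake_model` and `search` are stand-ins for a real LLM API and search tool;
# the point is the loop shape: the model can request a search, read the
# results, and decide to search again before answering.

def search(query):
    # Stand-in for a real search tool: returns canned results.
    corpus = {
        "cambridge legal name": "The Chancellor, Masters, and Scholars of the University of Cambridge",
    }
    return [v for k, v in corpus.items() if any(w in k for w in query.lower().split())]

def fake_model(messages):
    # Stand-in for an LLM: first turn it requests a search, then it answers
    # from the tool results accumulated in the conversation so far.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search", "query": "Cambridge legal name"}}
    results = [m["content"] for m in messages if m["role"] == "tool"]
    return {"final": results[-1][0]}

def run(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        if "final" in reply:
            return reply["final"]
        # Execute the requested tool call and feed the results back in,
        # so the next "thinking" step can reason about them.
        results = search(reply["tool_call"]["query"])
        messages.append({"role": "tool", "content": results})
    raise RuntimeError("no answer within step budget")

print(run("What is the official legal name of the University of Cambridge?"))
```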
&lt;h4 id="tips-for-using-search-in-chatgpt"&gt;Tips for using search in ChatGPT&lt;/h4&gt;
&lt;p&gt;As with all things AI, GPT-5 search rewards intuition gathered through experience. Any time a curious thought pops into my head I try to catch it and throw it at my Research Goblin. If it's something I'm certain it won't be able to handle then even better! I can learn from watching it fail.&lt;/p&gt;
&lt;p&gt;I've been trying out hints like "go deep" which seem to trigger a more thorough research job. I enjoy throwing those at shallow and unimportant questions like the UK Starbucks cake pops one just to see what happens!&lt;/p&gt;
&lt;p&gt;You can throw questions at it which have a single, unambiguous answer - but I think questions which are broader and don't have a "correct" answer can be a lot more fun. The UK supermarket rankings above are a great example of that.&lt;/p&gt;
&lt;p&gt;Since I love a questionable analogy for LLMs, Research Goblin is... well, it's a goblin. It's very industrious, not quite human and not entirely trustworthy. You have to be able to outwit it if you want to keep it gainfully employed.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bing"&gt;bing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deep-research"&gt;deep-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-search"&gt;ai-assisted-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="bing"/><category term="definitions"/><category term="search"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="llm-tool-use"/><category term="llm-reasoning"/><category term="deep-research"/><category term="ai-assisted-search"/><category term="gpt-5"/></entry><entry><title>Jules, our asynchronous coding agent, is now available for everyone</title><link href="https://simonwillison.net/2025/Aug/6/asynchronous-coding-agents/#atom-tag" rel="alternate"/><published>2025-08-06T19:36:24+00:00</published><updated>2025-08-06T19:36:24+00:00</updated><id>https://simonwillison.net/2025/Aug/6/asynchronous-coding-agents/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.google/technology/google-labs/jules-now-available/"&gt;Jules, our asynchronous coding agent, is now available for everyone&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I wrote about the Jules beta &lt;a href="https://simonwillison.net/2025/May/19/jules/"&gt;back in May&lt;/a&gt;. Google's version of the OpenAI Codex PR-submitting hosted coding tool graduated from beta today.&lt;/p&gt;
&lt;p&gt;I'm mainly linking to this now because I like the new term they are using in this blog entry: &lt;strong&gt;Asynchronous coding agent&lt;/strong&gt;. I like it so much I &lt;a href="https://simonwillison.net/tags/asynchronous-coding-agents/"&gt;gave it a tag&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I continue to avoid the term "agent" as infuriatingly vague, but I can grudgingly accept it when accompanied by a prefix that clarifies the type of agent we are talking about. "Asynchronous coding agent" feels just about obvious enough to me to be useful.&lt;/p&gt;
&lt;p&gt;... I just ran a Google search for &lt;code&gt;"asynchronous coding agent" -jules&lt;/code&gt; and came up with a few more notable examples of this name being used elsewhere:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/introducing-open-swe-an-open-source-asynchronous-coding-agent/"&gt;Introducing Open SWE: An Open-Source Asynchronous Coding Agent&lt;/a&gt; is an announcement from LangChain just this morning of their take on this pattern. They provide a hosted version (bring your own API keys) or you can run it yourself with &lt;a href="https://github.com/langchain-ai/open-swe"&gt;their MIT licensed code&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The press release for GitHub's own version of this &lt;a href="https://github.com/newsroom/press-releases/coding-agent-for-github-copilot"&gt;GitHub Introduces Coding Agent For GitHub Copilot&lt;/a&gt; states that "GitHub Copilot now includes an asynchronous coding agent".&lt;/li&gt;
&lt;/ul&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44813854"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agent-definitions"&gt;agent-definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="github"/><category term="google"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="gemini"/><category term="agent-definitions"/><category term="async-coding-agents"/><category term="jules"/></entry><entry><title>Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone</title><link href="https://simonwillison.net/2025/Jul/17/vibe-scraping/#atom-tag" rel="alternate"/><published>2025-07-17T19:38:50+00:00</published><updated>2025-07-17T19:38:50+00:00</updated><id>https://simonwillison.net/2025/Jul/17/vibe-scraping/#atom-tag</id><summary type="html">
    &lt;p&gt;This morning, working entirely on my phone, I scraped a conference website and vibe coded up an alternative UI for interacting with the schedule using a combination of OpenAI Codex and Claude Artifacts.&lt;/p&gt;
&lt;p&gt;This weekend is &lt;a href="https://opensauce.com/"&gt;Open Sauce 2025&lt;/a&gt;, the third edition of the Bay Area conference for YouTube creators in the science and engineering space. I have a couple of friends going and they were complaining that the official schedule was difficult to navigate on a phone - it's not even linked from the homepage on mobile, and once you do find &lt;a href="https://opensauce.com/agenda/"&gt;the agenda&lt;/a&gt; it isn't particularly mobile-friendly.&lt;/p&gt;
&lt;p&gt;We were out for coffee this morning so I only had my phone, but I decided to see if I could fix it anyway.&lt;/p&gt;
&lt;p&gt;TLDR: Working entirely on my iPhone, using a combination of &lt;a href="https://chatgpt.com/codex"&gt;OpenAI Codex&lt;/a&gt; in the ChatGPT mobile app and Claude Artifacts via the Claude app, I was able to scrape the full schedule and then build and deploy this: &lt;a href="https://tools.simonwillison.net/open-sauce-2025"&gt;tools.simonwillison.net/open-sauce-2025&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/open-sauce-2025-card.jpg" alt="Screenshot of a blue page, Open Sauce 2025, July 18-20 2025, Download Calendar ICS button, then Friday 18th and Saturday 18th and Sunday 20th pill buttons, Friday is selected, the Welcome to Open Sauce with William Osman event on the Industry Stage is visible." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The site offers a faster loading and more useful agenda view, but more importantly it includes an option to "Download Calendar (ICS)" which allows mobile phone users (Android and iOS) to easily import the schedule events directly into their calendar app of choice.&lt;/p&gt;
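&lt;p&gt;The ICS format is simple enough that you can generate it with nothing but the standard library. Here's a minimal sketch - the event details are invented rather than taken from the real Open Sauce schedule, and a stricter file would also include DTSTAMP and timezone information:&lt;/p&gt;

```python
# Sketch of generating an ICS calendar file with only the standard library.
# Event data is illustrative, not the real Open Sauce schedule.
from datetime import datetime

def ics_escape(text):
    # Backslashes, commas and semicolons are reserved in ICS text values.
    return text.replace("\\", "\\\\").replace(",", "\\,").replace(";", "\\;")

def make_ics(events):
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//schedule//EN"]
    for e in events:
        lines += [
            "BEGIN:VEVENT",
            "UID:" + e["uid"],
            "DTSTART:" + e["start"].strftime("%Y%m%dT%H%M%S"),
            "DTEND:" + e["end"].strftime("%Y%m%dT%H%M%S"),
            "SUMMARY:" + ics_escape(e["title"]),
            "LOCATION:" + ics_escape(e["location"]),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # the spec requires CRLF line endings

calendar = make_ics([{
    "uid": "talk-1@example.com",
    "start": datetime(2025, 7, 18, 10, 0),
    "end": datetime(2025, 7, 18, 11, 0),
    "title": "Welcome to Open Sauce",
    "location": "Industry Stage",
}])
print(calendar)
```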
&lt;p&gt;Here are some detailed notes on how I built it.&lt;/p&gt;
&lt;h4 id="scraping-the-schedule"&gt;Scraping the schedule&lt;/h4&gt;
&lt;p&gt;Step one was to get that schedule in a structured format. I don't have good tools for viewing source on my iPhone, so I took a different approach to turning the schedule site into structured data.&lt;/p&gt;
&lt;p&gt;My first thought was to screenshot the schedule on my phone and then dump the images into a vision LLM - but the schedule was long enough that I didn't feel like scrolling through several different pages and stitching together dozens of images.&lt;/p&gt;
&lt;p&gt;If I was working on a laptop I'd turn to scraping: I'd dig around in the site itself and figure out where the data came from, then write code to extract it out.&lt;/p&gt;
&lt;p&gt;How could I do the same thing working on my phone?&lt;/p&gt;
&lt;p&gt;I decided to use &lt;strong&gt;OpenAI Codex&lt;/strong&gt; - the &lt;a href="https://simonwillison.net/2025/May/16/openai-codex/"&gt;hosted tool&lt;/a&gt;, not the confusingly named &lt;a href="https://simonwillison.net/2025/Apr/16/openai-codex/"&gt;CLI utility&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Codex recently &lt;a href="https://simonwillison.net/2025/Jun/3/codex-agent-internet-access/"&gt;grew the ability&lt;/a&gt; to interact with the internet while attempting to resolve a task. I have a dedicated Codex "environment" configured against a GitHub repository that doesn't do anything else, purely so I can run internet-enabled sessions there that can execute arbitrary network-enabled commands.&lt;/p&gt;
&lt;p&gt;I started a new task there (using the Codex interface inside the ChatGPT iPhone app) and prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Install playwright and use it to visit https://opensauce.com/agenda/ and grab the full details of all three day schedules from the tabs - Friday and Saturday and Sunday - then save and on Data in as much detail as possible in a JSON file and submit that as a PR&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Codex is frustrating in that you only get one shot: it can go away and work autonomously on a task for a long time, but while it's working you can't give it follow-up prompts. You can wait for it to finish entirely and then tell it to try again in a new session, but ideally the instructions you give it are enough for it to get to the finish state where it submits a pull request against your repo with the results.&lt;/p&gt;
&lt;p&gt;I got lucky: my above prompt worked exactly as intended.&lt;/p&gt;
&lt;p&gt;Codex churned away for &lt;em&gt;13 minutes&lt;/em&gt;! I was sat chatting in a coffee shop, occasionally checking the logs to see what it was up to.&lt;/p&gt;
&lt;p&gt;It tried a whole bunch of approaches, all involving running the Playwright Python library to interact with the site. You can see &lt;a href="https://chatgpt.com/s/cd_687945dea5f48191892e0d73ebb45aa4"&gt;the full transcript here&lt;/a&gt;. It includes notes like "&lt;em&gt;Looks like xxd isn't installed. I'll grab "vim-common" or "xxd" to fix it.&lt;/em&gt;".&lt;/p&gt;
&lt;p&gt;Eventually it downloaded an enormous obfuscated chunk of JavaScript called &lt;a href="https://opensauce.com/wp-content/uploads/2025/07/schedule-overview-main-1752724893152.js"&gt;schedule-overview-main-1752724893152.js&lt;/a&gt; (316KB) and then ran a complex sequence of grep, grep, sed, strings, xxd and dd commands against it to figure out the location of the raw schedule data in order to extract it out.&lt;/p&gt;
&lt;p&gt;Here's the eventual &lt;a href="https://github.com/simonw/.github/blob/f671bf57f7c20a4a7a5b0642837811e37c557499/extract_schedule.py"&gt;extract_schedule.py&lt;/a&gt; Python script it wrote, which uses Playwright to save that &lt;code&gt;schedule-overview-main-1752724893152.js&lt;/code&gt; file and then extracts the raw data using the following code (which calls Node.js inside Python, just so it can use the JavaScript &lt;code&gt;eval()&lt;/code&gt; function):&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;node_script&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; (
    &lt;span class="pl-s"&gt;"const fs=require('fs');"&lt;/span&gt;
    &lt;span class="pl-s"&gt;f"const d=fs.readFileSync('&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-s1"&gt;tmp_path&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;','utf8');"&lt;/span&gt;
    &lt;span class="pl-s"&gt;"const m=d.match(/var oo=(&lt;span class="pl-cce"&gt;\\&lt;/span&gt;{.*?&lt;span class="pl-cce"&gt;\\&lt;/span&gt;});/s);"&lt;/span&gt;
    &lt;span class="pl-s"&gt;"if(!m){throw new Error('not found');}"&lt;/span&gt;
    &lt;span class="pl-s"&gt;"const obj=eval('(' + m[1] + ')');"&lt;/span&gt;
    &lt;span class="pl-s"&gt;f"fs.writeFileSync('&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-c1"&gt;OUTPUT_FILE&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;', JSON.stringify(obj, null, 2));"&lt;/span&gt;
)
&lt;span class="pl-s1"&gt;subprocess&lt;/span&gt;.&lt;span class="pl-c1"&gt;run&lt;/span&gt;([&lt;span class="pl-s"&gt;'node'&lt;/span&gt;, &lt;span class="pl-s"&gt;'-e'&lt;/span&gt;, &lt;span class="pl-s1"&gt;node_script&lt;/span&gt;], &lt;span class="pl-s1"&gt;check&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-c1"&gt;True&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;As instructed, it then filed &lt;a href="https://github.com/simonw/.github/pull/1"&gt;a PR against my repo&lt;/a&gt;. It included the Python Playwright script, but more importantly it also included that full extracted &lt;a href="https://github.com/simonw/.github/blob/f671bf57f7c20a4a7a5b0642837811e37c557499/schedule.json"&gt;schedule.json&lt;/a&gt; file. That meant I now had the schedule data, with a &lt;code&gt;raw.githubusercontent.com&lt;/code&gt; URL with open CORS headers that could be fetched by a web app!&lt;/p&gt;
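&lt;p&gt;That eval() step exists because the embedded object is a JavaScript literal, which isn't always valid JSON - unquoted keys and trailing commas are common in bundled code. When the payload does happen to be strict JSON, the same extraction can be done in Python alone. A sketch against an invented sample bundle:&lt;/p&gt;

```python
# Sketch: extracting an embedded data object from a JavaScript bundle with
# just Python. This only works when the object literal is also valid JSON;
# the real Codex script shelled out to Node's eval() precisely because JS
# literals often aren't. The sample bundle below is invented.
import json
import re

bundle = 'var x=1;var oo={"friday": [{"title": "Opening keynote"}]};doStuff(oo);'

# Same non-greedy pattern the Codex script used: /var oo=(\{.*?\});/s
match = re.search(r"var oo=(\{.*?\});", bundle, re.S)
if not match:
    raise ValueError("schedule object not found")
schedule = json.loads(match.group(1))
print(schedule["friday"][0]["title"])
```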
&lt;h4 id="building-the-web-app"&gt;Building the web app&lt;/h4&gt;
&lt;p&gt;Now that I had the data, the next step was to build a web application to preview it and serve it up in a more useful format.&lt;/p&gt;
&lt;p&gt;I decided I wanted two things: a nice mobile friendly interface for browsing the schedule, and a mechanism for importing that schedule into a calendar application, such as Apple or Google Calendar.&lt;/p&gt;
&lt;p&gt;It took me several false starts to get this to work. The biggest challenge was getting that 63KB of schedule JSON data into the app. I tried a few approaches here, all on my iPhone while sitting in a coffee shop and later while driving with a friend to drop them off at the closest BART station.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Using ChatGPT Canvas and o3, since unlike Claude Artifacts a Canvas can fetch data from remote URLs if you allow-list that domain. I later found out that &lt;a href="https://chatgpt.com/share/687948b7-e8b8-8006-a450-0c07bdfd7f85"&gt;this had worked&lt;/a&gt; when I viewed it on my laptop, but on my phone it threw errors so I gave up on it.&lt;/li&gt;
&lt;li&gt;Uploading the JSON to Claude and telling it to build an artifact that read the file directly - this &lt;a href="https://claude.ai/share/25297074-37a9-4583-bc2f-630f6dea5c5d"&gt;failed with an error&lt;/a&gt; "undefined is not an object (evaluating 'window.fs.readFile')". The Claude 4 system prompt &lt;a href="https://simonwillison.net/2025/May/25/claude-4-system-prompt/#artifacts-the-missing-manual"&gt;had led me to expect this to work&lt;/a&gt;; I'm not sure why it didn't.&lt;/li&gt;
&lt;li&gt;Having Claude copy the full JSON into the artifact. This took too long - typing out 63KB of JSON is not a sensible use of LLM tokens - and it flaked out on me when my connection went intermittent while driving through a tunnel.&lt;/li&gt;
&lt;li&gt;Telling Claude to fetch from the URL to that schedule JSON instead. This was my last resort because the Claude Artifacts UI blocks access to external URLs, so you have to copy and paste the code out to a separate interface (on an iPhone, which still lacks a "select all" button), making for a frustrating process.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That final option worked! Here's the full sequence of prompts I used with Claude to get to a working implementation - &lt;a href="https://claude.ai/share/e391bbcc-09a2-4f86-9bec-c6def8fc8dc9"&gt;full transcript here&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Use your analyst tool to read this JSON file and show me the top level keys&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was to prime Claude - I wanted to remind it about its &lt;code&gt;window.fs.readFile&lt;/code&gt; function and have it read enough of the JSON to understand the structure.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Build an artifact with no react that turns the schedule into a nice mobile friendly webpage - there are three days Friday, Saturday and Sunday, which corresponded to the 25th and 26th and 27th of July 2025&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Don’t copy the raw JSON over to the artifact - use your fs function to read it instead&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Also include a button to download ICS at the top of the page which downloads a ICS version of the schedule&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I had noticed that the schedule data had keys for "friday" and "saturday" and "sunday" but no indication of the dates, so I told it those. It turned out later I'd got these wrong!&lt;/p&gt;
&lt;p&gt;This got me a version of the page that failed with an error, because that &lt;code&gt;fs.readFile()&lt;/code&gt; couldn't load the data from the artifact for some reason. So I fixed that with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Change it so instead of using the readFile thing it fetches the same JSON from  https://raw.githubusercontent.com/simonw/.github/f671bf57f7c20a4a7a5b0642837811e37c557499/schedule.json&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... then copied the HTML out to a Gist and previewed it with &lt;a href="https://gistpreview.github.io/"&gt;gistpreview.github.io&lt;/a&gt; - here's &lt;a href="https://gistpreview.github.io/?06a5d1f3bf0af81d55a411f32b2f37c7"&gt;that preview&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then we spot-checked it, since there are &lt;em&gt;so many ways&lt;/em&gt; this could have gone wrong. Thankfully the schedule JSON itself never round-tripped through an LLM so we didn't need to worry about hallucinated session details, but this was almost pure vibe coding so there was a big risk of a mistake sneaking through.&lt;/p&gt;
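&lt;p&gt;The ICS export requested in the prompts above is the simplest part of the app: iCalendar is a plain text format, one &lt;code&gt;VEVENT&lt;/code&gt; per session. A rough sketch of the idea - the session keys here are illustrative, not the artifact's actual code:&lt;/p&gt;

```python
# Serialize a list of sessions as a minimal iCalendar file.
# The session dicts and their keys are illustrative - the real
# schedule.json uses its own structure.
def to_ics(sessions):
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//schedule//EN"]
    for s in sessions:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{s['uid']}",
            f"DTSTART:{s['start']}",  # e.g. 20250718T170000Z
            f"DTEND:{s['end']}",
            f"SUMMARY:{s['title']}",
            f"LOCATION:{s['where']}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines)  # iCalendar requires CRLF line endings

ics = to_ics([{
    "uid": "demo-1", "start": "20250718T170000Z",
    "end": "20250718T174500Z", "title": "Opening keynote",
    "where": "Main Stage",
}])
```

&lt;p&gt;In the browser the resulting string just needs to be handed to a download link - calendar apps on both iOS and Android will open a downloaded &lt;code&gt;.ics&lt;/code&gt; file directly.&lt;/p&gt;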
&lt;p&gt;I'd set myself a deadline of "by the time we drop my friend at the BART station" and I hit that deadline with just seconds to spare. I pasted the resulting HTML &lt;a href="https://github.com/simonw/tools/blob/main/open-sauce-2025.html"&gt;into my simonw/tools GitHub repo&lt;/a&gt; using the GitHub mobile web interface which deployed it to that final &lt;a href="https://tools.simonwillison.net/open-sauce-2025"&gt;tools.simonwillison.net/open-sauce-2025&lt;/a&gt; URL.&lt;/p&gt;
&lt;p&gt;... then we noticed that we &lt;em&gt;had&lt;/em&gt; missed a bug: I had given it the dates of "25th and 26th and 27th of July 2025", but actually that was a week too late: the correct dates were July 18th-20th.&lt;/p&gt;
&lt;p&gt;Thankfully I have Codex configured against my &lt;code&gt;simonw/tools&lt;/code&gt; repo as well, so fixing that was a case of prompting a new Codex session with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;The open sauce schedule got the dates wrong - Friday is 18 July 2025 and Saturday is 19 and Sunday is 20 - fix it&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://chatgpt.com/s/cd_68794c97a3d88191a2cbe9de78103334"&gt;that Codex transcript&lt;/a&gt;, which resulted in &lt;a href="https://github.com/simonw/tools/pull/34"&gt;this PR&lt;/a&gt; which I landed and deployed, again using the GitHub mobile web interface.&lt;/p&gt;
&lt;h4 id="what-this-all-demonstrates"&gt;What this all demonstrates&lt;/h4&gt;
&lt;p&gt;So, to recap: I was able to scrape a website (without even a view source tool), turn the resulting JSON data into a mobile-friendly website, add an ICS export feature and deploy the results to a static hosting platform (GitHub Pages), working entirely on my phone.&lt;/p&gt;
&lt;p&gt;If I'd had a laptop this project would have been faster, but honestly aside from a little bit more hands-on debugging I wouldn't have gone about it in a particularly different way.&lt;/p&gt;
&lt;p&gt;I was able to do other stuff at the same time - the Codex scraping project ran entirely autonomously, and the app build itself was more involved only because I had to work around the limitations my tools placed on fetching data from external sources.&lt;/p&gt;
&lt;p&gt;As usual with this stuff, my 25+ years of previous web development experience was critical to being able to execute the project. I knew about Codex, and Artifacts, and GitHub, and Playwright, and CORS headers, and Artifacts sandbox limitations, and the capabilities of ICS files on mobile phones.&lt;/p&gt;
&lt;p&gt;This whole thing was &lt;em&gt;so much fun!&lt;/em&gt; Being able to spin up multiple coding agents directly from my phone and have them solve quite complex problems while only paying partial attention to the details is a solid demonstration of why I continue to enjoy exploring the edges of &lt;a href="https://simonwillison.net/tags/ai-assisted-programming/"&gt;AI-assisted programming&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id="update-i-removed-the-speaker-avatars"&gt;Update: I removed the speaker avatars&lt;/h4&gt;
&lt;p&gt;Here's a beautiful cautionary tale about the dangers of vibe-coding on a phone with no access to performance profiling tools. A commenter on Hacker News &lt;a href="https://news.ycombinator.com/item?id=44597405#44597808"&gt;pointed out&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The web app makes 176 requests and downloads 130 megabytes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And yeah, it did! Turns out those speaker avatar images weren't optimized, and there were over 170 of them.&lt;/p&gt;
&lt;p&gt;I told &lt;a href="https://chatgpt.com/s/cd_6879631d99c48191b1ab7f84dfab8dea"&gt;a fresh Codex instance&lt;/a&gt; "Remove the speaker avatar images from open-sauce-2025.html" and now the page weighs 93.58 KB - about 1,400 times smaller!&lt;/p&gt;
&lt;h4 id="update-2-improved-accessibility"&gt;Update 2: Improved accessibility&lt;/h4&gt;
&lt;p&gt;That same commenter &lt;a href="https://news.ycombinator.com/item?id=44597405#44597808"&gt;on Hacker News&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It's also &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; soup and largely inaccessible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yeah, this HTML isn't great:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;dayContainer&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;innerHTML&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sessions&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;session&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; `
    &amp;lt;div class="session-card"&amp;gt;
        &amp;lt;div class="session-header"&amp;gt;
            &amp;lt;div&amp;gt;
                &amp;lt;span class="session-time"&amp;gt;&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;session&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;time&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;&amp;lt;/span&amp;gt;
                &amp;lt;span class="length-badge"&amp;gt;&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;session&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;length&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt; min&amp;lt;/span&amp;gt;
            &amp;lt;/div&amp;gt;
            &amp;lt;div class="session-location"&amp;gt;&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;session&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;where&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;&amp;lt;/&lt;span class="pl-s1"&gt;div&lt;/span&gt;&lt;span class="pl-c1"&gt;&amp;gt;&lt;/span&gt;
        &amp;lt;/&lt;span class="pl-s1"&gt;div&lt;/span&gt;&lt;span class="pl-c1"&gt;&amp;gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I &lt;a href="https://github.com/simonw/tools/issues/36"&gt;opened an issue&lt;/a&gt; and had both Claude Code and Codex look at it. Claude Code &lt;a href="https://github.com/simonw/tools/issues/36#issuecomment-3085516331"&gt;failed to submit a PR&lt;/a&gt; for some reason, but Codex &lt;a href="https://github.com/simonw/tools/pull/37"&gt;opened one&lt;/a&gt; with a fix that sounded good to me when I tried it with VoiceOver on iOS (using &lt;a href="https://codex-make-open-sauce-2025-h.tools-b1q.pages.dev/open-sauce-2025"&gt;a Cloudflare Pages preview&lt;/a&gt;) so I landed that. Here's &lt;a href="https://github.com/simonw/tools/commit/29c8298363869bbd4b4e7c51378c20dc8ac30c39"&gt;the diff&lt;/a&gt;, which added a hidden "skip to content" link, some &lt;code&gt;aria-&lt;/code&gt; attributes on buttons and upgraded the HTML to use &lt;code&gt;&amp;lt;h3&amp;gt;&lt;/code&gt; for the session titles.&lt;/p&gt;
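&lt;p&gt;The shape of the fix is easy to see side by side: swap the anonymous &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;s for semantic elements. A simplified sketch of a more accessible template, not the exact markup Codex produced:&lt;/p&gt;

```python
# Render one session as semantic HTML: a list item with an <h3> title
# and a <time> element, instead of nested anonymous <div>s.
from html import escape

def session_card(session):
    return (
        '<li class="session-card">'
        f'<h3>{escape(session["title"])}</h3>'
        f'<p><time>{escape(session["time"])}</time> - '
        f'{escape(session["where"])}</p>'
        "</li>"
    )

card = session_card({"title": "Opening keynote",
                     "time": "10:00", "where": "Main Stage"})
```

&lt;p&gt;Headings give screen reader users a document outline to jump between, which is exactly what a flat pile of &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;s denies them.&lt;/p&gt;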
&lt;p&gt;Next time I'll remember to specify accessibility as a requirement in the initial prompt. I'm disappointed that Claude didn't consider that without me having to ask.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/icalendar"&gt;icalendar&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mobile"&gt;mobile&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scraping"&gt;scraping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/playwright"&gt;playwright&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-artifacts"&gt;claude-artifacts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-to-app"&gt;prompt-to-app&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="definitions"/><category term="github"/><category term="icalendar"/><category term="mobile"/><category term="scraping"/><category term="tools"/><category term="ai"/><category term="playwright"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/><category term="claude-artifacts"/><category term="ai-agents"/><category term="vibe-coding"/><category term="coding-agents"/><category term="async-coding-agents"/><category term="prompt-to-app"/></entry><entry><title>How to Fix Your Context</title><link href="https://simonwillison.net/2025/Jun/29/how-to-fix-your-context/#atom-tag" rel="alternate"/><published>2025-06-29T20:15:41+00:00</published><updated>2025-06-29T20:15:41+00:00</updated><id>https://simonwillison.net/2025/Jun/29/how-to-fix-your-context/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html"&gt;How to Fix Your Context&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Drew Breunig has been publishing some very detailed notes on &lt;a href="https://simonwillison.net/2025/Jun/27/context-engineering/"&gt;context engineering&lt;/a&gt; recently. In &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html"&gt;How Long Contexts Fail&lt;/a&gt; he described four common patterns for &lt;a href="https://simonwillison.net/2025/Jun/18/context-rot/"&gt;context rot&lt;/a&gt;, which he summarizes like so:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Context Poisoning&lt;/strong&gt;: When a hallucination or other error makes it into the context, where it is repeatedly referenced.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context Distraction&lt;/strong&gt;: When a context grows so long that the model over-focuses on the context, neglecting what it learned during training.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context Confusion&lt;/strong&gt;: When superfluous information in the context is used by the model to generate a low-quality response.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context Clash&lt;/strong&gt;: When you accrue new information and tools in your context that conflicts with other information in the prompt.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;In &lt;a href="https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html"&gt;this follow-up&lt;/a&gt; he introduces neat ideas (and more new terminology) for addressing those problems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool Loadout&lt;/strong&gt; describes selecting a subset of tools to enable for a prompt, based on research that shows anything beyond 20 can confuse some models. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context Quarantine&lt;/strong&gt; is "the act of isolating contexts in their own dedicated threads" - I've called this sub-agents in the past, it's the pattern &lt;a href="https://simonwillison.net/2025/Jun/2/claude-trace/"&gt;used by Claude Code&lt;/a&gt; and explored in depth in &lt;a href="https://simonwillison.net/2025/Jun/14/multi-agent-research-system/"&gt;Anthropic's multi-agent research paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context Pruning&lt;/strong&gt; is "removing irrelevant or otherwise unneeded information from the context", and &lt;strong&gt;Context Summarization&lt;/strong&gt; is the act of boiling down an accrued context into a condensed summary. These techniques become particularly important as conversations get longer and run closer to the model's token limits.&lt;/p&gt;
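&lt;p&gt;Pruning in its crudest form is just dropping the oldest non-essential messages once the context nears a budget. A toy sketch (word counts standing in for real token counts, age standing in for real relevance scoring):&lt;/p&gt;

```python
# Drop the oldest non-system messages until the conversation fits a
# (crudely word-counted) token budget. Real implementations score
# relevance rather than just age.
def prune(messages, budget):
    def cost(msgs):
        return sum(len(m["content"].split()) for m in msgs)
    kept = list(messages)
    while cost(kept) > budget:
        idx = next(
            (i for i, m in enumerate(kept) if m["role"] != "system"), None)
        if idx is None:
            break  # only system messages left - nothing safe to drop
        del kept[idx]
    return kept

msgs = [
    {"role": "system", "content": "you are a helpful assistant"},
    {"role": "user", "content": "old question about something else"},
    {"role": "user", "content": "current question"},
]
pruned = prune(msgs, budget=8)
```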
&lt;p&gt;&lt;strong&gt;Context Offloading&lt;/strong&gt; is "the act of storing information outside the LLM’s context". I've seen several systems implement their own "memory" tool for saving and then revisiting notes as they work, but an even more interesting example recently is how various coding agents create and update &lt;code&gt;plan.md&lt;/code&gt; files as they work through larger problems.&lt;/p&gt;
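&lt;p&gt;The &lt;code&gt;plan.md&lt;/code&gt; pattern is a nice concrete case of offloading: the agent keeps durable state in a file rather than in the context window. A toy sketch of such a memory tool:&lt;/p&gt;

```python
# A minimal "memory" tool: notes are appended to a file on disk, and
# only read back into the context when the model asks for them.
import os
import tempfile
from pathlib import Path

class Memory:
    def __init__(self, path="plan.md"):
        self.path = Path(path)

    def save(self, note: str) -> None:
        with self.path.open("a") as f:
            f.write(note.rstrip() + "\n")

    def recall(self) -> str:
        return self.path.read_text() if self.path.exists() else ""

mem = Memory(os.path.join(tempfile.mkdtemp(), "plan.md"))
mem.save("- [x] scrape the schedule")
mem.save("- [ ] build the web app")
```

&lt;p&gt;The point is that the notes cost zero context tokens until they're recalled.&lt;/p&gt;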
&lt;p&gt;Drew's conclusion:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The key insight across all the above tactics is that &lt;em&gt;context is not free&lt;/em&gt;. Every token in the context influences the model’s behavior, for better or worse. The massive context windows of modern LLMs are a powerful capability, but they’re not an excuse to be sloppy with information management.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/drew-breunig"&gt;drew-breunig&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/context-engineering"&gt;context-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sub-agents"&gt;sub-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="drew-breunig"/><category term="llm-tool-use"/><category term="ai-agents"/><category term="context-engineering"/><category term="sub-agents"/></entry><entry><title>Context engineering</title><link href="https://simonwillison.net/2025/Jun/27/context-engineering/#atom-tag" rel="alternate"/><published>2025-06-27T23:42:43+00:00</published><updated>2025-06-27T23:42:43+00:00</updated><id>https://simonwillison.net/2025/Jun/27/context-engineering/#atom-tag</id><summary type="html">
    &lt;p&gt;The term &lt;strong&gt;context engineering&lt;/strong&gt; has recently started to gain traction as a better alternative to prompt engineering. I like it. I think this one may have sticking power.&lt;/p&gt;
&lt;p&gt;Here's an example tweet &lt;a href="https://twitter.com/tobi/status/1935533422589399127"&gt;from Shopify CEO Tobi Lutke&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I really like the term “context engineering” over prompt engineering. &lt;/p&gt;
&lt;p&gt;It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently amplified &lt;a href="https://twitter.com/karpathy/status/1937902205765607626"&gt;by Andrej Karpathy&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;+1 for "context engineering" over "prompt engineering".&lt;/p&gt;
&lt;p&gt;People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting [...] Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've &lt;a href="https://simonwillison.net/2023/Feb/21/in-defense-of-prompt-engineering/"&gt;spoken favorably of prompt engineering&lt;/a&gt; in the past - I hoped that term could capture the inherent complexity of constructing reliable prompts. Unfortunately, most people's inferred definition is that it's a laughably pretentious term for typing things into a chatbot! &lt;/p&gt;
&lt;p&gt;It turns out that inferred definitions are the ones that stick. I think the inferred definition of "context engineering" is likely to be much closer to the intended meaning.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/context-engineering"&gt;context-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;&lt;/p&gt;



</summary><category term="andrej-karpathy"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="ai"/><category term="llms"/><category term="context-engineering"/><category term="definitions"/></entry><entry><title>Quoting Workaccount2 on Hacker News</title><link href="https://simonwillison.net/2025/Jun/18/context-rot/#atom-tag" rel="alternate"/><published>2025-06-18T23:15:12+00:00</published><updated>2025-06-18T23:15:12+00:00</updated><id>https://simonwillison.net/2025/Jun/18/context-rot/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://news.ycombinator.com/item?id=44308711#44310054"&gt;&lt;p&gt;They poison their own context. Maybe you can call it &lt;strong&gt;context rot&lt;/strong&gt;, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot will start to become apparent around 100k tokens (with Gemini 2.5).&lt;/p&gt;
&lt;p&gt;They really need to figure out a way to delete or "forget" prior context, so the user or even the model can go back and prune poisonous tokens.&lt;/p&gt;
&lt;p&gt;Right now I work around it by regularly making summaries of instances, and then spinning up a new instance with fresh context and feed in the summary of the previous instance.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://news.ycombinator.com/item?id=44308711#44310054"&gt;Workaccount2 on Hacker News&lt;/a&gt;, coining "context rot"&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/long-context"&gt;long-context&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="long-context"/></entry></feed>