<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: codex-cli</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/codex-cli.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-04-30T23:23:17+00:00</updated><author><name>Simon Willison</name></author><entry><title>Codex CLI 0.128.0 adds /goal</title><link href="https://simonwillison.net/2026/Apr/30/codex-goals/#atom-tag" rel="alternate"/><published>2026-04-30T23:23:17+00:00</published><updated>2026-04-30T23:23:17+00:00</updated><id>https://simonwillison.net/2026/Apr/30/codex-goals/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/codex/releases/tag/rust-v0.128.0"&gt;Codex CLI 0.128.0 adds /goal&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The latest version of OpenAI's Codex CLI coding agent adds their own version of the &lt;a href="https://ghuntley.com/ralph/"&gt;Ralph loop&lt;/a&gt;: you can now set a &lt;code&gt;/goal&lt;/code&gt; and Codex will keep on looping until it evaluates that the goal has been completed... or the configured token budget has been exhausted.&lt;/p&gt;
&lt;p&gt;It looks like the feature is mainly implemented though the &lt;a href="https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/continuation.md"&gt;goals/continuation.md&lt;/a&gt; and &lt;a href="https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/budget_limit.md"&gt;goals/budget_limit.md&lt;/a&gt; prompts, which are automatically injected at the end of a turn.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/fcoury/status/2049917871799636201"&gt;@fcoury&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="coding-agents"/><category term="system-prompts"/><category term="codex-cli"/><category term="agentic-engineering"/></entry><entry><title>Quoting OpenAI Codex base_instructions</title><link href="https://simonwillison.net/2026/Apr/28/openai-codex/#atom-tag" rel="alternate"/><published>2026-04-28T22:02:53+00:00</published><updated>2026-04-28T22:02:53+00:00</updated><id>https://simonwillison.net/2026/Apr/28/openai-codex/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://github.com/openai/codex/blob/66b0781502be5de3b1909525c987643b9e5e407d/codex-rs/models-manager/models.json#L55"&gt;&lt;p&gt;&lt;code&gt;Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.&lt;/code&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://github.com/openai/codex/blob/66b0781502be5de3b1909525c987643b9e5e407d/codex-rs/models-manager/models.json#L55"&gt;OpenAI Codex base_instructions&lt;/a&gt;, for GPT-5.5&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="system-prompts"/><category term="codex-cli"/><category term="gpt"/></entry><entry><title>A pelican for GPT-5.5 via the semi-official Codex backdoor API</title><link href="https://simonwillison.net/2026/Apr/23/gpt-5-5/#atom-tag" rel="alternate"/><published>2026-04-23T19:59:47+00:00</published><updated>2026-04-23T19:59:47+00:00</updated><id>https://simonwillison.net/2026/Apr/23/gpt-5-5/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/"&gt;GPT-5.5 is out&lt;/a&gt;. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for!&lt;/p&gt;
&lt;p&gt;There's one notable omission from today's release - the API:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale. We'll bring GPT‑5.5 and GPT‑5.5 Pro to the API very soon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When I run my &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;pelican benchmark&lt;/a&gt; I always prefer to use an API, to avoid hidden system prompts in ChatGPT or other agent harnesses from impacting the results.&lt;/p&gt;
&lt;h4 id="the-openclaw-backdoor"&gt;The OpenClaw backdoor&lt;/h4&gt;
&lt;p&gt;One of the ongoing tension points in the AI world over the past few months has concerned how agent harnesses like OpenClaw and Pi interact with the APIs provided by the big providers.&lt;/p&gt;
&lt;p&gt;Both OpenAI and Anthropic offer popular monthly subscriptions which provide access to their models at a significant discount to their raw API.&lt;/p&gt;
&lt;p&gt;OpenClaw integrated directly with this mechanism, and was then &lt;a href="https://www.theverge.com/ai-artificial-intelligence/907074/anthropic-openclaw-claude-subscription-ban"&gt;blocked from doing so&lt;/a&gt; by Anthropic. This kicked off a whole thing. OpenAI - who recently hired OpenClaw creator Peter Steinberger - saw an opportunity for an easy karma win and announced that OpenClaw was welcome to continue integrating with OpenAI's subscriptions via the same mechanism used by their (open source) Codex CLI tool.&lt;/p&gt;
&lt;p&gt;Does this mean &lt;em&gt;anyone&lt;/em&gt; can write code that integrates with OpenAI's Codex-specific APIs to hook into those existing subscriptions?&lt;/p&gt;
&lt;p&gt;The other day &lt;a href="https://twitter.com/jeremyphoward/status/2046537816834965714"&gt;Jeremy Howard asked&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Anyone know whether OpenAI officially supports the use of the &lt;code&gt;/backend-api/codex/responses&lt;/code&gt; endpoint that Pi and Opencode (IIUC) uses?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It turned out that on March 30th OpenAI's Romain Huet &lt;a href="https://twitter.com/romainhuet/status/2038699202834841962"&gt;had tweeted&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We want people to be able to use Codex, and their ChatGPT subscription, wherever they like! That means in the app, in the terminal, but also in JetBrains, Xcode, OpenCode, Pi, and now Claude Code.&lt;/p&gt;
&lt;p&gt;That’s why Codex CLI and Codex app server are open source too! 🙂&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Peter Steinberger &lt;a href="https://twitter.com/steipete/status/2046775849769148838"&gt;replied to Jeremy&lt;/a&gt; that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAI sub is officially supported.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="llm-openai-via-codex"&gt;llm-openai-via-codex&lt;/h4&gt;
&lt;p&gt;So... I had Claude Code reverse-engineer the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repo, figure out how authentication tokens were stored and build me &lt;a href="https://github.com/simonw/llm-openai-via-codex"&gt;llm-openai-via-codex&lt;/a&gt;, a new plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; which picks up your existing Codex subscription and uses it to run prompts!&lt;/p&gt;
&lt;p&gt;(With hindsight I wish I'd used GPT-5.4 or the GPT-5.5 preview, it would have been funnier. I genuinely considered rewriting the project from scratch using Codex and GPT-5.5 for the sake of the joke, but decided not to spend any more time on this!)&lt;/p&gt;
&lt;p&gt;Here's how to use it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install Codex CLI, buy an OpenAI plan, login to Codex&lt;/li&gt;
&lt;li&gt;Install LLM: &lt;code&gt;uv tool install llm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Install the new plugin: &lt;code&gt;llm install llm-openai-via-codex&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start prompting: &lt;code&gt;llm -m openai-codex/gpt-5.5 'Your prompt goes here'&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;All existing LLM features should also work - use &lt;code&gt;-a filepath.jpg/URL&lt;/code&gt; to attach an image, &lt;code&gt;llm chat -m openai-codex/gpt-5.5&lt;/code&gt; to start an ongoing chat, &lt;code&gt;llm logs&lt;/code&gt; to view logged conversations and &lt;code&gt;llm --tool ...&lt;/code&gt; to &lt;a href="https://llm.datasette.io/en/stable/tools.html"&gt;try it out with tool support&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="and-some-pelicans"&gt;And some pelicans&lt;/h4&gt;
&lt;p&gt;Let's generate a pelican!&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-openai-via-codex
llm -m openai-codex/gpt-5.5 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634"&gt;what I got back&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican.png" alt="It is a bit mangled to be honest - good beak, pelican body shapes are slightly weird, legs do at least extend to the pedals, bicycle frame is not quite right." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I've seen better &lt;a href="https://simonwillison.net/2026/Mar/17/mini-and-nano/#pelicans"&gt;from GPT-5.4&lt;/a&gt;, so I tagged on &lt;code&gt;-o reasoning_effort xhigh&lt;/code&gt; and &lt;a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5"&gt;tried again&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;That one took almost four minutes to generate, but I think it's a much better effort.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican-xhigh.png" alt="Pelican has gradients now, body is much better put together, bicycle is nearly the right shape albeit with one extra bar between pedals and front wheel, clearly a better image overall." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;If you compare the SVG code (&lt;a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634#response"&gt;default&lt;/a&gt;, &lt;a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5#response"&gt;xhigh&lt;/a&gt;) the &lt;code&gt;xhigh&lt;/code&gt; one took a very different approach, which is much more CSS-heavy - as demonstrated by those gradients. &lt;code&gt;xhigh&lt;/code&gt; used 9,322 reasoning tokens where the default used just 39.&lt;/p&gt;
&lt;h4 id="a-few-more-notes-on-gpt-5-5"&gt;A few more notes on GPT-5.5&lt;/h4&gt;
&lt;p&gt;One of the most notable things about GPT-5.5 is the pricing. Once it goes live in the API it's &lt;a href="https://openai.com/index/introducing-gpt-5-5/#availability-and-pricing"&gt;going to be priced&lt;/a&gt; at &lt;em&gt;twice&lt;/em&gt; the cost of GPT-5.4 - $5 per 1M input tokens and $30 per 1M output tokens, where 5.4 is $2.5 and $15.&lt;/p&gt;
&lt;p&gt;GPT-5.5 Pro will be even more: $30 per 1M input tokens and $180 per 1M output tokens.&lt;/p&gt;
&lt;p&gt;GPT-5.4 will remain available. At half the price of 5.5 this feels like 5.4 is to 5.5 as Claude Sonnet is to Claude Opus.&lt;/p&gt;
&lt;p&gt;Ethan Mollick has a &lt;a href="https://www.oneusefulthing.org/p/sign-of-the-future-gpt-55"&gt;detailed review of GPT-5.5&lt;/a&gt; where he put it (and GPT-5.5 Pro) through an array of interesting challenges. His verdict: the jagged frontier continues to hold, with GPT-5.5 excellent at some things and challenged by others in a way that remains difficult to predict.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="llm"/><category term="llm-pricing"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="codex-cli"/><category term="gpt"/></entry><entry><title>llm-openai-via-codex 0.1a0</title><link href="https://simonwillison.net/2026/Apr/23/llm-openai-via-codex/#atom-tag" rel="alternate"/><published>2026-04-23T19:22:29+00:00</published><updated>2026-04-23T19:22:29+00:00</updated><id>https://simonwillison.net/2026/Apr/23/llm-openai-via-codex/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm-openai-via-codex/releases/tag/0.1a0"&gt;llm-openai-via-codex 0.1a0&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;Hijacks your Codex CLI credentials to make API calls with LLM, as described &lt;a href="https://simonwillison.net/2026/Apr/23/gpt-5-5/#llm-openai-via-codex"&gt;in my post about GPT-5.5&lt;/a&gt;.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="openai"/><category term="llm"/><category term="codex-cli"/></entry><entry><title>Is Claude Code going to cost $100/month? Probably not - it's all very confusing</title><link href="https://simonwillison.net/2026/Apr/22/claude-code-confusion/#atom-tag" rel="alternate"/><published>2026-04-22T02:07:34+00:00</published><updated>2026-04-22T02:07:34+00:00</updated><id>https://simonwillison.net/2026/Apr/22/claude-code-confusion/#atom-tag</id><summary type="html">
    &lt;p&gt;Anthropic today quietly (as in &lt;em&gt;silently&lt;/em&gt;, no announcement anywhere at all) updated their &lt;a href="https://claude.com/pricing"&gt;claude.com/pricing&lt;/a&gt; page (but not their &lt;a href="https://support.claude.com/en/articles/11049762-choosing-a-claude-plan"&gt;Choosing a Claude plan page&lt;/a&gt;, which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, &lt;a href="https://simonwillison.net/2026/Apr/22/claude-code-confusion/#they-reversed-it"&gt;and it's already reverted&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/anthropic-x.jpg" alt="Screenshot of the Claude pricing grid - Compare features across plans. Free, Pro, Max 5x and Max 20x all have the same features, with the exception of Claude Code which is on Max only and Claude Cowork which is on Pro and Max only. An arrow highlights the Claude Code for Pro cross." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://web.archive.org/web/20260421040656/claude.com/pricing"&gt;Internet Archive copy&lt;/a&gt; from yesterday shows a checkbox there. Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max plans.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Update&lt;/strong&gt;: don't miss &lt;a href="https://simonwillison.net/2026/Apr/22/claude-code-confusion/#they-reversed-it"&gt;the update to this post&lt;/a&gt;, they've already changed course a few hours after this change went live.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So what the heck is going on? Unsurprisingly, &lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1srzhd7/psa_claude_pro_no_longer_lists_claude_code_as_an/"&gt;Reddit&lt;/a&gt; and &lt;a href="https://news.ycombinator.com/item?id=47854477"&gt;Hacker News&lt;/a&gt; and &lt;a href="https://twitter.com/i/trending/2046718768634589239"&gt;Twitter&lt;/a&gt; all caught fire.&lt;/p&gt;
&lt;p&gt;I didn't believe the screenshots myself when I first saw them - aside from the pricing grid I could find no announcement from Anthropic anywhere. Then Amol Avasare, Anthropic's Head of Growth, &lt;a href="https://twitter.com/TheAmolAvasare/status/2046724659039932830"&gt;tweeted&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For clarity, we're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And that appears to be the closest we have had to official messaging from Anthropic.&lt;/p&gt;
&lt;p&gt;I don't buy the "~2% of new prosumer signups" thing, since everyone I've talked to is seeing the new pricing grid and the Internet Archive has already &lt;a href="https://web.archive.org/web/20260422001250/https://claude.com/pricing"&gt;snapped a copy&lt;/a&gt;. Maybe he means that they'll only be running this version of the pricing grid for a limited time which somehow adds up to "2%" of signups?&lt;/p&gt;
&lt;p&gt;I'm also amused to see Claude Cowork remain available on the $20/month plan, because Claude Cowork is effectively a rebranded version of Claude Code wearing a less threatening hat!&lt;/p&gt;
&lt;p&gt;There are a whole bunch of things that are bad about this.&lt;/p&gt;
&lt;p&gt;If we assume this is indeed a test, and that test comes up negative and they decide not to go ahead with it, the damage has still been extensive:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A whole lot of people got scared or angry or both that a service they relied on was about to be rug-pulled. There really is a significant difference between $20/month and $100/month for most people, especially outside of higher salary countries.&lt;/li&gt;
&lt;li&gt;The uncertainty is really bad! A tweet from an employee is &lt;em&gt;not&lt;/em&gt; the way to make an announcement like this. I wasted a solid hour of my afternoon trying to figure out what had happened here. My trust in Anthropic's transparency around pricing - a &lt;em&gt;crucial factor&lt;/em&gt; in how I understand their products - has been shaken.&lt;/li&gt;
&lt;li&gt;Strategically, should I be taking a bet on Claude Code if I know that they might 5x the minimum price of the product?&lt;/li&gt;
&lt;li&gt;More of a personal issue, but one I care deeply about myself: I invest a &lt;a href="https://simonwillison.net/tags/claude-code/"&gt;great deal of effort&lt;/a&gt; (that's 105 posts and counting) in teaching people how to use Claude Code. I don't want to invest that effort in a product that most people cannot afford to use.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Last month I ran &lt;a href="https://simonw.github.io/nicar-2026-coding-agents/"&gt;a tutorial for journalists&lt;/a&gt; on "Coding agents for data analysis" at the annual NICAR data journalism conference. I'm not going to be teaching that audience a course that depends on a $100/month subscription!&lt;/p&gt;
&lt;p&gt;This also doesn't make sense to me as a strategy for Anthropic. Claude Code &lt;em&gt;defined the category&lt;/em&gt; of coding agents. It's responsible for billions of dollars in annual revenue for Anthropic already. It has a stellar reputation, but I'm not convinced that reputation is strong enough for it to lose the $20/month trial and jump people directly to a $100/month subscription.&lt;/p&gt;
&lt;p&gt;OpenAI have been investing heavily in catching up to Claude Code with their Codex products. Anthropic just handed them this marketing opportunity on a plate - here's Codex engineering lead &lt;a href="https://twitter.com/thsottiaux/status/2046740759056162816"&gt;Thibault Sottiaux&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I don't know what they are doing over there, but Codex will continue to be available both in the FREE and PLUS ($20) plans. We have the compute and efficient models to support it. For important changes, we will engage with the community well ahead of making them.&lt;/p&gt;
&lt;p&gt;Transparency and trust are two principles we will not break, even if it means momentarily earning less. A reminder that you vote with your subscription for the values you want to see in this world.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I should note that I pay $200/month for Claude Max and I consider it well worth the money. I've had periods of free access in the past courtesy of Anthropic but I'm currently paying full price, and happy to do so.&lt;/p&gt;
&lt;p&gt;But I care about the accessibility of the tools that I work with and teach. If Codex has a free tier while Claude Code starts at $100/month I should obviously switch to Codex, because that way I can use the same tool as the people I want to teach how to use coding agents.&lt;/p&gt;
&lt;p&gt;Here's what I think happened. I think Anthropic are trying to optimize revenue growth - obviously - and someone pitched making Claude Code only available for Max and higher. That's clearly a bad idea, but "testing" culture says that it's worth putting even bad ideas out to test just in case they surprise you.&lt;/p&gt;
&lt;p&gt;So they started a test, without taking into account the wailing and gnashing of teeth that would result when their test was noticed - or accounting for the longer-term brand damage that would be caused.&lt;/p&gt;
&lt;p&gt;Or maybe they &lt;em&gt;did&lt;/em&gt; account for that, and decided it was worth the risk.&lt;/p&gt;
&lt;p&gt;I don't think that calculation was worthwhile. They're going to have to make a &lt;em&gt;very&lt;/em&gt; firm commitment along the lines of "we heard your feedback and we commit to keeping Claude Code available on our $20/month plan going forward" to regain my trust.&lt;/p&gt;
&lt;p&gt;As it stands, Codex is looking like a much safer bet for me to invest my time in learning and building educational materials around.&lt;/p&gt;
&lt;h4 id="they-reversed-it"&gt;Update: they've reversed it already&lt;/h4&gt;
&lt;p&gt;In the time I was &lt;em&gt;typing this blog entry&lt;/em&gt; Anthropic appear to have reversed course - the &lt;a href="https://claude.com/pricing"&gt;claude.com/pricing page&lt;/a&gt; now has a checkbox back in the Pro column for Claude Code. I can't find any official communication about it though.&lt;/p&gt;
&lt;p&gt;Let's see if they can come up with an explanation/apology that's convincing enough to offset the trust bonfire from this afternoon!&lt;/p&gt;
&lt;h4 id="update-2"&gt;Update 2: it may still affect 2% of signups?&lt;/h4&gt;
&lt;p&gt;Amol &lt;a href="https://x.com/TheAmolAvasare/status/2046788872517066971"&gt;on Twitter&lt;/a&gt;:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;was a mistake that the logged-out landing page and docs were updated for this test [&lt;a href="https://twitter.com/TheAmolAvasare/status/2046783926920978681"&gt;embedded self-tweet&lt;/a&gt;]&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Getting lots of questions on why the landing page / docs were updated if only 2% of new signups were affected.&lt;/p&gt;

&lt;p&gt;This was understandably confusing for the 98% of folks not part of the experiment, and we've reverted both the landing page and docs changes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;So the experiment is still running, just not visible to the rest of the world?&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="llm-pricing"/><category term="ai-ethics"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/></entry><entry><title>Thoughts on OpenAI acquiring Astral and uv/ruff/ty</title><link href="https://simonwillison.net/2026/Mar/19/openai-acquiring-astral/#atom-tag" rel="alternate"/><published>2026-03-19T16:45:15+00:00</published><updated>2026-03-19T16:45:15+00:00</updated><id>https://simonwillison.net/2026/Mar/19/openai-acquiring-astral/#atom-tag</id><summary type="html">
    &lt;p&gt;The big news this morning: &lt;a href="https://astral.sh/blog/openai"&gt;Astral to join OpenAI&lt;/a&gt; (on the Astral blog) and &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;OpenAI to acquire Astral&lt;/a&gt; (the OpenAI announcement). Astral are the company behind &lt;a href="https://simonwillison.net/tags/uv/"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff/"&gt;ruff&lt;/a&gt;, and &lt;a href="https://simonwillison.net/tags/ty/"&gt;ty&lt;/a&gt; - three increasingly load-bearing open source projects in the Python ecosystem. I have thoughts!&lt;/p&gt;
&lt;h4 id="the-official-line-from-openai-and-astral"&gt;The official line from OpenAI and Astral&lt;/h4&gt;
&lt;p&gt;The Astral team will become part of the Codex team at OpenAI.&lt;/p&gt;
&lt;p&gt;Charlie Marsh &lt;a href="https://astral.sh/blog/openai"&gt;has this to say&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Open source is at the heart of that impact and the heart of that story; it sits at the center of everything we do. In line with our philosophy and &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;OpenAI's own announcement&lt;/a&gt;, OpenAI will continue supporting our open source tools after the deal closes. We'll keep building in the open, alongside our community -- and for the broader Python ecosystem -- just as we have from the start. [...]&lt;/p&gt;
&lt;p&gt;After joining the Codex team, we'll continue building our open source tools, explore ways they can work more seamlessly with Codex, and expand our reach to think more broadly about the future of software development.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI's message &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;has a slightly different focus&lt;/a&gt; (highlights mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As part of our developer-first philosophy, after closing OpenAI plans to support Astral’s open source products. &lt;strong&gt;By bringing Astral’s tooling and engineering expertise to OpenAI, we will accelerate our work on Codex&lt;/strong&gt; and expand what AI can do across the software development lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a slightly confusing message. The &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; is a Rust application, and Astral have some of the best Rust engineers in the industry - &lt;a href="https://github.com/burntsushi"&gt;BurntSushi&lt;/a&gt; alone (&lt;a href="https://github.com/rust-lang/regex"&gt;Rust regex&lt;/a&gt;, &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;ripgrep&lt;/a&gt;, &lt;a href="https://github.com/BurntSushi/jiff"&gt;jiff&lt;/a&gt;) may be worth the price of acquisition!&lt;/p&gt;
&lt;p&gt;So is this about the talent or about the product? I expect both, but I know from past experience that a product+talent acquisition can turn into a talent-only acquisition later on.&lt;/p&gt;
&lt;h4 id="uv-is-the-big-one"&gt;uv is the big one&lt;/h4&gt;
&lt;p&gt;Of Astral's projects the most impactful is &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt;. If you're not familiar with it, &lt;code&gt;uv&lt;/code&gt; is by far the most convincing solution to Python's environment management problems, best illustrated by &lt;a href="https://xkcd.com/1987/"&gt;this classic XKCD&lt;/a&gt;:&lt;/p&gt;
&lt;p style="text-align: center"&gt;&lt;img src="https://imgs.xkcd.com/comics/python_environment.png" alt="xkcd comic showing a tangled, chaotic flowchart of Python environment paths and installations. Nodes include &amp;quot;PIP&amp;quot;, &amp;quot;EASY_INSTALL&amp;quot;, &amp;quot;$PYTHONPATH&amp;quot;, &amp;quot;ANACONDA PYTHON&amp;quot;, &amp;quot;ANOTHER PIP??&amp;quot;, &amp;quot;HOMEBREW PYTHON (2.7)&amp;quot;, &amp;quot;OS PYTHON&amp;quot;, &amp;quot;HOMEBREW PYTHON (3.6)&amp;quot;, &amp;quot;PYTHON.ORG BINARY (2.6)&amp;quot;, and &amp;quot;(MISC FOLDERS OWNED BY ROOT)&amp;quot; connected by a mess of overlapping arrows. A stick figure with a &amp;quot;?&amp;quot; stands at the top left. Paths at the bottom include &amp;quot;/usr/local/Cellar&amp;quot;, &amp;quot;/usr/local/opt&amp;quot;, &amp;quot;/usr/local/lib/python3.6&amp;quot;, &amp;quot;/usr/local/lib/python2.7&amp;quot;, &amp;quot;/python/&amp;quot;, &amp;quot;/newenv/&amp;quot;, &amp;quot;$PATH&amp;quot;, &amp;quot;????&amp;quot;, and &amp;quot;/(A BUNCH OF PATHS WITH &amp;quot;FRAMEWORKS&amp;quot; IN THEM SOMEWHERE)/&amp;quot;. Caption reads: &amp;quot;MY PYTHON ENVIRONMENT HAS BECOME SO DEGRADED THAT MY LAPTOP HAS BEEN DECLARED A SUPERFUND SITE.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Switch from &lt;code&gt;python&lt;/code&gt; to &lt;code&gt;uv run&lt;/code&gt; and most of these problems go away. I've been using it extensively for the past couple of years and it's become an essential part of my workflow.&lt;/p&gt;
&lt;p&gt;I'm not alone in this. According to PyPI Stats &lt;a href="https://pypistats.org/packages/uv"&gt;uv was downloaded&lt;/a&gt; more than 126 million times last month! Since its release in February 2024 - just two years ago - it's become one of the most popular tools for running Python code.&lt;/p&gt;
&lt;h4 id="ruff-and-ty"&gt;Ruff and ty&lt;/h4&gt;
&lt;p&gt;Astral's two other big projects are &lt;a href="https://github.com/astral-sh/ruff"&gt;ruff&lt;/a&gt; - a Python linter and formatter - and &lt;a href="https://github.com/astral-sh/ty"&gt;ty&lt;/a&gt; - a fast Python type checker.&lt;/p&gt;
&lt;p&gt;These are popular tools that provide a great developer experience but they aren't load-bearing in the same way that &lt;code&gt;uv&lt;/code&gt; is.&lt;/p&gt;
&lt;p&gt;They do however resonate well with coding agent tools like Codex - giving an agent access to fast linting and type checking tools can help improve the quality of the code they generate.&lt;/p&gt;
&lt;p&gt;I'm not convinced that integrating them &lt;em&gt;into&lt;/em&gt; the coding agent itself as opposed to telling it when to run them will make a meaningful difference, but I may just not be imaginative enough here.&lt;/p&gt;
&lt;h4 id="what-of-pyx-"&gt;What of pyx?&lt;/h4&gt;
&lt;p&gt;Ever since &lt;code&gt;uv&lt;/code&gt; started to gain traction the Python community has been worrying about the strategic risk of a single VC-backed company owning a key piece of Python infrastructure. I &lt;a href="https://simonwillison.net/2024/Sep/8/uv-under-discussion-on-mastodon/"&gt;wrote about&lt;/a&gt; one of those conversations in detail back in September 2024.&lt;/p&gt;
&lt;p&gt;The conversation back then focused on what Astral's business plan could be, which started to take form &lt;a href="https://simonwillison.net/2025/Aug/13/pyx/"&gt;in August 2025&lt;/a&gt; when they announced &lt;a href="https://astral.sh/pyx"&gt;pyx&lt;/a&gt;, their private PyPI-style package registry for organizations.&lt;/p&gt;
&lt;p&gt;I'm less convinced that pyx makes sense within OpenAI, and it's notably absent from both the Astral and OpenAI announcement posts.&lt;/p&gt;
&lt;h4 id="competitive-dynamics"&gt;Competitive dynamics&lt;/h4&gt;
&lt;p&gt;An interesting aspect of this deal is how it might impact the competition between Anthropic and OpenAI.&lt;/p&gt;
&lt;p&gt;Both companies spent most of 2025 focused on improving the coding ability of their models, resulting in the &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;November 2025 inflection point&lt;/a&gt; when coding agents went from often-useful to almost-indispensable tools for software development.&lt;/p&gt;
&lt;p&gt;The competition between Anthropic's Claude Code and OpenAI's Codex is &lt;em&gt;fierce&lt;/em&gt;. Those $200/month subscriptions add up to billions of dollars a year in revenue, for companies that very much need that money.&lt;/p&gt;
&lt;p&gt;Anthropic &lt;a href="https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone"&gt;acquired the Bun JavaScript runtime&lt;/a&gt; in December 2025, an acquisition that looks somewhat similar in shape to Astral.&lt;/p&gt;
&lt;p&gt;Bun was already a core component of Claude Code and that acquisition looked to mainly be about ensuring that a crucial dependency stayed actively maintained. Claude Code's performance has increased significantly since then thanks to the efforts of Bun's Jarred Sumner.&lt;/p&gt;
&lt;p&gt;One bad version of this deal would be if OpenAI start using their ownership of &lt;code&gt;uv&lt;/code&gt; as leverage in their competition with Anthropic.&lt;/p&gt;
&lt;h4 id="astral-s-quiet-series-a-and-b"&gt;Astral's quiet series A and B&lt;/h4&gt;
&lt;p&gt;One detail that caught my eye from Astral's announcement, in the section thanking the team, investors, and community:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Second, to our investors, especially &lt;a href="https://www.accel.com/team/casey-aylward#bay-area"&gt;Casey Aylward&lt;/a&gt; from Accel, who led our Seed and Series A, and &lt;a href="https://a16z.com/author/jennifer-li/"&gt;Jennifer Li&lt;/a&gt; from Andreessen Horowitz, who led our Series B. As a first-time, technical, solo founder, you showed far more belief in me than I ever showed in myself, and I will never forget that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As far as I can tell neither the Series A nor the Series B were previously announced - I've only been able to find coverage of the original seed round &lt;a href="https://astral.sh/blog/announcing-astral-the-company-behind-ruff"&gt;from April 2023&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Those investors presumably now get to exchange their stake in Astral for a piece of OpenAI. I wonder how much influence they had on Astral's decision to sell.&lt;/p&gt;
&lt;h4 id="forking-as-a-credible-exit-"&gt;Forking as a credible exit?&lt;/h4&gt;
&lt;p&gt;Armin Ronacher built &lt;a href="https://til.simonwillison.net/python/rye"&gt;Rye&lt;/a&gt;, which was later taken over by Astral and effectively merged with uv. In &lt;a href="https://lucumr.pocoo.org/2024/8/21/harvest-season/"&gt;August 2024&lt;/a&gt; he wrote about the risk involved in a VC-backed company owning a key piece of open source infrastructure and said the following (highlight mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;However having seen the code and what uv is doing, &lt;strong&gt;even in the worst possible future this is a very forkable and maintainable thing&lt;/strong&gt;. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Astral's own Douglas Creager &lt;a href="https://news.ycombinator.com/item?id=47438723#47439974"&gt;emphasized this angle on Hacker News today&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All I can say is that &lt;em&gt;right now&lt;/em&gt;, we're committed to maintaining our open-source tools with the same level of effort, care, and attention to detail as before. That does not change with this acquisition. No one can guarantee how motives, incentives, and decisions might change years down the line. But that's why we bake optionality into it with the tools being permissively licensed. That makes the worst-case scenarios have the shape of "fork and move on", and not "software disappears forever".&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I like and trust the Astral team and I'm optimistic that their projects will be well-maintained in their new home.&lt;/p&gt;
&lt;p&gt;OpenAI don't yet have much of a track record with respect to acquiring and maintaining open source projects. They've been on a bit of an acquisition spree over the past three months though, snapping up &lt;a href="https://openai.com/index/openai-to-acquire-promptfoo/"&gt;Promptfoo&lt;/a&gt; and &lt;a href="https://steipete.me/posts/2026/openclaw"&gt;OpenClaw&lt;/a&gt; (sort-of, they hired creator Peter Steinberger and are spinning OpenClaw off to a foundation), plus closed source LaTeX platform &lt;a href="https://openai.com/index/introducing-prism/"&gt;Crixet (now Prism)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If things do go south for &lt;code&gt;uv&lt;/code&gt; and the other Astral projects we'll get to see how credible the forking exit strategy turns out to be.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff"&gt;ruff&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/astral"&gt;astral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/charlie-marsh"&gt;charlie-marsh&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ty"&gt;ty&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="python"/><category term="ai"/><category term="rust"/><category term="openai"/><category term="ruff"/><category term="uv"/><category term="astral"/><category term="charlie-marsh"/><category term="coding-agents"/><category term="codex-cli"/><category term="ty"/></entry><entry><title>Use subagents and custom agents in Codex</title><link href="https://simonwillison.net/2026/Mar/16/codex-subagents/#atom-tag" rel="alternate"/><published>2026-03-16T23:03:56+00:00</published><updated>2026-03-16T23:03:56+00:00</updated><id>https://simonwillison.net/2026/Mar/16/codex-subagents/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/codex/subagents"&gt;Use subagents and custom agents in Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Subagents were announced in general availability today for OpenAI Codex, after several weeks of preview behind a feature flag.&lt;/p&gt;
&lt;p&gt;They're very similar to the Claude Code implementation, with default subagents for "explorer", "worker" and "default". It's unclear to me what the difference between "worker" and "default" is but based on their CSV example I think "worker" is intended for running large numbers of small tasks in parallel.&lt;/p&gt;
&lt;p&gt;Codex also lets you define custom agents as TOML files in &lt;code&gt;~/.codex/agents/&lt;/code&gt;. These can have custom instructions and be assigned to use specific models - including &lt;code&gt;gpt-5.3-codex-spark&lt;/code&gt; if you want &lt;a href="https://simonwillison.net/2026/Feb/12/codex-spark/"&gt;some raw speed&lt;/a&gt;. They can then be referenced by name, as demonstrated by this example prompt from the documentation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Investigate why the settings modal fails to save. Have browser_debugger reproduce it, code_mapper trace the responsible code path, and ui_fixer implement the smallest fix once the failure mode is clear.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The subagents pattern is widely supported in coding agents now. Here's documentation across a number of different platforms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/subagents/"&gt;OpenAI Codex subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/sub-agents"&gt;Claude Code subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://geminicli.com/docs/core/subagents/"&gt;Gemini CLI subagents&lt;/a&gt; (experimental)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.mistral.ai/mistral-vibe/agents-skills#agent-selection"&gt;Mistral Vibe subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opencode.ai/docs/agents/"&gt;OpenCode agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.visualstudio.com/docs/copilot/agents/subagents"&gt;Subagents in Visual Studio Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cursor.com/docs/subagents"&gt;Cursor Subagents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I added &lt;a href="https://simonwillison.net/guides/agentic-engineering-patterns/subagents/"&gt;a chapter on Subagents&lt;/a&gt; to my Agentic Engineering Patterns guide.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/OpenAIDevs/status/2033636701848174967"&gt;@OpenAIDevs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="coding-agents"/><category term="codex-cli"/><category term="parallel-agents"/><category term="agentic-engineering"/></entry><entry><title>Coding agents for data analysis</title><link href="https://simonwillison.net/2026/Mar/16/coding-agents-for-data-analysis/#atom-tag" rel="alternate"/><published>2026-03-16T20:12:32+00:00</published><updated>2026-03-16T20:12:32+00:00</updated><id>https://simonwillison.net/2026/Mar/16/coding-agents-for-data-analysis/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/"&gt;Coding agents for data analysis&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three hour session aimed at data journalists demonstrating ways that tools like Claude Code and OpenAI Codex can be used to explore, analyze and clean data.&lt;/p&gt;
&lt;p&gt;Here's the table of contents:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/coding-agents.html"&gt;Coding agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/warmup.html"&gt;Warmup: ChatGPT and Claude&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/setup.html"&gt;Setup Claude Code and Codex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/asking-questions.html"&gt;Asking questions against a database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/exploring-data.html"&gt;Exploring data with agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/cleaning-trees.html"&gt;Cleaning data: decoding neighborhood codes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/visualizations.html"&gt;Creating visualizations with agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://simonw.github.io/nicar-2026-coding-agents/scraping.html"&gt;Scraping data with agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I ran the workshop using GitHub Codespaces and OpenAI Codex, since it was easy (and inexpensive) to distribute a budget-restricted API key for Codex that attendees could use during the class. Participants ended up burning $23 of Codex tokens.&lt;/p&gt;
&lt;p&gt;The exercises all used Python and SQLite and some of them used Datasette.&lt;/p&gt;
&lt;p&gt;One highlight of the workshop was when we started &lt;a href="https://simonw.github.io/nicar-2026-coding-agents/visualizations.html#javascript-visualizations"&gt;running Datasette&lt;/a&gt; such that it served static content from a &lt;code&gt;viz/&lt;/code&gt; folder, then had Claude Code start vibe coding new interactive visualizations directly in that folder. Here's a heat map it created for my trees database using Leaflet and &lt;a href="https://github.com/Leaflet/Leaflet.heat"&gt;Leaflet.heat&lt;/a&gt;, &lt;a href="https://gist.github.com/simonw/985ae2a6a3cd3df3fd375eb58dabea0f"&gt;source code here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a &amp;quot;Trees SQL Map&amp;quot; web application with the heading &amp;quot;Trees SQL Map&amp;quot; and subheading &amp;quot;Run a query and render all returned points as a heat map. The default query targets roughly 200,000 trees.&amp;quot; Below is an input field containing &amp;quot;/trees/-/query.json&amp;quot;, a &amp;quot;Run Query&amp;quot; button, and a SQL query editor with the text &amp;quot;SELECT cast(Latitude AS float) AS latitude, cast(Longitude AS float) AS longitude, CASE WHEN DBH IS NULL OR DBH = '' THEN 0.3 WHEN cast(DBH AS float) &amp;lt;= 0 THEN 0.3 WHEN cast(DBH AS float) &amp;gt;= 80 THEN 1.0&amp;quot; (query is truncated). A status message reads &amp;quot;Loaded 1,000 rows and plotted 1,000 points as heat map.&amp;quot; Below is a Leaflet/OpenStreetMap interactive map of San Francisco showing a heat map overlay of tree locations, with blue/green clusters concentrated in areas like the Richmond District, Sunset District, and other neighborhoods. Map includes zoom controls and a &amp;quot;Leaflet | © OpenStreetMap contributors&amp;quot; attribution." src="https://static.simonwillison.net/static/2026/tree-sql-map.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I designed the handout to also be useful for people who weren't able to attend the session in person. As is usually the case, material aimed at data journalists is equally applicable to anyone else with data to explore.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/speaking"&gt;speaking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-codespaces"&gt;github-codespaces&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nicar"&gt;nicar&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/leaflet"&gt;leaflet&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="geospatial"/><category term="python"/><category term="speaking"/><category term="sqlite"/><category term="ai"/><category term="datasette"/><category term="generative-ai"/><category term="llms"/><category term="github-codespaces"/><category term="nicar"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/><category term="leaflet"/></entry><entry><title>Codex for Open Source</title><link href="https://simonwillison.net/2026/Mar/7/codex-for-open-source/#atom-tag" rel="alternate"/><published>2026-03-07T18:13:39+00:00</published><updated>2026-03-07T18:13:39+00:00</updated><id>https://simonwillison.net/2026/Mar/7/codex-for-open-source/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/codex/community/codex-for-oss"&gt;Codex for Open Source&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Anthropic announced six months of free Claude Max for maintainers of popular open source projects (5,000+ stars or 1M+ NPM downloads) &lt;a href="https://simonwillison.net/2026/Feb/27/claude-max-oss-six-months/"&gt;on 27th February&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now OpenAI have launched their comparable offer: six months of ChatGPT Pro (same $200/month price as Claude Max) with Codex and "conditional access to Codex Security" for core maintainers.&lt;/p&gt;
&lt;p&gt;Unlike Anthropic they don't hint at the exact metrics they care about, but the &lt;a href="https://openai.com/form/codex-for-oss/"&gt;application form&lt;/a&gt; does ask for "information such as GitHub stars, monthly downloads, or why the project is important to the ecosystem."

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/openaidevs/status/2029998191043911955"&gt;@openaidevs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="codex-cli"/></entry><entry><title>How I think about Codex</title><link href="https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag" rel="alternate"/><published>2026-02-22T15:53:43+00:00</published><updated>2026-02-22T15:53:43+00:00</updated><id>https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.linkedin.com/pulse/how-i-think-codex-gabriel-chua-ukhic"&gt;How I think about Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Gabriel Chua (Developer Experience Engineer for APAC at OpenAI) provides his take on the confusing terminology behind the term "Codex", which can refer to a bunch of of different things within the OpenAI ecosystem:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In plain terms, Codex is OpenAI’s software engineering agent, available through multiple interfaces, and an agent is a model plus instructions and tools, wrapped in a runtime that can execute tasks on your behalf. [...]&lt;/p&gt;
&lt;p&gt;At a high level, I see Codex as three parts working together:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Codex = Model + Harness + Surfaces&lt;/em&gt; [...]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model + Harness = the Agent&lt;/li&gt;
&lt;li&gt;Surfaces = how you interact with the Agent&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;He defines the harness as "the collection of instructions and tools", which is notably open source and lives in the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;Gabriel also provides the first acknowledgment I've seen from an OpenAI insider that the Codex model family are directly trained for the Codex harness:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Codex models are trained in the presence of the harness. Tool use, execution loops, compaction, and iterative verification aren’t bolted on behaviors — they’re part of how the model learns to operate. The harness, in turn, is shaped around how the model plans, invokes tools, and recovers from failure.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="definitions"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="codex-cli"/></entry><entry><title>Introducing GPT‑5.3‑Codex‑Spark</title><link href="https://simonwillison.net/2026/Feb/12/codex-spark/#atom-tag" rel="alternate"/><published>2026-02-12T21:16:07+00:00</published><updated>2026-02-12T21:16:07+00:00</updated><id>https://simonwillison.net/2026/Feb/12/codex-spark/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex-spark/"&gt;Introducing GPT‑5.3‑Codex‑Spark&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenAI announced a partnership with Cerebras &lt;a href="https://openai.com/index/cerebras-partnership/"&gt;on January 14th&lt;/a&gt;. Four weeks later they're already launching the first integration, "an ultra-fast model for real-time coding in Codex".&lt;/p&gt;
&lt;p&gt;Despite being named GPT-5.3-Codex-Spark it's not purely an accelerated alternative to GPT-5.3-Codex - the blog post calls it "a smaller version of GPT‑5.3-Codex" and clarifies that "at launch, Codex-Spark has a 128k context window and is text-only."&lt;/p&gt;
&lt;p&gt;I had some preview access to this model and I can confirm that it's significantly faster than their other models.&lt;/p&gt;
&lt;p&gt;Here's what that speed looks like running in Codex CLI:&lt;/p&gt;
&lt;div style="max-width: 100%;"&gt;
    &lt;video 
        controls 
        preload="none"
        poster="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-medium-last.jpg"
        style="width: 100%; height: auto;"&gt;
        &lt;source src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-medium.mp4" type="video/mp4"&gt;
    &lt;/video&gt;
&lt;/div&gt;

&lt;p&gt;That was the "Generate an SVG of a pelican riding a bicycle" prompt - here's the rendered result:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Whimsical flat illustration of an orange duck merged with a bicycle, where the duck's body forms the seat and frame area while its head extends forward over the handlebars, set against a simple light blue sky and green grass background." src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;Compare that to the speed of regular GPT-5.3 Codex medium:&lt;/p&gt;
&lt;div style="max-width: 100%;"&gt;
    &lt;video 
        controls 
        preload="none"
        poster="https://static.simonwillison.net/static/2026/gpt-5.3-codex-medium-last.jpg"
        style="width: 100%; height: auto;"&gt;
        &lt;source src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-medium.mp4" type="video/mp4"&gt;
    &lt;/video&gt;
&lt;/div&gt;

&lt;p&gt;Significantly slower, but the pelican is a lot better:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Whimsical flat illustration of a white pelican riding a dark blue bicycle at speed, with motion lines behind it, its long orange beak streaming back in the wind, set against a light blue sky and green grass background." src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;What's interesting about this model isn't the quality though, it's the &lt;em&gt;speed&lt;/em&gt;. When a model responds this fast you can stay in flow state and iterate with the model much more productively.&lt;/p&gt;
&lt;p&gt;I showed a demo of Cerebras running Llama 3.1 70 B at 2,000 tokens/second against Val Town &lt;a href="https://simonwillison.net/2024/Oct/31/cerebras-coder/"&gt;back in October 2024&lt;/a&gt;. OpenAI claim 1,000 tokens/second for their new model, and I expect it will prove to be a ferociously useful partner for hands-on iterative coding sessions.&lt;/p&gt;
&lt;p&gt;It's not yet clear what the pricing will look like for this new model.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cerebras"&gt;cerebras&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-performance"&gt;llm-performance&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="cerebras"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="codex-cli"/><category term="llm-performance"/></entry><entry><title>Quoting Karel D'Oosterlinck</title><link href="https://simonwillison.net/2026/Feb/6/karel-doosterlinck/#atom-tag" rel="alternate"/><published>2026-02-06T00:42:22+00:00</published><updated>2026-02-06T00:42:22+00:00</updated><id>https://simonwillison.net/2026/Feb/6/karel-doosterlinck/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/kareldoostrlnck/status/2019477361557926281"&gt;&lt;p&gt;When I want to quickly implement a one-off experiment in a part of the codebase I am unfamiliar with, I get codex to do extensive due diligence. Codex explores relevant slack channels, reads related discussions, fetches experimental branches from those discussions, and cherry picks useful changes for my experiment. All of this gets summarized in an extensive set of notes, with links back to where each piece of information was found. Using these notes, codex wires the experiment and makes a bunch of hyperparameter decisions I couldn’t  possibly make without much more effort.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/kareldoostrlnck/status/2019477361557926281"&gt;Karel D&amp;#x27;Oosterlinck&lt;/a&gt;, I spent $10,000 to automate my research at OpenAI with Codex&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="codex-cli"/></entry><entry><title>Introducing the Codex app</title><link href="https://simonwillison.net/2026/Feb/2/introducing-the-codex-app/#atom-tag" rel="alternate"/><published>2026-02-02T19:54:36+00:00</published><updated>2026-02-02T19:54:36+00:00</updated><id>https://simonwillison.net/2026/Feb/2/introducing-the-codex-app/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/introducing-the-codex-app/"&gt;Introducing the Codex app&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenAI just released a new macOS app for their Codex coding agent. I've had a few days of preview access - it's a solid app that provides a nice UI over the capabilities of the Codex CLI agent and adds some interesting new features, most notably first-class support for &lt;a href="https://developers.openai.com/codex/skills"&gt;Skills&lt;/a&gt;, and &lt;a href="https://developers.openai.com/codex/app/automations"&gt;Automations&lt;/a&gt; for running scheduled tasks.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a macOS desktop application with a dark sidebar and light main content area. Left sidebar shows navigation items &amp;quot;New thread&amp;quot;, &amp;quot;Automations&amp;quot;, &amp;quot;Skills&amp;quot;, and a &amp;quot;Threads&amp;quot; section containing two project folders: &amp;quot;local-codex-scratch&amp;quot; with tasks &amp;quot;Reply to greeting task 2h&amp;quot; and &amp;quot;List Codex.app contents 3h&amp;quot;, and &amp;quot;shot-scraper&amp;quot; with tasks &amp;quot;Validate tests on dependency... 3h&amp;quot;, &amp;quot;Document shot-scraper CLI usage 18h&amp;quot; (highlighted), and &amp;quot;Run this: uv run pytest -x 1d&amp;quot;. Bottom left shows &amp;quot;Personal&amp;quot;. Top bar displays &amp;quot;Document shot-scraper CLI usage shot-scraper&amp;quot; with &amp;quot;Open&amp;quot; and &amp;quot;Commit&amp;quot; dropdown buttons. Main content shows a conversation: user message &amp;quot;Build a skill describing how to use the shot-scraper CLI tool&amp;quot; followed by assistant responses: &amp;quot;I'll use the skill-creator skill, scan the repo for existing shot-scraper docs/commands, then draft a concise skill in this workspace and sanity-check it.&amp;quot; then &amp;quot;I've confirmed the skill-creator scripts exist. Next I'll inspect the CLI entrypoint and docs, then scaffold a skill under skills/ in this repo.&amp;quot; then &amp;quot;I've scaffolded skills/shot-scraper-cli/. I'll now replace the template SKILL.md with a task-focused skill and add a compact CLI reference under references/.&amp;quot; then &amp;quot;I'll run the skill packager/validator next to ensure the new skill structure is valid.&amp;quot; then &amp;quot;Done — I created and packaged the skill. Key outputs: • Skill definition: SKILL.md • Compact reference: cli-reference.md • Packaged skill: shot-scraper-cli.skill&amp;quot;. Bottom shows input field with placeholder &amp;quot;Ask for follow-up changes&amp;quot;, &amp;quot;+ Custom Medium&amp;quot; dropdown, and &amp;quot;Local&amp;quot; and &amp;quot;main&amp;quot; branch indicators." src="https://static.simonwillison.net/static/2026/codex-app.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The app is built with Electron and Node.js. Automations track their state in a SQLite database - here's what that looks like if you explore it with &lt;code&gt;uvx datasette ~/.codex/sqlite/codex-dev.db&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Database schema documentation on light gray background showing three tables: &amp;quot;automation_runs&amp;quot; (teal underlined link) with italic columns &amp;quot;thread_id, automation_id, status, read_at, thread_title, source_cwd, inbox_title, inbox_summary, created_at, updated_at, archived_user_message, archived_assistant_message, archived_reason&amp;quot; and &amp;quot;1 row&amp;quot;; &amp;quot;automations&amp;quot; (teal underlined link) with italic columns &amp;quot;id, name, prompt, status, next_run_at, last_run_at, cwds, rrule, created_at, updated_at&amp;quot; and &amp;quot;1 row&amp;quot;; &amp;quot;inbox_items&amp;quot; (teal underlined link) with italic columns &amp;quot;id, title, description, thread_id, read_at, created_at&amp;quot; and &amp;quot;0 rows&amp;quot;." src="https://static.simonwillison.net/static/2026/codex-dev-sqlite.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;Here’s an interactive copy of that database &lt;a href="https://lite.datasette.io/?url=https%3A%2F%2Fgist.githubusercontent.com%2Fsimonw%2F274c4ecfaf959890011810e6881864fe%2Fraw%2F51fdf25c9426b76e9693ccc0d9254f64ceeef819%2Fcodex-dev.db#/codex-dev"&gt;in Datasette Lite&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The announcement gives us a hint at some usage numbers for Codex overall - the holiday spike is notable:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Since the launch of GPT‑5.2-Codex in mid-December, overall Codex usage has doubled, and in the past month, more than a million developers have used Codex.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Automations are currently restricted in that they can only run when your laptop is powered on. OpenAI promise that cloud-based automations are coming soon, which will resolve this limitation.&lt;/p&gt;
&lt;p&gt;They chose Electron so they could target other operating systems in the future, with Windows “&lt;a href="https://news.ycombinator.com/item?id=46859054#46859673"&gt;coming very soon&lt;/a&gt;”. OpenAI’s Alexander Embiricos noted &lt;a href="https://news.ycombinator.com/item?id=46859054#46859693"&gt;on the Hacker News thread&lt;/a&gt; that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;it's taking us some time to get really solid sandboxing working on Windows, where there are fewer OS-level primitives for it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Like Claude Code, Codex is really a general agent harness disguised as a tool for programmers. OpenAI acknowledge that here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Codex is built on a simple premise: everything is controlled by code. The better an agent is at reasoning about and producing code, the more capable it becomes across all forms of technical and knowledge work. [...] We’ve focused on making Codex the best coding agent, which has also laid the foundation for it to become a strong agent for a broad range of knowledge work tasks that extend beyond writing code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Claude Code had to &lt;a href="https://simonwillison.net/2026/Jan/12/claude-cowork/"&gt;rebrand to Cowork&lt;/a&gt; to better cover the general knowledge work case. OpenAI can probably get away with keeping the Codex name for both.&lt;/p&gt;
&lt;p&gt;OpenAI have made Codex available to free and &lt;a href="https://simonwillison.net/2026/Jan/16/chatgpt-ads/"&gt;Go&lt;/a&gt; plans for "a limited time" (update: Sam Altman &lt;a href="https://x.com/sama/status/2018437537103269909"&gt;says two months&lt;/a&gt;) during which they are also doubling the rate limits for paying users.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/electron"&gt;electron&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="sandboxing"/><category term="sqlite"/><category term="ai"/><category term="datasette"/><category term="electron"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-agents"/><category term="coding-agents"/><category term="codex-cli"/></entry><entry><title>One Human + One Agent = One Browser From Scratch</title><link href="https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-tag" rel="alternate"/><published>2026-01-27T16:58:08+00:00</published><updated>2026-01-27T16:58:08+00:00</updated><id>https://simonwillison.net/2026/Jan/27/one-human-one-agent-one-browser/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://emsh.cat/one-human-one-agent-one-browser/"&gt;One Human + One Agent = One Browser From Scratch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
embedding-shapes was &lt;a href="https://emsh.cat/cursor-implied-success-without-evidence/"&gt;so infuriated&lt;/a&gt; by the hype around Cursor's &lt;a href="https://simonwillison.net/2026/Jan/23/fastrender/"&gt;FastRender browser project&lt;/a&gt; - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to take a go at building a web browser using coding agents themselves.&lt;/p&gt;
&lt;p&gt;The result is &lt;a href="https://github.com/embedding-shapes/one-agent-one-browser"&gt;one-agent-one-browser&lt;/a&gt; and it's &lt;em&gt;really&lt;/em&gt; impressive. Over three days they drove a single Codex CLI agent to build 20,000 lines of Rust that successfully renders HTML+CSS with no Rust crate dependencies at all - though it does (reasonably) use Windows, macOS and Linux system frameworks for image and text rendering.&lt;/p&gt;
&lt;p&gt;I installed the &lt;a href="https://github.com/embedding-shapes/one-agent-one-browser/releases/tag/0.1.0"&gt;1MB macOS binary release&lt;/a&gt; and ran it against my blog:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;chmod 755 ~/Downloads/one-agent-one-browser-macOS-ARM64 
~/Downloads/one-agent-one-browser-macOS-ARM64 https://simonwillison.net/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's the result:&lt;/p&gt;
&lt;p&gt;&lt;img alt="My blog rendered in a window. Everything is in the right place, the CSS gradients look good, the feed subscribe SVG icon is rendered correctly but there's a missing PNG image." src="https://static.simonwillison.net/static/2026/one-agent-simonwillison.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;It even rendered my SVG feed subscription icon! A PNG image is missing from the page, which looks like an intermittent bug (there's code to render PNGs).&lt;/p&gt;
&lt;p&gt;The code is pretty readable too - here's &lt;a href="https://github.com/embedding-shapes/one-agent-one-browser/blob/0.1.0/src/layout/flex.rs"&gt;the flexbox implementation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I had thought that "build a web browser" was the ideal prompt to really stretch the capabilities of coding agents - and that it would take sophisticated multi-agent harnesses (as seen in the Cursor project) and millions of lines of code to achieve.&lt;/p&gt;
&lt;p&gt;Turns out one agent driven by a talented engineer, three days and 20,000 lines of Rust is enough to get a very solid basic renderer working!&lt;/p&gt;
&lt;p&gt;I'm going to upgrade my &lt;a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#3-years-someone-will-build-a-new-browser-using-mainly-ai-assisted-coding-and-it-won-t-even-be-a-surprise"&gt;prediction for 2029&lt;/a&gt;: I think we're going to get a &lt;em&gt;production-grade&lt;/em&gt; web browser built by a small team using AI assistance by then.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=46779522"&gt;Show Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/predictions"&gt;predictions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/browser-challenge"&gt;browser-challenge&lt;/a&gt;&lt;/p&gt;



</summary><category term="browsers"/><category term="predictions"/><category term="ai"/><category term="rust"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="codex-cli"/><category term="browser-challenge"/></entry><entry><title>Introducing GPT-5.2-Codex</title><link href="https://simonwillison.net/2025/Dec/19/introducing-gpt-52-codex/#atom-tag" rel="alternate"/><published>2025-12-19T05:21:17+00:00</published><updated>2025-12-19T05:21:17+00:00</updated><id>https://simonwillison.net/2025/Dec/19/introducing-gpt-52-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-2-codex/"&gt;Introducing GPT-5.2-Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The latest in OpenAI's &lt;a href="https://simonwillison.net/tags/gpt-codex/"&gt;Codex family of models&lt;/a&gt; (not the same thing as their Codex CLI or Codex Cloud coding agent tools).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT‑5.2-Codex is a version of &lt;a href="https://openai.com/index/introducing-gpt-5-2/"&gt;GPT‑5.2⁠&lt;/a&gt; further optimized for agentic coding in Codex, including improvements on long-horizon work through context compaction, stronger performance on large code changes like refactors and migrations, improved performance in Windows environments, and significantly stronger cybersecurity capabilities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As with some previous Codex models this one is available via their Codex coding agents now and will be coming to the API "in the coming weeks". Unlike previous models there's a new invite-only preview process for vetted cybersecurity professionals for "more permissive models".&lt;/p&gt;
&lt;p&gt;I've been very impressed recently with GPT 5.2's ability to &lt;a href="https://simonwillison.net/2025/Dec/15/porting-justhtml/"&gt;tackle multi-hour agentic coding challenges&lt;/a&gt;. 5.2 Codex scores 64% on the Terminal-Bench 2.0 benchmark that GPT-5.2 scored 62.2% on. I'm not sure how concrete that 1.8% improvement will be!&lt;/p&gt;
&lt;p&gt;I didn't hack API access together this time (see &lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/"&gt;previous attempts&lt;/a&gt;), instead opting to just ask Codex CLI to "Generate an SVG of a pelican riding a bicycle" while running the new model (effort medium). &lt;a href="https://tools.simonwillison.net/codex-timeline?url=https://gist.githubusercontent.com/simonw/10ad81e82889a97a7d28827e0ea6d768/raw/d749473b37d86d519b4c3fa0892b5e54b5941b38/rollout-2025-12-18T16-09-10-019b33f0-6111-7840-89b0-aedf755a6e10.jsonl#tz=local&amp;amp;q=&amp;amp;type=all&amp;amp;payload=all&amp;amp;role=all&amp;amp;hide=1&amp;amp;truncate=1&amp;amp;sel=3"&gt;Here's the transcript&lt;/a&gt; in my new Codex CLI timeline viewer, and here's the pelican it drew:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Alt text by GPT-5.2-Codex: A minimalist illustration of a white pelican with a large orange beak riding a teal bicycle across a sandy strip of ground. The pelican leans forward as if pedaling, its wings tucked back and legs reaching toward the pedals. Simple gray motion lines trail behind it, and a pale yellow sun sits in the top‑right against a warm beige sky." src="https://static.simonwillison.net/static/2025/5.2-codex-pelican.png" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="codex-cli"/><category term="gpt-codex"/></entry><entry><title>I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours</title><link href="https://simonwillison.net/2025/Dec/15/porting-justhtml/#atom-tag" rel="alternate"/><published>2025-12-15T23:58:38+00:00</published><updated>2025-12-15T23:58:38+00:00</updated><id>https://simonwillison.net/2025/Dec/15/porting-justhtml/#atom-tag</id><summary type="html">
    &lt;p&gt;I &lt;a href="https://simonwillison.net/2025/Dec/14/justhtml/"&gt;wrote about JustHTML yesterday&lt;/a&gt; - Emil Stenström's project to build a new standards compliant HTML5 parser in pure Python code using coding agents running against the comprehensive html5lib-tests testing library. Last night, purely out of curiosity, I decided to try &lt;strong&gt;porting JustHTML from Python to JavaScript&lt;/strong&gt; with the least amount of effort possible, using Codex CLI and GPT-5.2. It worked beyond my expectations.&lt;/p&gt;
&lt;h4 id="tl-dr"&gt;TL;DR&lt;/h4&gt;
&lt;p&gt;I built &lt;a href="https://github.com/simonw/justjshtml"&gt;simonw/justjshtml&lt;/a&gt;, a dependency-free HTML5 parsing library in JavaScript which passes 9,200 tests from the html5lib-tests suite and imitates the API design of Emil's JustHTML library.&lt;/p&gt;
&lt;p&gt;It took two initial prompts and a few tiny follow-ups. &lt;a href="https://simonwillison.net/2025/Dec/11/gpt-52/"&gt;GPT-5.2&lt;/a&gt; running in &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; ran uninterrupted for several hours, burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens and ended up producing 9,000 lines of fully tested JavaScript across 43 commits.&lt;/p&gt;
&lt;p&gt;Time elapsed from project idea to finished library: about 4 hours, during which I also bought and decorated a Christmas tree with family and watched the latest Knives Out movie.&lt;/p&gt;
&lt;h4 id="some-background"&gt;Some background&lt;/h4&gt;
&lt;p&gt;One of the most important contributions of the HTML5 specification ten years ago was the way it precisely specified how &lt;em&gt;invalid&lt;/em&gt; HTML should be parsed. The world is full of invalid documents and having a specification that covers those means browsers can treat them in the same way - there's no more "undefined behavior" to worry about when building parsing software.&lt;/p&gt;
&lt;p&gt;Unsurprisingly, those invalid parsing rules are pretty complex! The free online book &lt;a href="https://htmlparser.info/"&gt;Idiosyncrasies of the HTML parser&lt;/a&gt; by Simon Pieters is an excellent deep dive into this topic, in particular &lt;a href="https://htmlparser.info/parser/"&gt;Chapter 3. The HTML parser&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Python &lt;a href="https://github.com/html5lib/html5lib-python"&gt;html5lib&lt;/a&gt; project started the &lt;a href="https://github.com/html5lib/html5lib-tests"&gt;html5lib-tests&lt;/a&gt; repository with a set of implementation-independent tests. These have since become the gold standard for interoperability testing of HTML5 parsers, and are used by projects such as &lt;a href="https://github.com/servo/servo"&gt;Servo&lt;/a&gt; which used them to help build &lt;a href="https://github.com/servo/html5ever"&gt;html5ever&lt;/a&gt;, a "high-performance browser-grade HTML5 parser" written in Rust.&lt;/p&gt;
&lt;p&gt;Emil Stenström's &lt;a href="https://github.com/EmilStenstrom/justhtml"&gt;JustHTML&lt;/a&gt; project is a pure-Python implementation of an HTML5 parser that passes the full html5lib-tests suite. Emil &lt;a href="https://friendlybit.com/python/writing-justhtml-with-coding-agents/"&gt;spent a couple of months&lt;/a&gt; working on this as a side project, deliberately picking a problem with a comprehensive existing test suite to see how far he could get with coding agents.&lt;/p&gt;
&lt;p&gt;At one point he had the agents rewrite it based on a close inspection of the Rust html5ever library. I don't know how much of this was direct translation versus inspiration (here's Emil's &lt;a href="https://news.ycombinator.com/item?id=46264195#46267059"&gt;commentary on that&lt;/a&gt;) - his project has 1,215 commits total so it appears to have included a huge amount of iteration, not just a straight port.&lt;/p&gt;
&lt;p&gt;My project &lt;strong&gt;is&lt;/strong&gt; a straight port. I instructed Codex CLI to build a JavaScript version of Emil's Python code.&lt;/p&gt;
&lt;h4 id="the-process-in-detail"&gt;The process in detail&lt;/h4&gt;
&lt;p&gt;I started with a bit of mise en place. I checked out two repos and created an empty third directory for the new project:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; &lt;span class="pl-k"&gt;~&lt;/span&gt;/dev
git clone https://github.com/EmilStenstrom/justhtml
git clone https://github.com/html5lib/html5lib-tests
mkdir justjshtml
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; justjshtml&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I started Codex CLI for GPT-5.2 like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex --yolo -m gpt-5.2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That &lt;code&gt;--yolo&lt;/code&gt; flag is a shortcut for &lt;code&gt;--dangerously-bypass-approvals-and-sandbox&lt;/code&gt;, which is every bit as dangerous as it sounds.&lt;/p&gt;
&lt;p&gt;My first prompt told Codex to inspect the existing code and use it to build a specification for the new JavaScript library:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. It is going to have a similar API to the Python library but in JavaScript. It will have no dependencies other than raw JavaScript, hence it will work great in the browser and node.js and other environments. Start by reading ~/dev/justhtml and designing the user-facing API for the new library - create a spec.md containing your plan.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I reviewed the spec, which included a set of proposed milestones, and told it to add another:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add an early step to the roadmap that involves an initial version that parses a simple example document that is valid and returns the right results. Then add and commit the spec.md file.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/justjshtml/blob/19b8eb1f2ca80f428a3c40862d5ec05d36e5166b/spec.md"&gt;the resulting spec.md file&lt;/a&gt;. My request for that initial version became "Milestone 0.5" which looked like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Milestone 0.5 — End-to-end smoke parse (single valid document)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implement the smallest end-to-end slice so the public API is real early:
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;new JustHTML("&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;body&amp;gt;&amp;lt;p&amp;gt;Hello&amp;lt;/p&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;")&lt;/code&gt; returns a tree with the expected tag structure and text nodes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;doc.toText()&lt;/code&gt; returns &lt;code&gt;"Hello"&lt;/code&gt; and &lt;code&gt;doc.errors&lt;/code&gt; is empty for this valid input.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;scripts/smoke.js&lt;/code&gt; (no deps) that runs the example and asserts the expected structure/output.&lt;/li&gt;
&lt;li&gt;Gate: &lt;code&gt;node scripts/smoke.js&lt;/code&gt; passes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then I told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Implement Milestone 0.5&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And off it went. The resulting code appeared to work so I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Configure GitHub Actions test.yml to run that on every commit, then commit and push&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I ran &lt;code&gt;git init&lt;/code&gt; and created a private GitHub repository for this project at this point, and set up the local directory to push to that remote.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/justjshtml/commit/48497cbd05584c88af757a0736a2ee447468b6b3"&gt;that initial push&lt;/a&gt;. Then I told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;OK do the rest, commit and push often&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And that was almost it!&lt;/p&gt;
&lt;p&gt;I set my laptop to not fall asleep and left it to its devices while we went off to buy a Christmas tree. The "commit and push often" meant I could monitor its progress on my phone by refreshing &lt;a href="https://github.com/simonw/justjshtml/commits/main"&gt;the commit log on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I was running this against my $20/month ChatGPT Plus account, which has a five hour token allowance window for Codex CLI. That ran out at 6:35pm and Codex paused, so I waited until the reset point at 7:14pm and typed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Keep going&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At 9:30pm it declared itself done with the following summary message:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;• What’s Done

  - html5lib-tests now run in CI and pass: tokenizer 6810/6810, tree 1770/1782
    (12 skipped: #script-on), serializer 230/230, encoding 82/83 (1 skipped:
    scripted) (.github/workflows/test.yml:1).
  - Public API expanded: stream() (src/stream.js:1), CSS selectors query()/
    matches() (src/selector.js:1), and toMarkdown() (src/markdown.js:1), all
    exported via src/index.js:1.
  - Added no-deps unit scripts for selectors/stream/markdown (scripts/run-
    selector-tests.js:1, scripts/run-stream-tests.js:1, scripts/run-markdown-
    tests.js:1) and wired them into CI.
  - Fixed TreeBuilder so &amp;lt;br&amp;gt; no longer closes &amp;lt;p&amp;gt; (src/treebuilder.js:647).
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a finishing touch, I had it add a playground interface so I could try out the new library in my browser. I prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add a playground.html in the top level folder which loads the necessary ES modules from ./src/... and implements the exact same functionality as seen on https://tools.simonwillison.net/justhtml but using the JavaScript library instead of Pyodide&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It fetched my &lt;a href="https://tools.simonwillison.net/justhtml"&gt;existing JustHTML playground page&lt;/a&gt; (&lt;a href="https://simonwillison.net/2025/Dec/14/justhtml/#first-impressions-of-justhtml"&gt;described here&lt;/a&gt;) using &lt;code&gt;curl&lt;/code&gt; and built a new &lt;code&gt;playground.html&lt;/code&gt; file that loaded the new JavaScript code instead. This worked &lt;em&gt;perfectly&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I enabled GitHub Pages for my still-private repo which meant I could access the new playground at this URL:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://simonw.github.io/justjshtml/playground.html"&gt;https://simonw.github.io/justjshtml/playground.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/justjshtml-playground.jpg" alt="Screenshot of JustJSHTML Playground web application. Header reads &amp;quot;JustJSHTML Playground&amp;quot; with subtitle &amp;quot;A dependency-free JavaScript HTML5 parser - GitHub&amp;quot;. Below is a status bar showing &amp;quot;JavaScript Environment&amp;quot; with a green &amp;quot;Ready&amp;quot; badge. The main input area has &amp;quot;Paste HTML&amp;quot; and &amp;quot;Fetch from URL&amp;quot; buttons, with a text area containing HTML code: &amp;quot;&amp;lt;!DOCTYPE html&amp;gt; &amp;lt;html&amp;gt; &amp;lt;head&amp;gt; &amp;lt;title&amp;gt;Example Page&amp;lt;/title&amp;gt; &amp;lt;/head&amp;gt; &amp;lt;body&amp;gt; &amp;lt;header&amp;gt; &amp;lt;nav&amp;gt; &amp;lt;ul&amp;gt;&amp;quot;. A &amp;quot;Playground Mode&amp;quot; section shows buttons for &amp;quot;CSS Selector Query&amp;quot;, &amp;quot;Pretty Print HTML&amp;quot;, &amp;quot;Tree Structure&amp;quot;, &amp;quot;Stream Events&amp;quot;, &amp;quot;Extract Text&amp;quot;, and &amp;quot;To Markdown&amp;quot; (highlighted in purple). Below is a text field labeled &amp;quot;CSS Selector (optional - leave empty for whole document):&amp;quot; with placeholder &amp;quot;e.g., article, main, .content (or leave empty)&amp;quot; and a green &amp;quot;Convert to Markdown&amp;quot; button. The Output section has a teal header with &amp;quot;Whole document&amp;quot; badge and displays converted markdown: &amp;quot;Example Page&amp;quot; followed by &amp;quot;- [Home](/)&amp;quot; &amp;quot;- [About](/about)&amp;quot; &amp;quot;- [Contact](/contact)&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;All it needed now was some documentation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Add a comprehensive README with full usage instructions including attribution plus how this was built plus how to use in in HTML plus how to use it in Node.js&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can &lt;a href="https://github.com/simonw/justjshtml/blob/f3a33fdb29bf97846fd017185edc8cf82783032e/README.md"&gt;read the result here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We are now at eight prompts total, running for just over four hours and I've decorated for Christmas and watched &lt;a href="https://en.wikipedia.org/wiki/Wake_Up_Dead_Man"&gt;Wake Up Dead Man&lt;/a&gt; on Netflix.&lt;/p&gt;
&lt;p&gt;According to Codex CLI:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Token usage: total=2,089,858 input=1,464,295 (+ 97,122,176 cached) output=625,563 (reasoning 437,010)&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My &lt;a href="https://www.llm-prices.com/#it=2089858&amp;amp;cit=97122176&amp;amp;ot=625563&amp;amp;sel=gpt-5.2"&gt;llm-prices.com calculator&lt;/a&gt; estimates that at $29.41 if I was paying for those tokens at API prices, but they were included in my $20/month ChatGPT Plus subscription so the actual extra cost to me was zero.&lt;/p&gt;
&lt;h4 id="what-can-we-learn-from-this-"&gt;What can we learn from this?&lt;/h4&gt;
&lt;p&gt;I'm sharing this project because I think it demonstrates a bunch of interesting things about the state of LLMs in December 2025.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Frontier LLMs really can perform complex, multi-hour tasks with hundreds of tool calls and minimal supervision. I used GPT-5.2 for this but I have no reason to believe that Claude Opus 4.5 or Gemini 3 Pro would not be able to achieve the same thing - the only reason I haven't tried is that I don't want to burn another 4 hours of time and several million tokens on more runs.&lt;/li&gt;
&lt;li&gt;If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed. I called this &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/"&gt;designing the agentic loop&lt;/a&gt; a few months ago. I think it's the key skill to unlocking the potential of LLMs for complex tasks.&lt;/li&gt;
&lt;li&gt;Porting entire open source libraries from one language to another via a coding agent works extremely well.&lt;/li&gt;
&lt;li&gt;Code is so cheap it's practically free. Code that &lt;em&gt;works&lt;/em&gt; continues to carry a cost, but that cost has plummeted now that coding agents can check their work as they go.&lt;/li&gt;
&lt;li&gt;We haven't even &lt;em&gt;begun&lt;/em&gt; to unpack the etiquette and ethics around this style of development. Is it responsible and appropriate to churn out a direct port of a library like this in a few hours while watching a movie? What would it take for code built like this to be trusted in production?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I'll end with some open questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this library represent a legal violation of copyright of either the Rust library or the Python one?&lt;/li&gt;
&lt;li&gt;Even if this is legal, is it ethical to build a library in this way?&lt;/li&gt;
&lt;li&gt;Does this format of development hurt the open source ecosystem?&lt;/li&gt;
&lt;li&gt;Can I even assert copyright over this, given how much of the work was produced by the LLM?&lt;/li&gt;
&lt;li&gt;Is it responsible to publish software libraries built in this way?&lt;/li&gt;
&lt;li&gt;How much better would this library be if an expert team hand crafted it over the course of several months?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Update 11th January 2026&lt;/strong&gt;: I originally ended this post with just these open questions, but I've now provided &lt;a href="https://simonwillison.net/2026/Jan/11/answers/"&gt;my own answers to the questions&lt;/a&gt; in a new post.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-porting"&gt;vibe-porting&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="html"/><category term="javascript"/><category term="python"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="gpt-5"/><category term="codex-cli"/><category term="november-2025-inflection"/><category term="vibe-porting"/><category term="gpt"/></entry><entry><title>Quoting OpenAI Codex CLI</title><link href="https://simonwillison.net/2025/Dec/13/openai-codex-cli/#atom-tag" rel="alternate"/><published>2025-12-13T03:47:43+00:00</published><updated>2025-12-13T03:47:43+00:00</updated><id>https://simonwillison.net/2025/Dec/13/openai-codex-cli/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://github.com/openai/codex/blob/ad7b9d63c326d5c92049abd16f9f5fb64a573a69/codex-rs/core/src/skills/render.rs#L20-L39"&gt;&lt;p&gt;How to use a skill (progressive disclosure):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;After deciding to use a skill, open its &lt;code&gt;SKILL.md&lt;/code&gt;. Read only enough to follow the workflow.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;SKILL.md&lt;/code&gt; points to extra folders such as &lt;code&gt;references/&lt;/code&gt;, load only the specific files needed for the request; don't bulk-load everything.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;scripts/&lt;/code&gt; exist, prefer running or patching them instead of retyping large code blocks.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;assets/&lt;/code&gt; or templates exist, reuse them instead of recreating from scratch.&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;Description as trigger: The YAML &lt;code&gt;description&lt;/code&gt; in &lt;code&gt;SKILL.md&lt;/code&gt; is the primary trigger signal; rely on it to decide applicability. If unsure, ask a brief clarification before proceeding.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://github.com/openai/codex/blob/ad7b9d63c326d5c92049abd16f9f5fb64a573a69/codex-rs/core/src/skills/render.rs#L20-L39"&gt;OpenAI Codex CLI&lt;/a&gt;, core/src/skills/render.rs, &lt;a href="https://gist.github.com/simonw/25f2c3a9e350274bc2b76a79bc8ae8b2"&gt;full prompt&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/skills"&gt;skills&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="rust"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="codex-cli"/><category term="skills"/></entry><entry><title>OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI</title><link href="https://simonwillison.net/2025/Dec/12/openai-skills/#atom-tag" rel="alternate"/><published>2025-12-12T23:29:51+00:00</published><updated>2025-12-12T23:29:51+00:00</updated><id>https://simonwillison.net/2025/Dec/12/openai-skills/#atom-tag</id><summary type="html">
    &lt;p&gt;One of the things that most excited me about &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;Anthropic's new Skills mechanism&lt;/a&gt; back in October is how easy it looked for other platforms to implement. A skill is just a folder with a Markdown file and some optional extra resources and scripts, so any LLM tool with the ability to navigate and read from a filesystem should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself.&lt;/p&gt;
&lt;h4 id="skills-in-chatgpt"&gt;Skills in ChatGPT&lt;/h4&gt;
&lt;p&gt;I learned about this &lt;a href="https://x.com/elias_judin/status/1999491647563006171"&gt;from Elias Judin&lt;/a&gt; this morning. It turns out the Code Interpreter feature of ChatGPT now has a new &lt;code&gt;/home/oai/skills&lt;/code&gt; folder which you can access simply by prompting:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Create a zip file of /home/oai/skills&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I &lt;a href="https://chatgpt.com/share/693c9645-caa4-8006-9302-0a9226ea7599"&gt;tried that myself&lt;/a&gt; and got back &lt;a href="https://static.simonwillison.net/static/cors-allow/2025/skills.zip"&gt;this zip file&lt;/a&gt;. Here's &lt;a href="https://tools.simonwillison.net/zip-wheel-explorer?url=https%3A%2F%2Fstatic.simonwillison.net%2Fstatic%2Fcors-allow%2F2025%2Fskills.zip"&gt;a UI for exploring its content&lt;/a&gt; (&lt;a href="https://tools.simonwillison.net/colophon#zip-wheel-explorer.html"&gt;more about that tool&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/skills-explore.jpg" alt="Screenshot of file explorer. Files skills/docs/render_docsx.py and skills/docs/skill.md and skills/pdfs/ and skills/pdfs/skill.md - that last one is expanded and reads: # PDF reading, creation, and review guidance  ## Reading PDFs - Use pdftoppm -png $OUTDIR/$BASENAME.pdf $OUTDIR/$BASENAME to convert PDFs to PNGs. - Then open the PNGs and read the images. - pdfplumber is also installed and can be used to read PDFs. It can be used as a complementary tool to pdftoppm but not replacing it. - Only do python printing as a last resort because you will miss important details with text extraction (e.g. figures, tables, diagrams).  ## Primary tooling for creating PDFs - Generate PDFs programmatically with reportlab as the primary tool. In most cases, you should use reportlab to create PDFs. - If there are other packages you think are necessary for the task (eg. pypdf, pyMuPDF), you can use them but you may need topip install them first. - After each meaningful update—content additions, layout adjustments, or style changes—render the PDF to images to check layout fidelity:   - pdftoppm -png $INPUT_PDF $OUTPUT_PREFIX - Inspect every exported PNG before continuing work. If anything looks off, fix the source and re-run the render → inspect loop until the pages are clean.  ## Quality expectations - Maintain a polished, intentional visual design: consistent typography, spacing, margins, color palette, and clear section breaks across all pages. - Avoid major rendering issues—no clipped text, overlapping elements, black squares, broken tables, or unreadable glyphs. The rendered pages should look like a curated document, not raw template output. - Charts, tables, diagrams, and images must be sharp, well-aligned, and properly labeled in the PNGs. Legends and axes should be readable without excessive zoom. - Text must be readable at normal viewing size; avoid walls of filler text or dense, unstructured bullet lists. Use whitespace to separate ideas. - Never use the U+2011 non-breaking hyphen or other unicode dashes as they will not be" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;So far they cover spreadsheets, docx and PDFs. Interestingly their chosen approach for PDFs and documents is to convert them to rendered per-page PNGs and then pass those through their vision-enabled GPT models, presumably to maintain information from layout and graphics that would be lost if they just ran text extraction.&lt;/p&gt;
&lt;p&gt;Elias &lt;a href="https://github.com/eliasjudin/oai-skills"&gt;shared copies in a GitHub repo&lt;/a&gt;. They look very similar to Anthropic's implementation of the same kind of idea, currently published in their &lt;a href="https://github.com/anthropics/skills/tree/main/skills"&gt;anthropics/skills&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;I tried it out by prompting:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Create a PDF with a summary of the rimu tree situation right now and what it means for kakapo breeding season&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sure enough, GPT-5.2 Thinking started with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Reading skill.md for PDF creation guidelines&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Searching rimu mast and Kākāpō 2025 breeding status&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It took &lt;a href="https://chatgpt.com/share/693ca54b-f770-8006-904b-9f31a585180a"&gt;just over eleven minutes&lt;/a&gt; to produce &lt;a href="https://static.simonwillison.net/static/cors-allow/2025/rimu_kakapo_breeding_brief.pdf"&gt;this PDF&lt;/a&gt;, which was long enough that I had Claude Code for web &lt;a href="https://github.com/simonw/tools/pull/155"&gt;build me a custom PDF viewing tool&lt;/a&gt; while I waited.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://tools.simonwillison.net/view-pdf?url=https%3A%2F%2Fstatic.simonwillison.net%2Fstatic%2Fcors-allow%2F2025%2Frimu_kakapo_breeding_brief.pdf"&gt;Here's ChatGPT's PDF in that tool&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/rimu.jpg" alt="Screenshot of my tool. There is a URL at the top, a Load PDF button and pagination controls. Then the PDF itself is shown, which reads: Rimu mast status and what it means for the kākāpō breeding season Summary as of 12 December 2025 (Pacific/Auckland context) Kākāpō breeding is tightly linked to rimu (Dacrydium cupressinum) mast events: when rimu trees set and ripen large amounts of fruit, female kākāpō are much more likely to nest, and more chicks can be successfully raised. Current monitoring indicates an unusually strong rimu fruiting signal heading into the 2025/26 season, which sets the stage for a potentially large breeding year in 2026.^1,2 Key numbers at a glance Kākāpō population (official DOC count) 237 birds alive Breeding trigger (rimu fruiting)&amp;gt;10% of rimu branch tips bearing fruit Forecast rimu fruiting for 2026 (DOC monitoring) Around 50–60% fruiting across breeding islands¹Breeding-age females (DOC 2025 planning figure)About 87 females (potentially nearly all could nest)" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;(I am &lt;strong&gt;very excited&lt;/strong&gt; about &lt;a href="https://www.auckland.ac.nz/en/news/2025/12/03/bumper-breeding-season-for-kakapo-on-the-cards.html"&gt;Kākāpō breeding season this year&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;The reason it took so long is that it was fastidious about looking at and tweaking its own work. I appreciated that at one point it tried rendering the PDF and noticed that the macrons in kākāpō were not supported by the chosen font, so it switched to something else:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/skills-macrons.jpg" alt="ChatGPT screenshot. Analyzed image. There's an image of a page of PDF with obvious black blocks on some of the letters in the heading. It then says: Fixing font issues with macrons. The page is showing black squares for words like &amp;quot;kākāpō,&amp;quot; probably because Helvetica can't handle macrons. I'll switch to a font that supports them, such as DejaVu Sans or Noto Sans. I'll register both regular and bold fonts, then apply them to the document. I'll update the footer to note the issue with Helvetica. Time to rebuild the PDF!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="skills-in-codex-cli"&gt;Skills in Codex CLI&lt;/h4&gt;
&lt;p&gt;Meanwhile, two weeks ago OpenAI's open source Codex CLI tool landed a PR titled &lt;a href="https://github.com/openai/codex/pull/7412"&gt;feat: experimental support for skills.md&lt;/a&gt;. The most recent docs for that are in &lt;a href="https://github.com/openai/codex/blob/main/docs/skills.md"&gt;docs/skills.md&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The documentation suggests that any folder in &lt;code&gt;~/.codex/skills&lt;/code&gt; will be treated as a skill.&lt;/p&gt;
&lt;p&gt;I dug around and found the code that generates the prompt that drives the skill system in &lt;a href="https://github.com/openai/codex/blob/ad7b9d63c326d5c92049abd16f9f5fb64a573a69/codex-rs/core/src/skills/render.rs#L20-L38"&gt;codex-rs/core/src/skills/render.rs&lt;/a&gt; - here's a Gist with &lt;a href="https://gist.github.com/simonw/25f2c3a9e350274bc2b76a79bc8ae8b2"&gt;a more readable version of that prompt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://claude.ai/share/0a9b369b-f868-4065-91d1-fd646c5db3f4"&gt;used Claude Opus 4.5's skill authoring skill&lt;/a&gt; to create &lt;a href="https://github.com/datasette/skill"&gt;this skill for creating Datasette plugins&lt;/a&gt;, then installed it into my Codex CLI skills folder like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone https://github.com/datasette/skill \
  &lt;span class="pl-k"&gt;~&lt;/span&gt;/.codex/skills/datasette-plugin&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You have to run Codex with the &lt;code&gt;--enable skills&lt;/code&gt; option. I ran this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c1"&gt;cd&lt;/span&gt; /tmp
mkdir datasette-cowsay
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; datasette-cowsay
codex --enable skills -m gpt-5.2&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then prompted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;list skills&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Codex replied:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;- datasette-plugins — Writing Datasette plugins using Python + pluggy (file: /Users/simon/.codex/skills/datasette-plugin/SKILL.md)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;- Discovery — How to find/identify available skills (no SKILL.md path provided in the list)&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Write a Datasette plugin in this folder adding a /-/cowsay?text=hello page that displays a pre with cowsay from PyPI saying that text&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It worked perfectly! Here's &lt;a href="https://github.com/simonw/datasette-cowsay"&gt;the plugin code it wrote&lt;/a&gt; and here's &lt;a href="http://gistpreview.github.io/?96ee928370b18eabc2e0fad9aaa46d4b"&gt;a copy of the full Codex CLI transcript&lt;/a&gt;, generated with my &lt;a href="https://simonwillison.net/2025/Oct/23/claude-code-for-web-video/"&gt;terminal-to-html tool&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can try that out yourself if you have &lt;code&gt;uvx&lt;/code&gt; installed like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx --with https://github.com/simonw/datasette-cowsay/archive/refs/heads/main.zip \
  datasette&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then visit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;http://127.0.0.1:8001/-/cowsay?text=This+is+pretty+fun
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/cowsay-datasette.jpg" alt="Screenshot of that URL in Firefox, an ASCII art cow says This is pretty fun." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="skills-are-a-keeper"&gt;Skills are a keeper&lt;/h4&gt;
&lt;p&gt;When I first wrote about skills in October I said &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;Claude Skills are awesome, maybe a bigger deal than MCP&lt;/a&gt;. The fact that it's just turned December and OpenAI have already leaned into them in a big way reinforces to me that I called that one correctly.&lt;/p&gt;
&lt;p&gt;Skills are based on a &lt;em&gt;very&lt;/em&gt; light specification, if you could even call it that, but I still think it would be good for these to be formally documented somewhere. This could be a good initiative for the new &lt;a href="https://aaif.io/"&gt;Agentic AI Foundation&lt;/a&gt; (&lt;a href="https://simonwillison.net/2025/Dec/9/agentic-ai-foundation/"&gt;previously&lt;/a&gt;) to take on.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/kakapo"&gt;kakapo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/skills"&gt;skills&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="pdf"/><category term="ai"/><category term="kakapo"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="coding-agents"/><category term="gpt-5"/><category term="codex-cli"/><category term="skills"/><category term="gpt"/></entry><entry><title>Building more with GPT-5.1-Codex-Max</title><link href="https://simonwillison.net/2025/Nov/19/gpt-51-codex-max/#atom-tag" rel="alternate"/><published>2025-11-19T23:15:10+00:00</published><updated>2025-11-19T23:15:10+00:00</updated><id>https://simonwillison.net/2025/Nov/19/gpt-51-codex-max/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/gpt-5-1-codex-max/"&gt;Building more with GPT-5.1-Codex-Max&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Hot on the heels of yesterday's &lt;a href="https://simonwillison.net/2025/Nov/18/gemini-3/"&gt;Gemini 3 Pro release&lt;/a&gt; comes a new model from OpenAI called GPT-5.1-Codex-Max.&lt;/p&gt;
&lt;p&gt;(Remember when GPT-5 was meant to bring in a new era of less confusing model names? That didn't last!)&lt;/p&gt;
&lt;p&gt;It's currently only available through their &lt;a href="https://developers.openai.com/codex/cli/"&gt;Codex CLI coding agent&lt;/a&gt;, where it's the new default model:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Starting today, GPT‑5.1-Codex-Max will replace GPT‑5.1-Codex as the default model in Codex surfaces. Unlike GPT‑5.1, which is a general-purpose model, we recommend using GPT‑5.1-Codex-Max and the Codex family of models only for agentic coding tasks in Codex or Codex-like environments.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's not available via the API yet but should be shortly.&lt;/p&gt;
&lt;p&gt;The timing of this release is interesting given that Gemini 3 Pro appears to have &lt;a href="https://simonwillison.net/2025/Nov/18/gemini-3/#benchmarks"&gt;aced almost all of the benchmarks&lt;/a&gt; just yesterday. It's reminiscent of the period in 2024 when OpenAI consistently made big announcements that happened to coincide with Gemini releases.&lt;/p&gt;
&lt;p&gt;OpenAI's self-reported &lt;a href="https://openai.com/index/introducing-swe-bench-verified/"&gt;SWE-Bench Verified&lt;/a&gt; score is particularly notable: 76.5% for thinking level "high" and 77.9% for the new "xhigh". That was the one benchmark where Gemini 3 Pro was out-performed by Claude Sonnet 4.5 - Gemini 3 Pro got 76.2% and Sonnet 4.5 got 77.2%. OpenAI now have the highest scoring model there by a full .7 of a percentage point!&lt;/p&gt;
&lt;p&gt;They also report a score of 58.1% on &lt;a href="https://www.tbench.ai/leaderboard/terminal-bench/2.0"&gt;Terminal Bench 2.0&lt;/a&gt;, beating Gemini 3 Pro's 54.2% (and Sonnet 4.5's 42.8%.)&lt;/p&gt;
&lt;p&gt;The most intriguing part of this announcement concerns the model's approach to long context problems:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT‑5.1-Codex-Max is built for long-running, detailed work. It’s our first model natively trained to operate across multiple context windows through a process called &lt;em&gt;compaction&lt;/em&gt;, coherently working over millions of tokens in a single task. [...]&lt;/p&gt;
&lt;p&gt;Compaction enables GPT‑5.1-Codex-Max to complete tasks that would have previously failed due to context-window limits, such as complex refactors and long-running agent loops by pruning its history while preserving the most important context over long horizons. In Codex applications, GPT‑5.1-Codex-Max automatically compacts its session when it approaches its context window limit, giving it a fresh context window. It repeats this process until the task is completed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There's a lot of confusion &lt;a href="https://news.ycombinator.com/item?id=45982649"&gt;on Hacker News&lt;/a&gt; about what this actually means. Claude Code already does a version of compaction, automatically summarizing previous turns when the context runs out. Does this just mean that Codex-Max is better at that process?&lt;/p&gt;
&lt;p&gt;I had it draw me a couple of pelicans by typing "Generate an SVG of a pelican riding a bicycle" directly into the Codex CLI tool. Here's thinking level medium:&lt;/p&gt;
&lt;p&gt;&lt;img alt="A flat-style illustration shows a white, round-bodied bird with an orange beak pedaling a red-framed bicycle with thin black wheels along a sandy beach, with a calm blue ocean and clear sky in the background." src="https://static.simonwillison.net/static/2025/codex-max-medium.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;And here's thinking level "xhigh":&lt;/p&gt;
&lt;p&gt;&lt;img alt="A plump white bird with an orange beak and small black eyes crouches low on a blue bicycle with oversized dark wheels, shown racing forward with motion lines against a soft gradient blue sky." src="https://static.simonwillison.net/static/2025/codex-max-xhigh.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I also tried xhigh on the my &lt;a href="https://simonwillison.net/2025/Nov/18/gemini-3/#and-a-new-pelican-benchmark"&gt;longer pelican test prompt&lt;/a&gt;, which came out like this:&lt;/p&gt;
&lt;p id="advanced-pelican-codex-max"&gt;&lt;img alt="A stylized dark gray bird with layered wings, a yellow head crest, and a long brown beak leans forward in a racing pose on a black-framed bicycle, riding across a glossy blue surface under a pale sky." src="https://static.simonwillison.net/static/2025/codex-breeding-max-xhigh.jpg"&gt;&lt;/p&gt;

&lt;p&gt;Also today: &lt;a href="https://x.com/openai/status/1991266192905179613"&gt;GPT-5.1 Pro is rolling out today to all Pro users&lt;/a&gt;. According to the &lt;a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes"&gt;ChatGPT release notes&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT-5.1 Pro is rolling out today for all ChatGPT Pro users and is available in the model picker. GPT-5 Pro will remain available as a legacy model for 90 days before being retired.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That's a pretty fast deprecation cycle for the GPT-5 Pro model that was released just three months ago.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45982649"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/evals"&gt;evals&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/november-2025-inflection"&gt;november-2025-inflection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="evals"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="gpt-5"/><category term="codex-cli"/><category term="gpt-codex"/><category term="november-2025-inflection"/><category term="gpt"/></entry><entry><title>Six coding agents at once</title><link href="https://simonwillison.net/2025/Nov/11/six-coding-agents-at-once/#atom-tag" rel="alternate"/><published>2025-11-11T22:52:45+00:00</published><updated>2025-11-11T22:52:45+00:00</updated><id>https://simonwillison.net/2025/Nov/11/six-coding-agents-at-once/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been upgrading a &lt;em&gt;ton&lt;/em&gt; of Datasette plugins recently for compatibility with the &lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/"&gt;Datasette 1.0a20 release&lt;/a&gt; from last week - &lt;a href="https://github.com/simonw/datasette/issues/2577#issuecomment-3483537877"&gt;35 so far&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A lot of the work is very repetitive so I've been outsourcing it to &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt;. Here's the recipe I've landed on:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre style="font-size: 0.9em"&gt;codex &lt;span class="pl-c1"&gt;exec&lt;/span&gt; --dangerously-bypass-approvals-and-sandbox \
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Run the command tadd and look at the errors and then&lt;/span&gt;
&lt;span class="pl-s"&gt;read ~/dev/datasette/docs/upgrade-1.0a20.md and apply&lt;/span&gt;
&lt;span class="pl-s"&gt;fixes and run the tests again and get them to pass.&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;Also delete the .github directory entirely and replace&lt;/span&gt;
&lt;span class="pl-s"&gt;it by running this:&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;cp -r ~/dev/ecosystem/datasette-os-info/.github .&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;Run a git diff against that to make sure it looks OK&lt;/span&gt;
&lt;span class="pl-s"&gt;- if there are any notable differences e.g. switching&lt;/span&gt;
&lt;span class="pl-s"&gt;from Twine to the PyPI uploader or deleting code that&lt;/span&gt;
&lt;span class="pl-s"&gt;does a special deploy or configures something like &lt;/span&gt;
&lt;span class="pl-s"&gt;playwright include that in your final report.&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;If the project still uses setup.py then edit that new&lt;/span&gt;
&lt;span class="pl-s"&gt;test.yml and publish.yaml to mention setup.py not pyproject.toml&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;If this project has pyproject.toml make sure the license&lt;/span&gt;
&lt;span class="pl-s"&gt;line in that looks like this:&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;license = "Apache-2.0"&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;And remove any license thing from the classifiers= array&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;Update the Datasette dependency in pyproject.toml or&lt;/span&gt;
&lt;span class="pl-s"&gt;setup.py to "datasette&amp;gt;=1.0a21"&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;And make sure requires-python is &amp;gt;=3.10&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I featured a simpler version of this prompt in my &lt;a href="https://simonwillison.net/2025/Nov/6/upgrading-datasette-plugins/"&gt;Datasette plugin upgrade video&lt;/a&gt;, but I've expanded it quite a bit since then.&lt;/p&gt;
&lt;p&gt;At one point I had six terminal windows open running this same prompt against six different repos - probably my most extreme case of &lt;a href="https://simonwillison.net/2025/Oct/5/parallel-coding-agents/"&gt;parallel agents&lt;/a&gt; yet.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Animated GIF demo. Six terminal windows are arranged in a 3x2 grid, each one of them is running the above prompt and working its way through making modifications to one of six different projects: datasette-extract, datasette-create-view, datasette-write, datasette-secrets, datasette-public, and datasette-write-ui." src="https://static.simonwillison.net/static/2025/multiple-codexes.gif" /&gt;&lt;/p&gt;
&lt;p&gt;Here are the six resulting commits from those six coding agent sessions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/datasette/datasette-extract/commit/deb6ae3f3069d45c5227a57067c6621cd3b8d6ea"&gt;datasette-extract deb6ae&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/datasette/datasette-create-view/commit/d940f42fdab205c645fe4a2f1d7a4e44d41104d8"&gt;datasette-create-view d940f4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/simonw/datasette-write/commit/e0af01f931498a3dfbf5f2597534df109559fe71"&gt;datasette-write e0af01&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/datasette/datasette-secrets/commit/e93d1410bcd9a4af87a046b584e9e3f9cae503c4"&gt;datasette-secrets e93d14&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/datasette/datasette-write-ui/commit/1d2459fbc35ad02633bb7441c92bc5f8a5d919d5"&gt;datasette-write-ui 1d2459&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/datasette/datasette-public/commit/5213c41521821c03688c6099581e198a831f85d5"&gt;datasette-public 5213c4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="datasette"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="coding-agents"/><category term="codex-cli"/><category term="parallel-agents"/></entry><entry><title>Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican</title><link href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#atom-tag" rel="alternate"/><published>2025-11-09T03:31:34+00:00</published><updated>2025-11-09T03:31:34+00:00</updated><id>https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#atom-tag</id><summary type="html">
    &lt;p&gt;OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they &lt;a href="https://x.com/OpenAIDevs/status/1986861734619947305"&gt;describe&lt;/a&gt; as "a more compact and cost-efficient version of GPT-5-Codex". It's currently only available via their Codex CLI tool and VS Code extension, with proper API access "&lt;a href="https://x.com/OpenAIDevs/status/1986861736041853368"&gt;coming soon&lt;/a&gt;". I decided to use Codex to reverse engineer the Codex CLI tool and give me the ability to prompt the new model directly.&lt;/p&gt;
&lt;p&gt;I made &lt;a href="https://www.youtube.com/watch?v=9o1_DL9uNlM"&gt;a video&lt;/a&gt; talking through my progress and demonstrating the final results.&lt;/p&gt;

&lt;p&gt;&lt;lite-youtube videoid="9o1_DL9uNlM" js-api="js-api" title="Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican" playlabel="Play: Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican"&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#this-is-a-little-bit-cheeky"&gt;This is a little bit cheeky&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#codex-cli-is-written-in-rust"&gt;Codex CLI is written in Rust&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#iterating-on-the-code"&gt;Iterating on the code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#let-s-draw-some-pelicans"&gt;Let's draw some pelicans&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/#bonus-the-debug-option"&gt;Bonus: the --debug option&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="this-is-a-little-bit-cheeky"&gt;This is a little bit cheeky&lt;/h4&gt;
&lt;p&gt;OpenAI clearly don't intend for people to access this model directly just yet. It's available exclusively through Codex CLI which is a privileged application - it gets to access a special backend API endpoint that's not publicly documented, and it uses a special authentication mechanism that bills usage directly to the user's existing ChatGPT account.&lt;/p&gt;
&lt;p&gt;I figured reverse-engineering that API directly would be somewhat impolite. But... Codex CLI is an open source project released under an Apache 2.0 license. How about upgrading that to let me run my own prompts through its existing API mechanisms instead?&lt;/p&gt;
&lt;p&gt;This felt like a somewhat absurd loophole, and I couldn't resist trying it out and seeing what happened.&lt;/p&gt;
&lt;h4 id="codex-cli-is-written-in-rust"&gt;Codex CLI is written in Rust&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repository contains the source code for the Codex CLI tool, which OpenAI rewrote in Rust just a few months ago.&lt;/p&gt;
&lt;p&gt;I don't know much Rust at all.&lt;/p&gt;
&lt;p&gt;I made my own clone on GitHub and checked it out locally:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git clone git@github.com:simonw/codex
&lt;span class="pl-c1"&gt;cd&lt;/span&gt; codex&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I fired up Codex itself (in dangerous mode, because I like living dangerously):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex --dangerously-bypass-approvals-and-sandbox&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And ran this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figure out how to build the rust version of this tool and then build it&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This worked. It churned away for a bit and figured out how to build itself. This is a useful starting point for a project like this - in figuring out the compile step the coding agent gets seeded with a little bit of relevant information about the project, and if it can compile that means it can later partially test the code it is writing while it works.&lt;/p&gt;
&lt;p&gt;Once the compile had succeeded I fed it the design for the new feature I wanted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Add a new sub-command to the Rust tool called "codex prompt"&lt;/p&gt;
&lt;p&gt;codex prompt "prompt goes here" - this runs the given prompt directly against the OpenAI API that Codex uses, with the same code path and authentication credentials as the rest of Codex.&lt;/p&gt;
&lt;p&gt;codex prompt -m &amp;lt;model_name&amp;gt; "prompt goes here" - same again but lets you set the model&lt;/p&gt;
&lt;p&gt;codex prompt "prompt goes here" -s/--system "system prompt goes here" - runs with a custom system/developer message&lt;/p&gt;
&lt;p&gt;codex prompt --models - this lists all available models that can be used with the "codex prompt" command&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My design for this new sub-command is lifted directly from my own &lt;a href="https://llm.datasette.io/en/stable/help.html#llm-prompt-help"&gt;llm command&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="iterating-on-the-code"&gt;Iterating on the code&lt;/h4&gt;
&lt;p&gt;Codex got to work. I've &lt;a href="https://gistpreview.github.io/?ddabbff092bdd658e06d8a2e8f142098"&gt;shared the full transcript here&lt;/a&gt; (using the tool &lt;a href="https://simonwillison.net/2025/Oct/23/claude-code-for-web-video/"&gt;I described here&lt;/a&gt;), but this TODO list it made itself is a useful summary of what it decided to do:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Updated Plan
└ ✔ Inspect CLI structure and relevant core utilities for sending prompts and listing models
  ✔ Implement new `codex prompt` subcommand (CLI plumbing + command handler)
  ✔ Format, lint, and test (`just fmt`, `just fix -p codex-cli`, `cargo test -p codex-cli`)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I like that it figured out &lt;a href="https://github.com/openai/codex/blob/a47181e471b6efe55e95f9858c913fc89a3a44fa/codex-rs/justfile"&gt;the justfile&lt;/a&gt; in the repo and decided to use it to run formatting and linting commands without me needing to tell it to. (Update: it turns out that was dictated by the &lt;a href="https://github.com/openai/codex/blob/f8b30af6dc275b3e64de5f1987e6cafe604cb72a/AGENTS.md"&gt;AGENTS.md&lt;/a&gt; file.)&lt;/p&gt;
&lt;p&gt;I tried running the first version of the code it wrote like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; -m gpt-5-codex-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;... and it didn't quite work. I got this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(reasoning summary) **Seeking
(reasoning summary)  instructions
(reasoning summary)  and
(reasoning summary)  sandbox
(reasoning summary)  info
(reasoning summary) **
(reasoning summary) **Dec
(reasoning summary) iding
(reasoning summary)  on
(reasoning summary)  SVG
(reasoning summary)  creation
(reasoning summary)  approach
(reasoning summary) **
(reasoning summary) **Checking
(reasoning summary)  current
(reasoning summary)  directory
(reasoning summary) **
(reasoning summary) **Preparing
(reasoning summary)  to
(reasoning summary)  check
(reasoning summary)  current
(reasoning summary)  directory
(reasoning summary) **
I�m ready to help�what would you like me to do next?I�m ready to help�what would you like me to do next?
Token usage: total=2459 input=2374 cached_input=0 output=85 reasoning_output=64
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that it DID think about SVG creation, but then decided it should look at the current directory. This isn't what I want - it appeared to be running in Codex's normal mode with a system prompt telling it to edit files on disk. I wanted it to respond to the prompt without acting as if it had a full workspace available to it.&lt;/p&gt;
&lt;p&gt;I prompted it again, pasting in the broken output:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;this did not seem to work - here is what happened - note that the reasoning summary comes out on a lot of lines when I would rather it streams out to stderr on the same line (flushing constantly) - and then despite the reasoning summary thinking about pelican on SVG it did not actually act on that and display the result, which is very surprising. [pasted content]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It churned for a bit more, but from the output it looked like it was still trying to include Codex-specific tools and workspace information. So I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note that this new 'codex prompt' command should NOT be able to execute any tools at all, and should not have to think about the workspace or sandbox - it should send a single prompt with no tools and then stream and display a response and then stop.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This got me a new build, but when I ran it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Error: unexpected status 400 Bad Request: {"detail":"Instructions are not valid"}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My guess is that OpenAI's custom backend API for Codex doesn't like it if you attempt to call it &lt;em&gt;without&lt;/em&gt; passing in the expected instructions. Fair enough!&lt;/p&gt;
&lt;p&gt;Rather than paste that error in I got Codex to try the command itself, so it could see the error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You try running this and see what happens: ./target/debug/codex prompt 'Generate an SVG of a pelican riding a bicycle'&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Codex switched back to the default instructions. From its reply:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;cli/src/prompt_cmd.rs#L1-L320&lt;/code&gt; now forces a tool-free request: we keep the workspace defaults, inject a lightweight system/developer message (“don’t run tools”), disable apply_patch/web search via config overrides, and build the outbound prompt as just that system span plus the user text.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I guess that will have to do. I asked for one more feature:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Add a --debug option to the codex prompt command which causes the full JSON request and response to be printed to stderr, plus the URL that is being accessed and the HTTP verb&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... and we're ready to try this thing out!&lt;/p&gt;
&lt;p&gt;Notably I haven't written a single line of Rust myself here and paid almost no attention to what it was actually doing. My main contribution was to run the binary every now and then to see if it was doing what I needed yet.&lt;/p&gt;
&lt;p&gt;I've pushed the working code to &lt;a href="https://github.com/simonw/codex/compare/a47181e471b6efe55e95f9858c913fc89a3a44fa...ae5f98a9248a8edb5d3c53261273a482fc0b5306"&gt;a prompt-subcommand branch in my repo&lt;/a&gt; if you want to take a look and see how it all works.&lt;/p&gt;

&lt;h4 id="let-s-draw-some-pelicans"&gt;Let's draw some pelicans&lt;/h4&gt;
&lt;p&gt;With the final version of the code built, I drew some pelicans. Here's the &lt;a href="https://gistpreview.github.io/?a11f9ac456d2b2bc3715ba900ef1203d"&gt;full terminal transcript&lt;/a&gt;, but here are some highlights.&lt;/p&gt;
&lt;p&gt;This is with the default GPT-5-Codex model:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I pasted it into my &lt;a href="https://tools.simonwillison.net/svg-render"&gt;tools.simonwillison.net/svg-render&lt;/a&gt; tool and got the following:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/codex-hacking-default.png" alt="It's a dumpy little pelican with a weird face, not particularly great" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I ran it again for GPT-5:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -m gpt-5&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/codex-hacking-gpt-5.png" alt="Much better bicycle, pelican is a bit line-drawing-ish but does have the necessary parts in the right places" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And now the moment of truth... GPT-5 Codex Mini!&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; -m gpt-5-codex-mini&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/codex-hacking-mini.png" alt="This is terrible. The pelican is an abstract collection of shapes, the bicycle is likewise very messed up" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I don't think I'll be adding that one to my SVG drawing toolkit any time soon.&lt;/p&gt;

&lt;h4 id="bonus-the-debug-option"&gt;Bonus: the --debug option&lt;/h4&gt;
&lt;p&gt;I had Codex add a &lt;code&gt;--debug&lt;/code&gt; option to help me see exactly what was going on.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;./target/debug/codex prompt -m gpt-5-codex-mini &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; --debug&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The output starts like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[codex prompt debug] POST https://chatgpt.com/backend-api/codex/responses
[codex prompt debug] Request JSON:
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight highlight-source-json"&gt;&lt;pre&gt;{
  &lt;span class="pl-ent"&gt;"model"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;gpt-5-codex-mini&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"instructions"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;You are Codex, based on GPT-5. You are running as a coding agent ...&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"input"&lt;/span&gt;: [
    {
      &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;message&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"role"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;developer&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: [
        {
          &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;input_text&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"text"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;You are a helpful assistant. Respond directly to the user request without running tools or shell commands.&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        }
      ]
    },
    {
      &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;message&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"role"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;user&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
      &lt;span class="pl-ent"&gt;"content"&lt;/span&gt;: [
        {
          &lt;span class="pl-ent"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;input_text&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
          &lt;span class="pl-ent"&gt;"text"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
        }
      ]
    }
  ],
  &lt;span class="pl-ent"&gt;"tools"&lt;/span&gt;: [],
  &lt;span class="pl-ent"&gt;"tool_choice"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;auto&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"parallel_tool_calls"&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"reasoning"&lt;/span&gt;: {
    &lt;span class="pl-ent"&gt;"summary"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;auto&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  },
  &lt;span class="pl-ent"&gt;"store"&lt;/span&gt;: &lt;span class="pl-c1"&gt;false&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"stream"&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;,
  &lt;span class="pl-ent"&gt;"include"&lt;/span&gt;: [
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;reasoning.encrypted_content&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
  ],
  &lt;span class="pl-ent"&gt;"prompt_cache_key"&lt;/span&gt;: &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;019a66bf-3e2c-7412-b05e-db9b90bbad6e&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
}&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This reveals that OpenAI's private API endpoint for Codex CLI is &lt;code&gt;https://chatgpt.com/backend-api/codex/responses&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Also interesting is how the &lt;code&gt;"instructions"&lt;/code&gt; key (truncated above, &lt;a href="https://gist.github.com/simonw/996388ecf785ad54de479315bd4d33b7"&gt;full copy here&lt;/a&gt;) contains the default instructions, without which the API appears not to work - but it also shows that you can send a message with &lt;code&gt;role="developer"&lt;/code&gt; in advance of your user prompt.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="rust"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="pelican-riding-a-bicycle"/><category term="llm-release"/><category term="vibe-coding"/><category term="coding-agents"/><category term="gpt-5"/><category term="codex-cli"/><category term="gpt-codex"/><category term="gpt"/></entry><entry><title>Using Codex CLI with gpt-oss:120b on an NVIDIA DGX Spark via Tailscale</title><link href="https://simonwillison.net/2025/Nov/7/codex-tailscale-spark/#atom-tag" rel="alternate"/><published>2025-11-07T07:23:12+00:00</published><updated>2025-11-07T07:23:12+00:00</updated><id>https://simonwillison.net/2025/Nov/7/codex-tailscale-spark/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://til.simonwillison.net/llms/codex-spark-gpt-oss"&gt;Using Codex CLI with gpt-oss:120b on an NVIDIA DGX Spark via Tailscale&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Inspired by a &lt;a href="https://www.youtube.com/watch?v=qy4ci7AoF9Y&amp;amp;lc=UgzaGdLX8TAuQ9ugx1Z4AaABAg"&gt;YouTube comment&lt;/a&gt; I wrote up how I run OpenAI's Codex CLI coding agent against the gpt-oss:120b model running in Ollama on my &lt;a href="https://simonwillison.net/2025/Oct/14/nvidia-dgx-spark/"&gt;NVIDIA DGX Spark&lt;/a&gt; via a Tailscale network.&lt;/p&gt;
&lt;p&gt;It takes a little bit of work to configure but the result is I can now use Codex CLI on my laptop anywhere in the world against a self-hosted model.&lt;/p&gt;
&lt;p&gt;I used it to build &lt;a href="https://static.simonwillison.net/static/2025/gpt-oss-120b-invaders.html"&gt;this space invaders clone&lt;/a&gt;.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tailscale"&gt;tailscale&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/til"&gt;til&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nvidia"&gt;nvidia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/space-invaders"&gt;space-invaders&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nvidia-spark"&gt;nvidia-spark&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="tailscale"/><category term="til"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="nvidia"/><category term="coding-agents"/><category term="space-invaders"/><category term="codex-cli"/><category term="nvidia-spark"/></entry><entry><title>Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI</title><link href="https://simonwillison.net/2025/Nov/6/upgrading-datasette-plugins/#atom-tag" rel="alternate"/><published>2025-11-06T18:26:05+00:00</published><updated>2025-11-06T18:26:05+00:00</updated><id>https://simonwillison.net/2025/Nov/6/upgrading-datasette-plugins/#atom-tag</id><summary type="html">
    &lt;p&gt;I'm upgrading various plugins for compatibility with the new &lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/"&gt;Datasette 1.0a20 alpha release&lt;/a&gt; and I decided to record &lt;a href="https://www.youtube.com/watch?v=qy4ci7AoF9Y"&gt;a video&lt;/a&gt; of the process. This post accompanies that video with detailed additional notes.&lt;/p&gt;

&lt;p&gt;&lt;lite-youtube videoid="qy4ci7AoF9Y" js-api="js-api" title="My process for upgrading Datasette plugins with uv and OpenAI Codex CLI" playlabel="Play: My process for upgrading Datasette plugins with uv and OpenAI Codex CLI"&gt; &lt;/lite-youtube&gt;&lt;/p&gt;

&lt;h4 id="the-datasette-checkbox-plugin"&gt;The datasette-checkbox plugin&lt;/h4&gt;
&lt;p&gt;I picked a very simple plugin to illustrate the upgrade process (possibly too simple). &lt;a href="https://github.com/datasette/datasette-checkbox"&gt;datasette-checkbox&lt;/a&gt; adds just one feature to Datasette: if you are viewing a table with boolean columns (detected as integer columns with names like &lt;code&gt;is_active&lt;/code&gt; or &lt;code&gt;has_attachments&lt;/code&gt; or &lt;code&gt;should_notify&lt;/code&gt;) &lt;em&gt;and&lt;/em&gt; your current user has permission to update rows in that table it adds an inline checkbox UI that looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-checkbox.gif" alt="Animated demo of a table with name, is_done, should_be_deleted and is_happy columns. Each column has checkboxes, and clicking a checkboxflashes a little &amp;quot;updated&amp;quot; message." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I built the first version with the help of Claude back in August 2024 - details &lt;a href="https://github.com/datasette/datasette-checkbox/issues/1#issuecomment-2294168693"&gt;in this issue comment&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most of the implementation is JavaScript that makes calls to Datasette 1.0's &lt;a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/"&gt;JSON write API&lt;/a&gt;. The Python code just checks that the user has the necessary permissions before including the extra JavaScript.&lt;/p&gt;
&lt;h4 id="running-the-plugin-s-tests"&gt;Running the plugin's tests&lt;/h4&gt;
&lt;p&gt;The first step in upgrading any plugin is to run its tests against the latest Datasette version.&lt;/p&gt;
&lt;p&gt;Thankfully &lt;code&gt;uv&lt;/code&gt; makes it easy to run code in scratch virtual environments that include the different code versions you want to test against.&lt;/p&gt;
&lt;p&gt;I have a test utility called &lt;code&gt;tadd&lt;/code&gt; (for "test against development Datasette") which I use for that purpose. I can run it in any plugin directory like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;tadd&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And it will run the existing plugin tests against whatever version of Datasette I have checked out in my &lt;code&gt;~/dev/datasette&lt;/code&gt; directory.&lt;/p&gt;
&lt;p&gt;You can see the full implementation of &lt;code&gt;tadd&lt;/code&gt; (and its friend &lt;code&gt;radd&lt;/code&gt; described below) &lt;a href="https://til.simonwillison.net/python/uv-tests#variants-tadd-and-radd"&gt;in this TIL&lt;/a&gt; - the basic version looks like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#!&lt;/span&gt;/bin/sh&lt;/span&gt;
uv run --no-project --isolated \
  --with-editable &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;.[test]&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --with-editable &lt;span class="pl-k"&gt;~&lt;/span&gt;/dev/datasette \
  python -m pytest &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;$@&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I started by running &lt;code&gt;tadd&lt;/code&gt; in the &lt;code&gt;datasette-checkbox&lt;/code&gt; directory, and got my first failure... but it wasn't due to permissions, it was because the &lt;code&gt;pyproject.toml&lt;/code&gt; for the plugin was &lt;a href="https://github.com/datasette/datasette-checkbox/blob/0.1a3/pyproject.toml#L13C1-L15C2"&gt;pinned&lt;/a&gt; to a specific mismatched version of Datasette:&lt;/p&gt;
&lt;div class="highlight highlight-source-toml"&gt;&lt;pre&gt;&lt;span class="pl-smi"&gt;dependencies&lt;/span&gt; = [
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;datasette==1.0a19&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
]&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I fixed this problem by swapping &lt;code&gt;==&lt;/code&gt; to &lt;code&gt;&amp;gt;=&lt;/code&gt; and ran the tests again... and they passed! Which was a problem because I was expecting permission-related failures.&lt;/p&gt;
&lt;p&gt;It turns out when I first wrote the plugin I was &lt;a href="https://github.com/datasette/datasette-checkbox/blob/0.1a3/tests/test_checkbox.py"&gt;lazy with the tests&lt;/a&gt; - they weren't actually confirming that the table page loaded without errors.&lt;/p&gt;
&lt;p&gt;I needed to actually run the code myself to see the expected bug.&lt;/p&gt;
&lt;p&gt;First I created myself a demo database using &lt;a href="https://sqlite-utils.datasette.io/en/stable/cli.html#creating-tables"&gt;sqlite-utils create-table&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;sqlite-utils create-table demo.db \
  demo id integer is_checked integer --pk id&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I ran it with Datasette against the plugin's code like so:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;radd demo.db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Sure enough, visiting &lt;code&gt;/demo/demo&lt;/code&gt; produced a 500 error about the missing &lt;code&gt;Datasette.permission_allowed()&lt;/code&gt; method.&lt;/p&gt;
&lt;p&gt;The next step was to update the test to also trigger this error:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;pytest&lt;/span&gt;.&lt;span class="pl-c1"&gt;mark&lt;/span&gt;.&lt;span class="pl-c1"&gt;asyncio&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;async&lt;/span&gt; &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_plugin_adds_javascript&lt;/span&gt;():
    &lt;span class="pl-s1"&gt;datasette&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;Datasette&lt;/span&gt;()
    &lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;add_memory_database&lt;/span&gt;(&lt;span class="pl-s"&gt;"demo"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;db&lt;/span&gt;.&lt;span class="pl-c1"&gt;execute_write&lt;/span&gt;(
        &lt;span class="pl-s"&gt;"CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, is_active INTEGER)"&lt;/span&gt;
    )
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;invoke_startup&lt;/span&gt;()
    &lt;span class="pl-s1"&gt;response&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;client&lt;/span&gt;.&lt;span class="pl-c1"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"/demo/test"&lt;/span&gt;)
    &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;response&lt;/span&gt;.&lt;span class="pl-c1"&gt;status_code&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-c1"&gt;200&lt;/span&gt;&lt;/pre&gt;
&lt;p&gt;And now &lt;code&gt;tadd&lt;/code&gt; fails as expected.&lt;/p&gt;
&lt;h4 id="upgrading-the-plugin-with-codex"&gt;Upgrading the plugin with Codex&lt;/h4&gt;
&lt;p&gt;It this point I could have manually fixed the plugin itself - which would likely have been faster given the small size of the fix - but instead I demonstrated a bash one-liner I've been using to apply these kinds of changes automatically:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex &lt;span class="pl-c1"&gt;exec&lt;/span&gt; --dangerously-bypass-approvals-and-sandbox \
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Run the command tadd and look at the errors and then&lt;/span&gt;
&lt;span class="pl-s"&gt;read ~/dev/datasette/docs/upgrade-1.0a20.md and apply&lt;/span&gt;
&lt;span class="pl-s"&gt;fixes and run the tests again and get them to pass&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;codex exec&lt;/code&gt; runs OpenAI Codex in non-interactive mode - it will loop until it has finished the prompt you give it.&lt;/p&gt;
&lt;p&gt;I tell it to consult the subset of the &lt;a href="https://docs.datasette.io/en/latest/upgrade_guide.html#datasette-1-0a20-plugin-upgrade-guide"&gt;Datasette upgrade documentation&lt;/a&gt; that talks about Datasette permissions and then get the &lt;code&gt;tadd&lt;/code&gt; command to pass its tests.&lt;/p&gt;
&lt;p&gt;This is an example of what I call &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/"&gt;designing agentic loops&lt;/a&gt; - I gave Codex the tools it needed (&lt;code&gt;tadd&lt;/code&gt;) and a clear goal and let it get to work on my behalf.&lt;/p&gt;
&lt;p&gt;The remainder of the video covers finishing up the work - testing the fix manually, commiting my work using:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;git commit -a -m &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;basename &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-smi"&gt;$PWD&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt; for datasette&amp;gt;=1.0a20&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -m &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Refs https://github.com/simonw/datasette/issues/2577&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then shipping a &lt;a href="https://pypi.org/project/datasette-checkbox/0.1a4/"&gt;0.1a4 release&lt;/a&gt; to PyPI using the pattern &lt;a href="https://til.simonwillison.net/pypi/pypi-releases-from-github"&gt;described in this TIL&lt;/a&gt;.
Finally, I demonstrated that the shipped plugin worked in a fresh environment using &lt;code&gt;uvx&lt;/code&gt; like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uvx --prerelease=allow --with datasette-checkbox \
  datasette --root &lt;span class="pl-k"&gt;~&lt;/span&gt;/dev/ecosystem/datasette-checkbox/demo.db&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Executing this command installs and runs a fresh Datasette instance with a fresh copy of the new alpha plugin (&lt;code&gt;--prerelease=allow&lt;/code&gt;). It's a neat way of confirming that freshly released software works as expected.&lt;/p&gt;
&lt;h4 id="a-colophon-for-the-video"&gt;A colophon for the video&lt;/h4&gt;
&lt;p&gt;This video was shot in a single take using &lt;a href="https://www.descript.com/"&gt;Descript&lt;/a&gt;, with no rehearsal and perilously little preparation in advance. I recorded through my AirPods and applied the "Studio Sound" filter to clean up the audio. I pasted in a &lt;code&gt;simonwillison.net&lt;/code&gt; closing slide from &lt;a href="https://simonwillison.net/2025/Oct/23/claude-code-for-web-video/"&gt;my previous video&lt;/a&gt; and exported it locally at 1080p, then uploaded it to YouTube.&lt;/p&gt;
&lt;p&gt;Something I learned from the Software Carpentry &lt;a href="https://simonwillison.net/2020/Sep/26/weeknotes-software-carpentry-sqlite/"&gt;instructor training course&lt;/a&gt; is that making mistakes in front of an audience is actively helpful - it helps them see a realistic version of how software development works and they can learn from watching you recover. I see this as a great excuse for not editing out all of my mistakes!&lt;/p&gt;
&lt;p&gt;I'm trying to build new habits around video content that let me produce useful videos while minimizing the amount of time I spend on production.&lt;/p&gt;
&lt;p&gt;I plan to iterate more on the format as I get more comfortable with the process. I'm hoping I can find the right balance between production time and value to viewers.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="python"/><category term="youtube"/><category term="ai"/><category term="datasette"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="uv"/><category term="coding-agents"/><category term="codex-cli"/></entry><entry><title>Code research projects with async coding agents like Claude Code and Codex</title><link href="https://simonwillison.net/2025/Nov/6/async-code-research/#atom-tag" rel="alternate"/><published>2025-11-06T15:53:23+00:00</published><updated>2025-11-06T15:53:23+00:00</updated><id>https://simonwillison.net/2025/Nov/6/async-code-research/#atom-tag</id><summary type="html">
    &lt;p&gt;I've been experimenting with a pattern for LLM usage recently that's working out really well: &lt;strong&gt;asynchronous code research tasks&lt;/strong&gt;. Pick a research question, spin up an asynchronous coding agent and let it go and run some experiments and report back when it's done.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#code-research"&gt;Code research&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#coding-agents"&gt;Coding agents&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#asynchronous-coding-agents"&gt;Asynchronous coding agents&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#give-them-a-dedicated-github-repository"&gt;Give them a dedicated GitHub repository&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#let-them-rip-with-unlimited-network-access"&gt;Let them rip with unlimited network access&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#my-simonw-research-collection"&gt;My simonw/research collection&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#this-is-total-slop-of-course"&gt;This is total slop, of course&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/6/async-code-research/#try-it-yourself"&gt;Try it yourself&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="code-research"&gt;Code research&lt;/h4&gt;
&lt;p&gt;Software development benefits enormously from something I call &lt;strong&gt;code research&lt;/strong&gt;. The great thing about questions about code is that they can often be definitively answered by writing and executing code.&lt;/p&gt;
&lt;p&gt;I often see questions on forums which hint at a lack of understanding of this skill.&lt;/p&gt;
&lt;p&gt;"Could Redis work for powering the notifications feed for my app?" is a great example. The answer is &lt;em&gt;always&lt;/em&gt; "it depends", but a better answer is that a good programmer already has everything they need to answer that question for themselves. Build a proof-of-concept, simulate the patterns you expect to see in production, then run experiments to see if it's going to work.&lt;/p&gt;
&lt;p&gt;I've been a keen practitioner of code research for a long time. Many of my most interesting projects started out as a few dozen lines of experimental code to prove to myself that something was possible.&lt;/p&gt;
&lt;h4 id="coding-agents"&gt;Coding agents&lt;/h4&gt;
&lt;p&gt;It turns out &lt;strong&gt;coding agents&lt;/strong&gt; like Claude Code and Codex are a fantastic fit for this kind of work as well. Give them the right goal and a useful environment and they'll churn through a basic research project without any further supervision.&lt;/p&gt;
&lt;p&gt;LLMs hallucinate and make mistakes. This is far less important for code research tasks because the code itself doesn't lie: if they write code and execute it and it does the right things then they've demonstrated to both themselves and to you that something really does work.&lt;/p&gt;
&lt;p&gt;They can't prove something is impossible - just because the coding agent couldn't find a way to do something doesn't mean it can't be done - but they can often demonstrate that something &lt;em&gt;is&lt;/em&gt; possible in just a few minutes of crunching.&lt;/p&gt;
&lt;h4 id="asynchronous-coding-agents"&gt;Asynchronous coding agents&lt;/h4&gt;
&lt;p&gt;I've used interactive coding agents like Claude Code and Codex CLI for a bunch of these, but today I'm increasingly turning to their &lt;strong&gt;asynchronous coding agent&lt;/strong&gt; family members instead.&lt;/p&gt;
&lt;p&gt;An asynchronous coding agent is a coding agent that operates on a fire-and-forget basis. You pose it a task, it churns away on a server somewhere and when it's done it files a pull request against your chosen GitHub repository.&lt;/p&gt;
&lt;p&gt;OpenAI's &lt;a href="https://chatgpt.com/codex"&gt;Codex Cloud&lt;/a&gt;, Anthropic's &lt;a href="https://claude.ai/code"&gt;Claude Code for web&lt;/a&gt;, Google Gemini's &lt;a href="https://jules.google/"&gt;Jules&lt;/a&gt;, and GitHub's &lt;a href="https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-coding-agent?utm_source=chatgpt.com"&gt;Copilot coding agent&lt;/a&gt; are four prominent examples of this pattern.&lt;/p&gt;
&lt;p&gt;These are &lt;em&gt;fantastic&lt;/em&gt; tools for code research projects. Come up with a clear goal, turn it into a few paragraphs of prompt, set them loose and check back ten minutes later to see what they've come up with.&lt;/p&gt;
&lt;p&gt;I'm firing off 2-3 code research projects a day right now. My own time commitment is minimal and they frequently come back with useful or interesting results.&lt;/p&gt;
&lt;h4 id="give-them-a-dedicated-github-repository"&gt;Give them a dedicated GitHub repository&lt;/h4&gt;
&lt;p&gt;You can run a code research task against an existing GitHub repository, but I find it's much more liberating to have a separate, dedicated repository for your coding agents to run their projects in.&lt;/p&gt;
&lt;p&gt;This frees you from being limited to research against just code you've already written, and also means you can be much less cautious about what you let the agents do.&lt;/p&gt;
&lt;p&gt;I have two repositories that I use for this - one public, one private. I use the public one for research tasks that have no need to be private, and the private one for anything that I'm not yet ready to share with the world.&lt;/p&gt;
&lt;h4 id="let-them-rip-with-unlimited-network-access"&gt;Let them rip with unlimited network access&lt;/h4&gt;
&lt;p&gt;The biggest benefit of a dedicated repository is that you don't need to be cautious about what the agents operating in that repository can do.&lt;/p&gt;
&lt;p&gt;Both Codex Cloud and Claude Code for web default to running agents in a locked-down environment, with strict restrictions on how they can access the network. This makes total sense if they are running against sensitive repositories - a prompt injection attack of the &lt;a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/"&gt;lethal trifecta&lt;/a&gt; variety could easily be used to steal sensitive code or environment variables.&lt;/p&gt;
&lt;p&gt;If you're running in a fresh, non-sensitive repository you don't need to worry about this at all! I've configured my research repositories for full network access, which means my coding agents can install any dependencies they need, fetch data from the web and generally do anything I'd be able to do on my own computer.&lt;/p&gt;
&lt;h4 id="my-simonw-research-collection"&gt;My simonw/research collection&lt;/h4&gt;
&lt;p&gt;Let's dive into some examples. My public research repository is at &lt;a href="https://github.com/simonw/research"&gt;simonw/research&lt;/a&gt; on GitHub. It currently contains 13 folders, each of which is a separate research project. I only created it two weeks ago so I'm already averaging nearly one a day!&lt;/p&gt;
&lt;p&gt;It also includes &lt;a href="https://github.com/simonw/research/blob/main/.github/workflows/update-readme.yml"&gt;a GitHub Workflow&lt;/a&gt; which uses &lt;a href="https://docs.github.com/en/github-models"&gt;GitHub Models&lt;/a&gt; to automatically update &lt;a href="https://github.com/simonw/research/blob/main/README.md"&gt;the README&lt;/a&gt; file with a summary of every new project, using &lt;a href="https://cog.readthedocs.io/"&gt;Cog&lt;/a&gt;, &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt;, &lt;a href="https://github.com/tonybaloney/llm-github-models"&gt;llm-github-models&lt;/a&gt; and &lt;a href="https://github.com/simonw/research/blob/b059108dfefeb05a48e1c27f7a127dc9fd648129/README.md#L9-L116"&gt;this snippet of Python&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here are a some example research projects from the repo.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/node-pyodide"&gt;node-pyodide&lt;/a&gt;&lt;/strong&gt; shows an example of a &lt;a href="https://github.com/simonw/research/blob/main/node-pyodide/server-simple.js"&gt;Node.js script&lt;/a&gt; that runs the &lt;a href="https://pyodide.org/"&gt;Pyodide&lt;/a&gt; WebAssembly distribution of Python inside it - yet another of my &lt;a href="https://simonwillison.net/tags/sandboxing+python/"&gt;ongoing attempts&lt;/a&gt; to find a great way of running Python in a WebAssembly sandbox on a server.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/python-markdown-comparison"&gt;python-markdown-comparison&lt;/a&gt;&lt;/strong&gt; (&lt;a href="https://gistpreview.github.io/?fb07c2a3fd2d4cfb814a46696a58a00e"&gt;transcript&lt;/a&gt;) provides a detailed performance benchmark of seven different Python Markdown libraries. I fired this one off because I stumbled across &lt;a href="https://pypi.org/project/cmarkgfm/"&gt;cmarkgfm&lt;/a&gt;, a Python binding around GitHub's Markdown implementation in C, and wanted to see how it compared to the other options. This one produced some charts! &lt;code&gt;cmarkgfm&lt;/code&gt; came out on top by a significant margin:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/markdown-performance.png" alt="Bar chart titled &amp;quot;Relative Performance vs cmarkgfm (Large Document)&amp;quot; comparing relative speed of markdown libraries, with marko at 52.1x, markdown2 at 16.9x, mistletoe at 14.1x, markdown at 12.9x, commonmark at 12.1x, mistune at 10.0x, and cmarkgfm at 1.0x baseline marked by a red dashed line; x-axis labeled &amp;quot;Relative Speed (lower is better)&amp;quot; ranging from 0 to 50+" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Here's the entire prompt I used for that project:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Create a performance benchmark and feature comparison report on PyPI cmarkgfm compared to other popular Python markdown libraries - check all of them out from github and read the source to get an idea for features, then design and run a benchmark including generating some charts, then create a report in a new python-markdown-comparison folder (do not create a _summary.md file or edit anywhere outside of that folder). Make sure the performance chart images are directly displayed in the README.md in the folder.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Note that I didn't specify any Markdown libraries other than &lt;code&gt;cmarkgfm&lt;/code&gt; - Claude Code ran a search and found the other six by itself.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/cmarkgfm-in-pyodide"&gt;cmarkgfm-in-pyodide&lt;/a&gt;&lt;/strong&gt; is a lot more fun. A neat thing about having all of my research projects in the same repository is that new projects can build on previous ones. Here I decided to see how hard it would be to get &lt;code&gt;cmarkgfm&lt;/code&gt; - which has a C extension - working inside Pyodide inside Node.js. Claude successfully compiled a 88.4KB &lt;code&gt;cmarkgfm_pyodide-2025.10.22-cp312-cp312-emscripten_3_1_46_wasm32.whl&lt;/code&gt; file with the necessary C extension and proved it could be loaded into Pyodide in WebAssembly inside of Node.js.&lt;/p&gt;
&lt;p&gt;I ran this one using Claude Code on my laptop after an initial attempt failed. The starting prompt was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Figure out how to get the cmarkgfm markdown lover &lt;em&gt;[typo in prompt, this should have been "library" but it figured it out anyway]&lt;/em&gt; for Python working in pyodide. This will be hard because it uses C so you will need to compile it to pyodide compatible webassembly somehow. Write a report on your results plus code to a new cmarkgfm-in-pyodide directory. Test it using pytest to exercise a node.js test script that calls pyodide as seen in the existing node.js and pyodide directory&lt;/p&gt;
&lt;p&gt;There is an existing branch that was an initial attempt at this research, but which failed because it did not have Internet access. You do have Internet access. Use that existing branch to accelerate your work, but do not commit any code unless you are certain that you have successfully executed tests that prove that the pyodide module you created works correctly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This one gave up half way through, complaining that emscripten would take too long. I told it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Complete this project, actually run emscripten, I do not care how long it takes, update the report if it works&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It churned away for a bit longer and complained that the existing Python library used CFFI which isn't available in Pyodide. I asked it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Can you figure out how to rewrite cmarkgfm to not use FFI and to use a pyodide-friendly way of integrating that C code instead?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;... and it did. You can &lt;a href="https://gistpreview.github.io/?6d778a8f9c4c2c005a189ff308c3bc47"&gt;see the full transcript here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/simonw/research/tree/main/blog-tags-scikit-learn"&gt;blog-tags-scikit-learn&lt;/a&gt;&lt;/strong&gt;. Taking a short break from WebAssembly, I thought it would be fun to put &lt;a href="https://scikit-learn.org/stable/"&gt;scikit-learn&lt;/a&gt; through its paces on a text classification task against my blog:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Work in a new folder called blog-tags-scikit-learn&lt;/p&gt;
&lt;p&gt;Download &lt;code&gt;https://datasette.simonwillison.net/simonwillisonblog.db&lt;/code&gt; - a SQLite database. Take a look at the blog_entry table and the associated tags - a lot of the earlier entries do not have tags associated with them, where the later entries do. Design, implement and execute models to suggests tags for those earlier entries based on textual analysis against later ones&lt;/p&gt;
&lt;p&gt;Use Python scikit learn and try several different strategies&lt;/p&gt;
&lt;p&gt;Produce JSON of the results for each one, plus scripts for running them and a detailed markdown description&lt;/p&gt;
&lt;p&gt;Also include an HTML page with a nice visualization of the results that works by loading those JSON files.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This resulted in seven &lt;code&gt;.py&lt;/code&gt; files, four &lt;code&gt;.json&lt;/code&gt; results files and a detailed &lt;a href="https://github.com/simonw/research/blob/main/blog-tags-scikit-learn/README.md"&gt;report&lt;/a&gt;. (It ignored the bit about an HTML page with a nice visualization for some reason.) Not bad for a few moments of idle curiosity typed into my phone!&lt;/p&gt;
&lt;p&gt;That's just three of the thirteen projects in the repository so far. The commit history for each one usually links to the prompt and sometimes the transcript if you want to see how they unfolded.&lt;/p&gt;
&lt;p&gt;More recently I added a short &lt;code&gt;AGENTS.md&lt;/code&gt; file to the repo with a few extra tips for my research agents. You can &lt;a href="https://github.com/simonw/research/blob/b059108dfefeb05a48e1c27f7a127dc9fd648129/AGENTS.md"&gt;read that here&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="this-is-total-slop-of-course"&gt;This is total slop, of course&lt;/h4&gt;
&lt;p&gt;My preferred definition of &lt;a href="https://simonwillison.net/2024/May/8/slop/"&gt;AI slop&lt;/a&gt; is AI-generated content that is published without human review. I've not been reviewing these reports in great detail myself, and I wouldn't usually publish them online without some serious editing and verification.&lt;/p&gt;
&lt;p&gt;I want to share the pattern I'm using though, so I decided to keep them quarantined in this one public &lt;code&gt;simonw/research&lt;/code&gt; repository.&lt;/p&gt;
&lt;p&gt;A tiny feature request for GitHub: I'd love to be able to mark a repository as "exclude from search indexes" such that it gets labelled with &lt;code&gt;&amp;lt;meta name="robots" content="noindex"&amp;gt;&lt;/code&gt; tags. I still like to keep AI-generated content out of search, to avoid contributing more to the &lt;a href="https://en.wikipedia.org/wiki/Dead_Internet_theory"&gt;dead internet&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="try-it-yourself"&gt;Try it yourself&lt;/h4&gt;
&lt;p&gt;It's pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens.&lt;/p&gt;
&lt;p&gt;You can run agents locally but I find the asynchronous agents to be more convenient - especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data.&lt;/p&gt;
&lt;p&gt;Claude Code for web offers &lt;a href="https://support.claude.com/en/articles/12690958-claude-code-promotion"&gt;a free $250 of credits&lt;/a&gt; for their $20/month users for a limited time (until November 18, 2025). Gemini Jules has &lt;a href="https://jules.google/docs/usage-limits/"&gt;a free tier&lt;/a&gt;. There are plenty of other coding agents you can try out as well.&lt;/p&gt;
&lt;p&gt;Let me know if your research agents come back with anything interesting!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slop"&gt;slop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="webassembly"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="slop"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/><category term="jules"/><category term="codex-cli"/></entry><entry><title>A new SQL-powered permissions system in Datasette 1.0a20</title><link href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#atom-tag" rel="alternate"/><published>2025-11-04T21:34:42+00:00</published><updated>2025-11-04T21:34:42+00:00</updated><id>https://simonwillison.net/2025/Nov/4/datasette-10a20/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://docs.datasette.io/en/latest/changelog.html#a20-2025-11-03"&gt;Datasette 1.0a20 is out&lt;/a&gt; with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved &lt;a href="https://github.com/simonw/datasette/compare/1.0a19...1.0a20"&gt;163 commits&lt;/a&gt;, with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code.&lt;/p&gt;


&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#understanding-the-permissions-system"&gt;Understanding the permissions system&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#permissions-systems-need-to-be-able-to-efficiently-list-things"&gt;Permissions systems need to be able to efficiently list things&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#the-new-permission-resources-sql-plugin-hook"&gt;The new permission_resources_sql() plugin hook&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#hierarchies-plugins-vetoes-and-restrictions"&gt;Hierarchies, plugins, vetoes, and restrictions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#new-debugging-tools"&gt;New debugging tools&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#the-missing-feature-list-actors-who-can-act-on-this-resource"&gt;The missing feature: list actors who can act on this resource&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#upgrading-plugins-for-datasette-1-0a20"&gt;Upgrading plugins for Datasette 1.0a20&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#using-claude-code-to-implement-this-change"&gt;Using Claude Code to implement this change&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#starting-with-a-proof-of-concept"&gt;Starting with a proof-of-concept&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#miscellaneous-tips-i-picked-up-along-the-way"&gt;Miscellaneous tips I picked up along the way&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Nov/4/datasette-10a20/#what-s-next-"&gt;What's next?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="understanding-the-permissions-system"&gt;Understanding the permissions system&lt;/h4&gt;
&lt;p&gt;Datasette's &lt;a href="https://docs.datasette.io/en/latest/authentication.html"&gt;permissions system&lt;/a&gt; exists to answer the following question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Is this &lt;strong&gt;actor&lt;/strong&gt; allowed to perform this &lt;strong&gt;action&lt;/strong&gt;, optionally against this particular &lt;strong&gt;resource&lt;/strong&gt;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An &lt;strong&gt;actor&lt;/strong&gt; is usually a user, but might also be an automation operating via the Datasette API.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;action&lt;/strong&gt; is a thing they need to do - things like view-table, execute-sql, insert-row.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;resource&lt;/strong&gt; is the subject of the action - the database you are executing SQL against, the table you want to insert a row into.&lt;/p&gt;
&lt;p&gt;Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data.&lt;/p&gt;
&lt;p&gt;Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by a form of authentication Datasette also 1.0 includes &lt;a href="https://simonwillison.net/2022/Dec/2/datasette-write-api/"&gt;a write API&lt;/a&gt; with a need to configure who can insert, update, and delete rows or create new tables.&lt;/p&gt;
&lt;p&gt;Actors can be authenticated in a number of different ways provided by plugins using the &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#actor-from-request-datasette-request"&gt;actor_from_request()&lt;/a&gt; plugin hook. &lt;a href="https://datasette.io/plugins/datasette-auth-passwords"&gt;datasette-auth-passwords&lt;/a&gt; and &lt;a href="https://datasette.io/plugins/datasette-auth-github"&gt;datasette-auth-github&lt;/a&gt; and &lt;a href="https://datasette.io/plugins/datasette-auth-existing-cookies"&gt;datasette-auth-existing-cookies&lt;/a&gt; are examples of authentication plugins.&lt;/p&gt;
&lt;h4 id="permissions-systems-need-to-be-able-to-efficiently-list-things"&gt;Permissions systems need to be able to efficiently list things&lt;/h4&gt;
&lt;p&gt;The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result.&lt;/p&gt;
&lt;p&gt;This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view?&lt;/p&gt;
&lt;p&gt;I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables.&lt;/p&gt;
&lt;p&gt;Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is &lt;em&gt;really&lt;/em&gt; good at filtering large numbers of records.&lt;/p&gt;
&lt;h4 id="the-new-permission-resources-sql-plugin-hook"&gt;The new permission_resources_sql() plugin hook&lt;/h4&gt;
&lt;p&gt;The biggest change in the new release is that I've replaced the previous  &lt;code&gt;permission_allowed(actor, action, resource)&lt;/code&gt; plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-permission-resources-sql"&gt;permission_resources_sql(actor, action)&lt;/a&gt; plugin hook.&lt;/p&gt;
&lt;p&gt;Instead of returning a True/False result, this new hook returns a SQL query that returns rules helping determine the resources the current actor can execute the specified action against.&lt;/p&gt;
&lt;p&gt;Here's an example, lifted from the documentation:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-s1"&gt;permissions&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-v"&gt;PermissionSQL&lt;/span&gt;


&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;hookimpl&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;permission_resources_sql&lt;/span&gt;(&lt;span class="pl-s1"&gt;datasette&lt;/span&gt;, &lt;span class="pl-s1"&gt;actor&lt;/span&gt;, &lt;span class="pl-s1"&gt;action&lt;/span&gt;):
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-s1"&gt;action&lt;/span&gt; &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"view-table"&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;
    &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-c1"&gt;not&lt;/span&gt; &lt;span class="pl-s1"&gt;actor&lt;/span&gt; &lt;span class="pl-c1"&gt;or&lt;/span&gt; &lt;span class="pl-s1"&gt;actor&lt;/span&gt;.&lt;span class="pl-c1"&gt;get&lt;/span&gt;(&lt;span class="pl-s"&gt;"id"&lt;/span&gt;) &lt;span class="pl-c1"&gt;!=&lt;/span&gt; &lt;span class="pl-s"&gt;"alice"&lt;/span&gt;:
        &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-c1"&gt;None&lt;/span&gt;

    &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-en"&gt;PermissionSQL&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"""&lt;/span&gt;
&lt;span class="pl-s"&gt;            SELECT&lt;/span&gt;
&lt;span class="pl-s"&gt;                'accounting' AS parent,&lt;/span&gt;
&lt;span class="pl-s"&gt;                'sales' AS child,&lt;/span&gt;
&lt;span class="pl-s"&gt;                1 AS allow,&lt;/span&gt;
&lt;span class="pl-s"&gt;                'alice can view accounting/sales' AS reason&lt;/span&gt;
&lt;span class="pl-s"&gt;        """&lt;/span&gt;,
    )&lt;/pre&gt;
&lt;p&gt;This hook grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;PermissionSQL&lt;/code&gt; object should always return four columns: a parent, child, allow (1 or 0), and a reason string for debugging.&lt;/p&gt;
&lt;p&gt;When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against &lt;a href="https://docs.datasette.io/en/latest/internals.html#internal-database-schema"&gt;the internal catalog tables&lt;/a&gt; and efficiently lists all the resources the actor can access.&lt;/p&gt;
&lt;p&gt;This query can then be limited or paginated to avoid loading too many results at once.&lt;/p&gt;
&lt;h4 id="hierarchies-plugins-vetoes-and-restrictions"&gt;Hierarchies, plugins, vetoes, and restrictions&lt;/h4&gt;
&lt;p&gt;Datasette has several additional requirements that make the permissions system more complicated.&lt;/p&gt;
&lt;p&gt;Datasette permissions can optionally act against a two-level &lt;strong&gt;hierarchy&lt;/strong&gt;. You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in &lt;em&gt;every&lt;/em&gt; database in that Datasette instance.&lt;/p&gt;
&lt;p&gt;Some actions can apply at the table level, others the database level and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example.&lt;/p&gt;
&lt;p&gt;Datasette currently has &lt;a href="https://docs.datasette.io/en/latest/authentication.html#built-in-actions"&gt;ten default actions&lt;/a&gt; but &lt;strong&gt;plugins&lt;/strong&gt; that add additional features can &lt;a href="https://docs.datasette.io/en/latest/plugin_hooks.html#register-actions-datasette"&gt;register new actions&lt;/a&gt; to better participate in the permission systems.&lt;/p&gt;
&lt;p&gt;Datasette's permission system has a mechanism to &lt;strong&gt;veto&lt;/strong&gt; permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level.&lt;/p&gt;
&lt;p&gt;Finally, Datasette includes a mechanism for applying additional &lt;strong&gt;restrictions&lt;/strong&gt; to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are &lt;a href="https://docs.datasette.io/en/latest/authentication.html#restricting-the-actions-that-a-token-can-perform"&gt;described in more detail&lt;/a&gt; in the documentation.&lt;/p&gt;
&lt;p&gt;That's a lot of different moving parts for the new implementation to cover.&lt;/p&gt;
&lt;h4 id="new-debugging-tools"&gt;New debugging tools&lt;/h4&gt;
&lt;p&gt;Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible.&lt;/p&gt;
&lt;p&gt;The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-allowed-resources.jpg" alt="Allowed resources. Tabs are Playground, Check, Allowed, Rules, Actions, Allow debug. There is a form where you can select an action (here view-table) and optionally filter by parent and child. Below is a table of results listing resource paths - e.g. /fixtures/name-of-table - plus parent, child and reason columns. The reason is a JSON list for example &amp;quot;datasette.default_permissions: root user&amp;quot;,&amp;quot;datasette.default_permissions: default allow for view-table&amp;quot;." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;And this page listing the &lt;em&gt;rules&lt;/em&gt; that apply to that question - since different plugins may return different rules which get combined together:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-rules.jpg" alt="The rules tab for the same view-table question. Here there are two allow rules - one from datasette.default_permissions for the root user and another from default_permissions labelled default allow for view-table." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be over-ridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the &lt;code&gt;--root&lt;/code&gt; option.)&lt;/p&gt;
&lt;p&gt;Those rules are defined in the &lt;a href="https://github.com/simonw/datasette/blob/1.0a20/datasette/default_permissions.py"&gt;datasette/default_permissions.py&lt;/a&gt; Python module.&lt;/p&gt;
&lt;h4 id="the-missing-feature-list-actors-who-can-act-on-this-resource"&gt;The missing feature: list actors who can act on this resource&lt;/h4&gt;
&lt;p&gt;There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource.&lt;/p&gt;
&lt;p&gt;It's not possibly to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as &lt;code&gt;datasette-auth-github&lt;/code&gt; mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before.&lt;/p&gt;
&lt;p&gt;API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system.&lt;/p&gt;
&lt;p&gt;This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons.&lt;/p&gt;
&lt;h4 id="upgrading-plugins-for-datasette-1-0a20"&gt;Upgrading plugins for Datasette 1.0a20&lt;/h4&gt;
&lt;p&gt;Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://docs.datasette.io/en/latest/upgrade_guide.html"&gt;Datasette upgrade guide&lt;/a&gt; includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document.&lt;/p&gt;
&lt;p&gt;I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me. This has been working &lt;em&gt;extremely well&lt;/em&gt;. I've even had Claude Code &lt;a href="https://github.com/simonw/datasette/commit/fa978ec1006297416e2cd87a2f0d3cac99283cf8"&gt;update those notes itself&lt;/a&gt; with things it learned during an upgrade process!&lt;/p&gt;
&lt;p&gt;This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect.&lt;/p&gt;
&lt;p&gt;I've also been leaning heavily on &lt;code&gt;uv&lt;/code&gt; to help with the upgrade process. I wrote myself two new helper scripts - &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; - to help test the new plugins.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tadd&lt;/code&gt; = "test against datasette dev" - it runs a plugin's existing test suite against the current development version of Datasette checked out on my machine. It passes extra options through to &lt;code&gt;pytest&lt;/code&gt; so I can run &lt;code&gt;tadd -k test_name&lt;/code&gt; or &lt;code&gt;tadd -x --pdb&lt;/code&gt; as needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;radd&lt;/code&gt; = "run against datasette dev" - it runs the latest dev &lt;code&gt;datasette&lt;/code&gt; command with the plugin installed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; implementations &lt;a href="https://til.simonwillison.net/python/uv-tests#variants-tadd-and-radd"&gt;can be found in this TIL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some of my plugin upgrades have become a one-liner to the &lt;code&gt;codex exec&lt;/code&gt; command, which runs OpenAI Codex CLI with a prompt without entering interactive mode:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;codex &lt;span class="pl-c1"&gt;exec&lt;/span&gt; --dangerously-bypass-approvals-and-sandbox \
&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Run the command tadd and look at the errors and then&lt;/span&gt;
&lt;span class="pl-s"&gt;read ~/dev/datasette/docs/upgrade-1.0a20.md and apply&lt;/span&gt;
&lt;span class="pl-s"&gt;fixes and run the tests again and get them to pass&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are still a bunch more to go - there's &lt;a href="https://github.com/simonw/datasette/issues/2577"&gt;a list in this tracking issue&lt;/a&gt; - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place.&lt;/p&gt;
&lt;h4 id="using-claude-code-to-implement-this-change"&gt;Using Claude Code to implement this change&lt;/h4&gt;
&lt;p&gt;This change to Datasette core &lt;em&gt;by far&lt;/em&gt; the most ambitious piece of work I've ever attempted using a coding agent.&lt;/p&gt;
&lt;p&gt;Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window.&lt;/p&gt;
&lt;p&gt;Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable.&lt;/p&gt;
&lt;p&gt;I no longer see codebase size as a limiting factor for how useful they can be.&lt;/p&gt;
&lt;p&gt;I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right.&lt;/p&gt;
&lt;p&gt;For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI.&lt;/p&gt;
&lt;p&gt;I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification.&lt;/p&gt;
&lt;h4 id="starting-with-a-proof-of-concept"&gt;Starting with a proof-of-concept&lt;/h4&gt;
&lt;p&gt;I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My &lt;a href="https://github.com/simonw/research/tree/main/sqlite-permissions-poc"&gt;research/sqlite-permissions-poc&lt;/a&gt; project was the one that finally convinced me of a viable approach,&lt;/p&gt;
&lt;p&gt;That one started as a &lt;a href="https://claude.ai/share/8fd432bc-a718-4883-9978-80ab82a75c87"&gt;free ranging conversation with Claude&lt;/a&gt;, at the end of which I told it to generate a specification which I then &lt;a href="https://chatgpt.com/share/68f6532f-9920-8006-928a-364e15b6e9ef"&gt;fed into GPT-5&lt;/a&gt; to implement. You can see that specification &lt;a href="https://github.com/simonw/research/tree/main/sqlite-permissions-poc#original-prompt"&gt;at the end of the README&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment.&lt;/p&gt;
&lt;p&gt;This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months.&lt;/p&gt;
&lt;h4 id="miscellaneous-tips-i-picked-up-along-the-way"&gt;Miscellaneous tips I picked up along the way&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The &lt;code&gt;tadd&lt;/code&gt; and &lt;code&gt;radd&lt;/code&gt; shortcuts were invaluable for productively working on those plugins while I made changes to core.&lt;/li&gt;
&lt;li&gt;Coding agents make experiments &lt;em&gt;much&lt;/em&gt; cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low.&lt;/li&gt;
&lt;li&gt;Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be.&lt;/li&gt;
&lt;li&gt;Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes".&lt;/li&gt;
&lt;li&gt;Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch.&lt;/li&gt;
&lt;li&gt;A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: &lt;code&gt;Use gh CLI to fetch comments on URL-to-PR and make the requested changes&lt;/code&gt;. This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc.&lt;/li&gt;
&lt;li&gt;The code I write with LLMs is &lt;em&gt;higher quality code&lt;/em&gt;. I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, this changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low.&lt;/li&gt;
&lt;li&gt;Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them.&lt;/li&gt;
&lt;li&gt;That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as &lt;a href="https://simonwillison.net/2025/Oct/16/claude-skills/"&gt;Claude Skills&lt;/a&gt;. I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="what-s-next-"&gt;What's next?&lt;/h4&gt;
&lt;p&gt;Now that the new alpha is out my focus is upgrading the existing plugin ecosystem to use it, and supporting other plugin authors who are doing the same.&lt;/p&gt;
&lt;p&gt;The new permissions system unlocks some key improvements to Datasette Cloud concerning finely-grained permissions for larger teams, so I'll be integrating the new alpha there this week.&lt;/p&gt;
&lt;p&gt;This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/plugins"&gt;plugins&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="plugins"/><category term="projects"/><category term="python"/><category term="sql"/><category term="sqlite"/><category term="datasette"/><category term="annotated-release-notes"/><category term="uv"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/></entry><entry><title>Just Talk To It - the no-bs Way of Agentic Engineering</title><link href="https://simonwillison.net/2025/Oct/14/agentic-engineering/#atom-tag" rel="alternate"/><published>2025-10-14T21:26:40+00:00</published><updated>2025-10-14T21:26:40+00:00</updated><id>https://simonwillison.net/2025/Oct/14/agentic-engineering/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://steipete.me/posts/just-talk-to-it"&gt;Just Talk To It - the no-bs Way of Agentic Engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions about the differences between Claude 4.5 an GPT-5:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;While Claude reacts well to 🚨 SCREAMING ALL-CAPS 🚨 commands that threaten it that it will imply ultimate failure and 100 kittens will die if it runs command X, that freaks out GPT-5. (Rightfully so). So drop all of that and just use words like a human.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Peter is a &lt;em&gt;heavy&lt;/em&gt; user of parallel agents:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I've completely moved to &lt;code&gt;codex&lt;/code&gt; cli as daily driver. I run between 3-8 in parallel in a 3x3 terminal grid, most of them &lt;a href="https://x.com/steipete/status/1977771686176174352"&gt;in the same folder&lt;/a&gt;, some experiments go in separate folders. I experimented with worktrees, PRs but always revert back to this setup as it gets stuff done the fastest.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He shares my preference for CLI utilities over MCPs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I can just refer to a cli by name. I don't need any explanation in my agents file. The agent will try $randomcrap on the first call, the cli will present the help menu, context now has full info how this works and from now on we good. I don't have to pay a price for any tools, unlike MCPs which are a constant cost and garbage in my context. Use GitHub's MCP and see 23k tokens gone. Heck, they did make it better because it was almost 50.000 tokens when it first launched. Or use the &lt;code&gt;gh&lt;/code&gt; cli which has basically the same feature set, models already know how to use it, and pay zero context tax.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's worth reading the &lt;a href="https://steipete.me/posts/just-talk-to-it#do-you-do-spec-driven-development"&gt;section on why he abandoned spec driven development&lt;/a&gt; in full.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/model-context-protocol"&gt;model-context-protocol&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/peter-steinberger"&gt;peter-steinberger&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="model-context-protocol"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/><category term="parallel-agents"/><category term="peter-steinberger"/><category term="agentic-engineering"/></entry><entry><title>Embracing the parallel coding agent lifestyle</title><link href="https://simonwillison.net/2025/Oct/5/parallel-coding-agents/#atom-tag" rel="alternate"/><published>2025-10-05T12:06:55+00:00</published><updated>2025-10-05T12:06:55+00:00</updated><id>https://simonwillison.net/2025/Oct/5/parallel-coding-agents/#atom-tag</id><summary type="html">
    &lt;p&gt;For a while now I've been hearing from engineers who run multiple coding agents at once - firing up several Claude Code or Codex CLI instances at the same time, sometimes in the same repo, sometimes against multiple checkouts or &lt;a href="https://docs.claude.com/en/docs/claude-code/common-workflows#run-parallel-claude-code-sessions-with-git-worktrees"&gt;git worktrees&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I was pretty skeptical about this at first. AI-generated code needs to be reviewed, which means the natural bottleneck on all of this is how fast I can review the results. It's tough keeping up with just a single LLM given how fast they can churn things out, where's the benefit from running more than one at a time if it just leaves me further behind?&lt;/p&gt;
&lt;p&gt;Despite my misgivings, over the past few weeks I've noticed myself quietly starting to embrace the parallel coding agent lifestyle.&lt;/p&gt;
&lt;p&gt;I can only focus on reviewing and landing one significant change at a time, but I'm finding an increasing number of tasks that can still be fired off in parallel without adding too much cognitive overhead to my primary work.&lt;/p&gt;
&lt;p&gt;Here are some patterns I've found for applying parallel agents effectively.&lt;/p&gt;
&lt;h4 id="research-poc"&gt;Research for proof of concepts&lt;/h4&gt;
&lt;p&gt;The first category of tasks I've been applying this pattern to is &lt;strong&gt;research&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Research tasks answer questions or provide recommendations without making modifications to a project that you plan to keep.&lt;/p&gt;
&lt;p&gt;A lot of software projects start with a proof of concept. Can &lt;a href="https://yjs.dev"&gt;Yjs&lt;/a&gt; be used to implement a simple collaborative note writing tool with a Python backend? The &lt;a href="https://github.com/y-crdt/pycrdt"&gt;libraries exist&lt;/a&gt;, but do they work when you wire them together?&lt;/p&gt;
&lt;p&gt;Today's coding agents can build a proof of concept with new libraries and resolve those kinds of basic questions. Libraries too new to be in the training data? Doesn't matter: tell them to checkout the repos for those new dependencies and read the code to figure out how to use them.&lt;/p&gt;
&lt;h4 id="how-does-that-work-again"&gt;How does that work again?&lt;/h4&gt;
&lt;p&gt;If you need a reminder about how a portion of your existing system works, modern "reasoning" LLMs can provide a detailed, actionable answer in just a minute or two.&lt;/p&gt;
&lt;p&gt;It doesn't matter how large your codebase is: coding agents are extremely effective with tools like grep and can follow codepaths through dozens of different files if they need to.&lt;/p&gt;
&lt;p&gt;Ask them to make notes on where your signed cookies are set and read, or how your application uses subprocesses and threads, or which aspects of your JSON API aren't yet covered by your documentation.&lt;/p&gt;
&lt;p&gt;These LLM-generated explanations are worth stashing away somewhere, because they can make excellent context to paste into further prompts in the future.&lt;/p&gt;
&lt;h4 id="small-maintenance-tasks"&gt;Small maintenance tasks&lt;/h4&gt;
&lt;p&gt;Now we're moving on to code edits that we intend to keep, albeit with &lt;em&gt;very&lt;/em&gt; low-stakes. It turns out there are a lot of problems that really just require a little bit of extra cognitive overhead which can be outsourced to a bot.&lt;/p&gt;
&lt;p&gt;Warnings are a great example. Is your test suite spitting out a warning that something you are using is deprecated? Chuck that at a bot - tell it to run the test suite and figure out how to fix the warning. No need to take a break from what you're doing to resolve minor irritations like that.&lt;/p&gt;
&lt;p&gt;There is a definite knack to spotting opportunities like this. As always, the best way to develop that instinct is to try things - any small maintenance task is something that's worth trying with a coding agent. You can learn from both their successes &lt;em&gt;and&lt;/em&gt; their failures.&lt;/p&gt;
&lt;h4 id="carefully-specified-and-directed-actual-work"&gt;Carefully specified and directed actual work&lt;/h4&gt;
&lt;p&gt;Reviewing code that lands on your desk out of nowhere is a &lt;em&gt;lot&lt;/em&gt; of work. First you have to derive the goals of the new implementation: what's it trying to achieve? Is this something the project needs? Is the approach taken the best for this current project, given other future planned changes? A lot of big questions before you can even start digging into the details of the code.&lt;/p&gt;
&lt;p&gt;Code that started from your own specification is a lot less effort to review. If you already decided what to solve, picked the approach and worked out a detailed specification for the work itself, confirming it was built to your needs can take a lot less time.&lt;/p&gt;
&lt;p&gt;I described my &lt;a href="https://simonwillison.net/2025/Mar/11/using-llms-for-code/#tell-them-exactly-what-to-do"&gt;more authoritarian approach&lt;/a&gt; to prompting models for code back in March. If I tell them &lt;em&gt;exactly&lt;/em&gt; how to build something the work needed to review the resulting changes is a whole lot less taxing.&lt;/p&gt;
&lt;h4 id="how-i-m-using-these-tools-today"&gt;How I'm using these tools today&lt;/h4&gt;
&lt;p&gt;My daily drivers are currently &lt;a href="https://www.claude.com/product/claude-code"&gt;Claude Code&lt;/a&gt; (on Sonnet 4.5), &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; (on GPT-5-Codex), and &lt;a href="https://chatgpt.com/codex"&gt;Codex Cloud&lt;/a&gt; (for asynchronous tasks, frequently launched from my phone.)&lt;/p&gt;
&lt;p&gt;I'm also dabbling with &lt;a href="https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-coding-agent"&gt;GitHub Copilot Coding Agent&lt;/a&gt; (the agent baked into the &lt;a href="https://github.com"&gt;GitHub.com&lt;/a&gt; web interface in various places) and &lt;a href="https://jules.google"&gt;Google Jules&lt;/a&gt;, Google's currently-free alternative to Codex Cloud.&lt;/p&gt;
&lt;p&gt;I'm still settling into patterns that work for me. I imagine I'll be iterating on my processes for a long time to come, especially as the landscape of coding agents continues to evolve.&lt;/p&gt;
&lt;p&gt;I frequently have multiple terminal windows open running different coding agents in different directories. These are currently a mixture of Claude Code and Codex CLI, running in &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/#the-joy-of-yolo-mode"&gt;YOLO mode&lt;/a&gt; (no approvals) for tasks where I'm confident malicious instructions can't sneak into the context.&lt;/p&gt;
&lt;p&gt;(I need to start habitually running my local agents in Docker containers to further limit the blast radius if something goes wrong.)&lt;/p&gt;
&lt;p&gt;I haven't adopted git worktrees yet: if I want to run two agents in isolation against the same repo I do a fresh checkout, often into &lt;code&gt;/tmp&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For riskier tasks I'm currently using asynchronous coding agents - usually Codex Cloud - so if anything goes wrong the worst that can happen is my source code getting leaked (since &lt;a href="https://simonwillison.net/2025/Jun/3/codex-agent-internet-access/"&gt;I allow it to have network access&lt;/a&gt; while running). Most of what I work on is open source anyway so that's not a big concern for me.&lt;/p&gt;
&lt;p&gt;I occasionally use &lt;a href="https://github.com/features/codespaces"&gt;GitHub Codespaces&lt;/a&gt; to run VS Code's agent mode, which is surprisingly effective and runs directly in my browser. This is particularly great for workshops and demos since it works for anyone with GitHub account, no extra API key necessary.&lt;/p&gt;
&lt;h4 id="please-share-your-patterns-that-work"&gt;Please share your patterns that work&lt;/h4&gt;
&lt;p&gt;This category of coding agent software is still really new, and the models have only really got good enough to drive them effectively in the past few months - Claude 4 and GPT-5 in particular.&lt;/p&gt;
&lt;p&gt;I plan to write more as I figure out the ways of using them that are most effective. I encourage other practitioners to do the same!&lt;/p&gt;
&lt;h4 id="recommended-reading"&gt;Recommended reading&lt;/h4&gt;
&lt;p&gt;Jesse Vincent wrote &lt;a href="https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-in-september-2025/"&gt;How I'm using coding agents in September, 2025&lt;/a&gt; which describes his workflow for parallel agents in detail, including having an architect agent iterate on a plan which is then reviewed and implemented by fresh instances of Claude Code.&lt;/p&gt;
&lt;p&gt;In &lt;a href="https://sketch.dev/blog/seven-prompting-habits"&gt;The 7 Prompting Habits of Highly Effective Engineers&lt;/a&gt; Josh Bleecher Snyder describes several patterns for this kind of work. I particularly like this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Send out a scout&lt;/strong&gt;. Hand the AI agent a task just to find out where the sticky bits are, so you don’t have to make those mistakes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've tried this a few times with good results: give the agent a genuinely difficult task against a large codebase, with no intention of actually landing its code, just to get ideas from which files it modifies and how it approaches the problem.&lt;/p&gt;
&lt;p&gt;Peter Steinberger's &lt;a href="https://steipete.me/posts/just-talk-to-it"&gt;Just Talk To It - the no-bs Way of Agentic Engineering&lt;/a&gt; provides a very detailed description of his current process built around Codex CLI.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/async-coding-agents"&gt;async-coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jules"&gt;jules&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jesse-vincent"&gt;jesse-vincent&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/peter-steinberger"&gt;peter-steinberger&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/><category term="async-coding-agents"/><category term="jules"/><category term="codex-cli"/><category term="parallel-agents"/><category term="jesse-vincent"/><category term="peter-steinberger"/><category term="agentic-engineering"/></entry><entry><title>GitHub Copilot CLI is now in public preview</title><link href="https://simonwillison.net/2025/Sep/25/github-copilot-cli/#atom-tag" rel="alternate"/><published>2025-09-25T23:58:34+00:00</published><updated>2025-09-25T23:58:34+00:00</updated><id>https://simonwillison.net/2025/Sep/25/github-copilot-cli/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.blog/changelog/2025-09-25-github-copilot-cli-is-now-in-public-preview/"&gt;GitHub Copilot CLI is now in public preview&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
GitHub now have their own entry in the coding terminal CLI agent space: &lt;a href="https://github.com/features/copilot/cli"&gt;Copilot CLI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It's the same basic shape as Claude Code, Codex CLI, Gemini CLI and a growing number of other tools in this space. It's a terminal UI which you accepts instructions and can modify files, run commands and integrate with GitHub's MCP server and other MCP servers that you configure.&lt;/p&gt;
&lt;p&gt;Two notable features compared to many of the others:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It works against the &lt;a href="https://docs.github.com/en/github-models"&gt;GitHub Models&lt;/a&gt; backend. It defaults to Claude Sonnet 4 but you can set &lt;code&gt;COPILOT_MODEL=gpt-5&lt;/code&gt; to switch to GPT-5. Presumably other models will become available soon.&lt;/li&gt;
&lt;li&gt;It's billed against your existing GitHub Copilot account. &lt;a href="https://github.com/features/copilot/plans"&gt;Pricing details are here&lt;/a&gt; - they're split into "Agent mode" requests and "Premium" requests. Different plans get different allowances, which are shared with other products in the GitHub Copilot family.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The best available documentation right now is the &lt;code&gt;copilot --help&lt;/code&gt; screen - &lt;a href="https://gist.github.com/simonw/bc739b8c67aa6e7a5f4f519942e66671"&gt;here's a copy of that in a Gist&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It's a competent entry into the market, though it's missing features like the ability to paste in images which have been introduced to Claude Code and Codex CLI over the past few months.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Disclosure: I got a preview of this at an event at Microsoft's offices in Seattle last week. They did not pay me for my time but they did cover my flight, hotel and some dinners.&lt;/em&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/microsoft"&gt;microsoft&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/github-copilot"&gt;github-copilot&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/disclosures"&gt;disclosures&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="microsoft"/><category term="ai"/><category term="generative-ai"/><category term="github-copilot"/><category term="llms"/><category term="ai-assisted-programming"/><category term="ai-agents"/><category term="coding-agents"/><category term="claude-code"/><category term="codex-cli"/><category term="disclosures"/></entry><entry><title>GPT-5-Codex</title><link href="https://simonwillison.net/2025/Sep/23/gpt-5-codex/#atom-tag" rel="alternate"/><published>2025-09-23T23:59:20+00:00</published><updated>2025-09-23T23:59:20+00:00</updated><id>https://simonwillison.net/2025/Sep/23/gpt-5-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://platform.openai.com/docs/models/gpt-5-codex"&gt;GPT-5-Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
OpenAI &lt;a href="https://simonwillison.net/2025/Sep/15/gpt-5-codex/"&gt;half-released this model&lt;/a&gt; earlier this month, adding it to their Codex CLI tool but not their API.&lt;/p&gt;
&lt;p&gt;Today they've fixed that - the new model can now be accessed as &lt;code&gt;gpt-5-codex&lt;/code&gt;. It's priced the same as regular GPT-5: $1.25/million input tokens, $10/million output tokens, and the same hefty 90% discount for previously cached input tokens, especially important for agentic tool-using workflows which quickly produce a lengthy conversation.&lt;/p&gt;
&lt;p&gt;It's only available via their Responses API, which means you currently need to install the &lt;a href="https://github.com/simonw/llm-openai-plugin"&gt;llm-openai-plugin&lt;/a&gt; to use it with LLM:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm install -U llm-openai-plugin
llm -m openai/gpt-5-codex -T llm_version 'What is the LLM version?'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Outputs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The installed LLM version is 0.27.1.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I added &lt;a href="https://llm.datasette.io/en/stable/tools.html"&gt;tool support&lt;/a&gt; to that plugin today, &lt;a href="https://github.com/simonw/llm-openai-plugin/issues/20#issuecomment-3325921197"&gt;mostly authored by GPT-5 Codex itself&lt;/a&gt; using OpenAI's Codex CLI.&lt;/p&gt;
&lt;p&gt;The new &lt;a href="https://cookbook.openai.com/examples/gpt-5-codex_prompting_guide"&gt;prompting guide for GPT-5-Codex&lt;/a&gt; is worth a read.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GPT-5-Codex is purpose-built for Codex CLI, the Codex IDE extension, the Codex cloud environment, and working in GitHub, and also supports versatile tool use. We recommend using GPT-5-Codex only for agentic and interactive coding use cases.&lt;/p&gt;
&lt;p&gt;Because the model is trained specifically for coding, many best practices you once had to prompt into general purpose models are built in, and over prompting can reduce quality.&lt;/p&gt;
&lt;p&gt;The core prompting principle for GPT-5-Codex is &lt;strong&gt;“less is more.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I &lt;a href="https://gist.github.com/simonw/b371949ae984b0431848cd16cba24b27"&gt;tried my pelican benchmark&lt;/a&gt; at a cost of &lt;a href="https://www.llm-prices.com/#it=16&amp;amp;ot=2154&amp;amp;ic=1.25&amp;amp;oc=10"&gt;2.156 cents&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m openai/gpt-5-codex "Generate an SVG of a pelican riding a bicycle"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img alt="See description below" src="https://static.simonwillison.net/static/2025/gpt-5-codex-api-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;I asked Codex to describe this image and it correctly identified it as a pelican!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m openai/gpt-5-codex -a https://static.simonwillison.net/static/2025/gpt-5-codex-api-pelican.png \
  -s 'Write very detailed alt text'
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Cartoon illustration of a cream-colored pelican with a large orange beak and tiny black eye riding a minimalist dark-blue bicycle. The bird’s wings are tucked in, its legs resemble orange stick limbs pushing the pedals, and its tail feathers trail behind with light blue motion streaks to suggest speed. A small coral-red tongue sticks out of the pelican’s beak. The bicycle has thin light gray spokes, and the background is a simple pale blue gradient with faint curved lines hinting at ground and sky.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-5"&gt;gpt-5&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-codex"&gt;gpt-codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="llm-release"/><category term="gpt-5"/><category term="codex-cli"/><category term="gpt-codex"/><category term="gpt"/></entry><entry><title>httpjail</title><link href="https://simonwillison.net/2025/Sep/19/httpjail/#atom-tag" rel="alternate"/><published>2025-09-19T21:57:29+00:00</published><updated>2025-09-19T21:57:29+00:00</updated><id>https://simonwillison.net/2025/Sep/19/httpjail/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/coder/httpjail"&gt;httpjail&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Here's a promising new (experimental) project in the sandboxing space from Ammar Bandukwala at &lt;a href="https://coder.com/"&gt;Coder&lt;/a&gt;. &lt;code&gt;httpjail&lt;/code&gt; provides a Rust CLI tool for running an individual process against a custom configured HTTP proxy.&lt;/p&gt;
&lt;p&gt;The initial goal is to help run coding agents like Claude Code and Codex CLI with extra rules governing how they interact with outside services. From Ammar's blog post that introduces the new tool, &lt;a href="https://ammar.io/blog/httpjail"&gt;Fine-grained HTTP filtering for Claude Code&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;httpjail&lt;/code&gt; implements an HTTP(S) interceptor alongside process-level network isolation. Under default configuration, all DNS (udp:53) is permitted and all other non-HTTP(S) traffic is blocked.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;httpjail&lt;/code&gt; rules are either JavaScript expressions or custom programs. This approach makes them far more flexible than traditional rule-oriented firewalls and avoids the learning curve of a DSL.&lt;/p&gt;
&lt;p&gt;Block all HTTP requests other than the LLM API traffic itself:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ httpjail --js "r.host === 'api.anthropic.com'" -- claude "build something great"
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;I tried it out using OpenAI's Codex CLI instead and found this recipe worked:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew upgrade rust
cargo install httpjail # Drops it in `~/.cargo/bin`
httpjail --js "r.host === 'chatgpt.com'" -- codex
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Within that Codex instance the model ran fine but any attempts to access other URLs (e.g. telling it "&lt;code&gt;Use curl to fetch simonwillison.net&lt;/code&gt;)" failed at the proxy layer.&lt;/p&gt;
&lt;p&gt;This is still at a really early stage but there's a lot I like about this project. Being able to use JavaScript to filter requests via the &lt;code&gt;--js&lt;/code&gt; option is neat (it's using V8 under the hood), and there's also a &lt;code&gt;--sh shellscript&lt;/code&gt; option which instead runs a shell program passing environment variables that can be used to determine if the request should be allowed.&lt;/p&gt;
&lt;p&gt;At a basic level it works by running a proxy server and setting &lt;code&gt;HTTP_PROXY&lt;/code&gt; and &lt;code&gt;HTTPS_PROXY&lt;/code&gt; environment variables so well-behaving software knows how to route requests.&lt;/p&gt;
&lt;p&gt;It can also add a bunch of other layers. On Linux it sets up &lt;a href="https://en.wikipedia.org/wiki/Nftables"&gt;nftables&lt;/a&gt; rules to explicitly deny additional network access. There's also a &lt;code&gt;--docker-run&lt;/code&gt; option which can launch a Docker container with the specified image but first locks that container down to only have network access to the &lt;code&gt;httpjail&lt;/code&gt; proxy server.&lt;/p&gt;
&lt;p&gt;It can intercept, filter and log HTTPS requests too by generating its own certificate and making that available to the underlying process.&lt;/p&gt;
&lt;p&gt;I'm always interested in new approaches to sandboxing, and fine-grained network access is a particularly tricky problem to solve. This looks like a very promising step in that direction - I'm looking forward to seeing how this project continues to evolve.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://ammar.io/blog/httpjail"&gt;Fine-grained HTTP filtering for Claude Code&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/proxies"&gt;proxies&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sandboxing"&gt;sandboxing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/v8"&gt;v8&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex-cli"&gt;codex-cli&lt;/a&gt;&lt;/p&gt;



</summary><category term="http"/><category term="javascript"/><category term="proxies"/><category term="sandboxing"/><category term="security"/><category term="v8"/><category term="rust"/><category term="claude-code"/><category term="codex-cli"/></entry></feed>