<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: openai</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/openai.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-05-12T17:45:07+00:00</updated><author><name>Simon Willison</name></author><entry><title>llm 0.32a2</title><link href="https://simonwillison.net/2026/May/12/llm/#atom-tag" rel="alternate"/><published>2026-05-12T17:45:07+00:00</published><updated>2026-05-12T17:45:07+00:00</updated><id>https://simonwillison.net/2026/May/12/llm/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm/releases/tag/0.32a2"&gt;llm 0.32a2&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;A bunch of useful stuff in this &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; alpha, but the most important detail is this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most reasoning-capable OpenAI models now use the &lt;a href="https://developers.openai.com/api/reference/responses/overview"&gt;&lt;code&gt;/v1/responses&lt;/code&gt;&lt;/a&gt; endpoint instead of &lt;code&gt;/v1/chat/completions&lt;/code&gt;. This enables interleaved reasoning across tool calls for GPT-5 class models. &lt;a href="https://github.com/simonw/llm/pull/1435"&gt;#1435&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This means you can now see the summarized reasoning tokens when you run prompts against an OpenAI model: they are written to standard error in a different color. Use the &lt;code&gt;-R&lt;/code&gt; or &lt;code&gt;--hide-reasoning&lt;/code&gt; flag if you don't want to see them.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm"/><category term="projects"/><category term="openai"/><category term="generative-ai"/><category term="annotated-release-notes"/><category term="ai"/><category term="llms"/></entry><entry><title>Quoting Luke Curley</title><link href="https://simonwillison.net/2026/May/9/luke-curley/#atom-tag" rel="alternate"/><published>2026-05-09T01:03:58+00:00</published><updated>2026-05-09T01:03:58+00:00</updated><id>https://simonwillison.net/2026/May/9/luke-curley/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://moq.dev/blog/webrtc-is-the-problem/"&gt;&lt;p&gt;WebRTC is designed to &lt;strong&gt;degrade and drop my prompt&lt;/strong&gt; during poor network conditions.&lt;/p&gt;
&lt;p&gt;wtf my dude&lt;/p&gt;
&lt;p&gt;WebRTC aggressively drops audio packets to keep latency low. If you’ve ever heard distorted audio on a conference call, that’s WebRTC baybee. The idea is that conference calls depend on rapid back-and-forth, so pausing to wait for audio is unacceptable.&lt;/p&gt;
&lt;p&gt;…but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate. After all, I’m paying good money to boil the ocean, and a garbage prompt means a garbage response. It’s not like LLMs are particularly responsive anyway.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;But I’m not allowed to wait&lt;/strong&gt;. It’s &lt;em&gt;impossible&lt;/em&gt; to even retransmit a WebRTC audio packet within a browser; we tried at Discord. The &lt;em&gt;implementation&lt;/em&gt; is hard-coded for real-time latency &lt;strong&gt;or else&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://moq.dev/blog/webrtc-is-the-problem/"&gt;Luke Curley&lt;/a&gt;, OpenAI’s WebRTC Problem, in response to &lt;a href="https://openai.com/index/delivering-low-latency-voice-ai-at-scale/"&gt;How OpenAI delivers low-latency voice AI at scale&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/webrtc"&gt;webrtc&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;&lt;/p&gt;



</summary><category term="webrtc"/><category term="openai"/></entry><entry><title>Quoting John Gruber</title><link href="https://simonwillison.net/2026/May/5/john-gruber/#atom-tag" rel="alternate"/><published>2026-05-05T00:46:29+00:00</published><updated>2026-05-05T00:46:29+00:00</updated><id>https://simonwillison.net/2026/May/5/john-gruber/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://daringfireball.net/2026/05/y_combinators_stake_in_openai"&gt;&lt;p&gt;So it’s well known that Y Combinator owns &lt;em&gt;some&lt;/em&gt; stake in OpenAI. But how big is that stake? This seems like devilishly difficult information to obtain. I asked around and a little birdie who knows several OpenAI investors came back with an answer: Y Combinator owns about 0.6 percent of OpenAI. At OpenAI’s current &lt;a href="https://openai.com/index/accelerating-the-next-phase-ai/"&gt;$852 billion valuation&lt;/a&gt;, that’s worth over $5 billion.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://daringfireball.net/2026/05/y_combinators_stake_in_openai"&gt;John Gruber&lt;/a&gt;, Y Combinator’s Stake in OpenAI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/y-combinator"&gt;y-combinator&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/john-gruber"&gt;john-gruber&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="y-combinator"/><category term="ai"/><category term="john-gruber"/></entry><entry><title>Codex CLI 0.128.0 adds /goal</title><link href="https://simonwillison.net/2026/Apr/30/codex-goals/#atom-tag" rel="alternate"/><published>2026-04-30T23:23:17+00:00</published><updated>2026-04-30T23:23:17+00:00</updated><id>https://simonwillison.net/2026/Apr/30/codex-goals/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/codex/releases/tag/rust-v0.128.0"&gt;Codex CLI 0.128.0 adds /goal&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The latest version of OpenAI's Codex CLI coding agent adds their own version of the &lt;a href="https://ghuntley.com/ralph/"&gt;Ralph loop&lt;/a&gt;: you can now set a &lt;code&gt;/goal&lt;/code&gt; and Codex will keep on looping until it evaluates that the goal has been completed... or the configured token budget has been exhausted.&lt;/p&gt;
&lt;p&gt;It looks like the feature is mainly implemented through the &lt;a href="https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/continuation.md"&gt;goals/continuation.md&lt;/a&gt; and &lt;a href="https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/budget_limit.md"&gt;goals/budget_limit.md&lt;/a&gt; prompts, which are automatically injected at the end of a turn.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/fcoury/status/2049917871799636201"&gt;@fcoury&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;&lt;/p&gt;



</summary><category term="codex"/><category term="coding-agents"/><category term="generative-ai"/><category term="openai"/><category term="agentic-engineering"/><category term="ai"/><category term="llms"/><category term="system-prompts"/><category term="prompt-engineering"/></entry><entry><title>Our evaluation of OpenAI's GPT-5.5 cyber capabilities</title><link href="https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/#atom-tag" rel="alternate"/><published>2026-04-30T23:03:24+00:00</published><updated>2026-04-30T23:03:24+00:00</updated><id>https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities"&gt;Our evaluation of OpenAI&amp;#x27;s GPT-5.5 cyber capabilities&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The UK's AI Security Institute &lt;a href="https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities"&gt;previously evaluated Claude Mythos&lt;/a&gt;: now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="openai"/><category term="ai"/><category term="llms"/><category term="generative-ai"/><category term="ai-security-research"/><category term="gpt"/></entry><entry><title>Quoting OpenAI Codex base_instructions</title><link href="https://simonwillison.net/2026/Apr/28/openai-codex/#atom-tag" rel="alternate"/><published>2026-04-28T22:02:53+00:00</published><updated>2026-04-28T22:02:53+00:00</updated><id>https://simonwillison.net/2026/Apr/28/openai-codex/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://github.com/openai/codex/blob/66b0781502be5de3b1909525c987643b9e5e407d/codex-rs/models-manager/models.json#L55"&gt;&lt;p&gt;&lt;code&gt;Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.&lt;/code&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://github.com/openai/codex/blob/66b0781502be5de3b1909525c987643b9e5e407d/codex-rs/models-manager/models.json#L55"&gt;OpenAI Codex base_instructions&lt;/a&gt;, for GPT-5.5&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/system-prompts"&gt;system-prompts&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="ai"/><category term="llms"/><category term="system-prompts"/><category term="prompt-engineering"/><category term="codex"/><category term="generative-ai"/><category term="gpt"/></entry><entry><title>Tracking the history of the now-deceased OpenAI Microsoft AGI clause</title><link href="https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/#atom-tag" rel="alternate"/><published>2026-04-27T18:38:17+00:00</published><updated>2026-04-27T18:38:17+00:00</updated><id>https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/#atom-tag</id><summary type="html">
    &lt;p&gt;For many years, Microsoft and OpenAI's relationship has included a weird clause saying that, should AGI be achieved, Microsoft's commercial IP rights to OpenAI's technology would be null and void. That clause appeared to end today. I decided to try and track its expression over time on &lt;a href="https://openai.com/"&gt;openai.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;OpenAI, July 22nd 2019 in &lt;a href="https://openai.com/index/microsoft-invests-in-and-partners-with-openai/"&gt;Microsoft invests in and partners with OpenAI to support us building beneficial AGI&lt;/a&gt; (emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAI is producing a sequence of increasingly powerful AI technologies, which requires a lot of capital for computational power. The most obvious way to cover costs is to build a product, but that would mean changing our focus. Instead, we intend to license &lt;strong&gt;some of our pre-AGI technologies&lt;/strong&gt;, with Microsoft becoming our preferred partner for commercializing them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But what &lt;em&gt;is&lt;/em&gt; AGI? The &lt;a href="https://openai.com/charter/"&gt;OpenAI Charter&lt;/a&gt; was first published in April 2018 and has remained unchanged at least since this &lt;a href="https://web.archive.org/web/20190311213352/https://openai.com/charter/"&gt;March 11th 2019 archive.org capture&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's the problem: if you're going to sign an agreement with Microsoft that is dependent on knowing when "AGI" has been achieved, you need something a little more concrete.&lt;/p&gt;
&lt;p&gt;In December 2024 &lt;a href="https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership"&gt;The Information reported the details&lt;/a&gt; (summarized here outside of their paywall &lt;a href="https://techcrunch.com/2024/12/26/microsoft-and-openai-have-a-financial-definition-of-agi-report/"&gt;by TechCrunch&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Last year’s agreement between Microsoft and OpenAI, which hasn’t been disclosed, said AGI would be achieved only when OpenAI has developed systems that have the ability to generate the maximum total profits to which its earliest investors, including Microsoft, are entitled, according to documents OpenAI distributed to investors. Those profits total about $100 billion, the documents showed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So AGI is now whenever OpenAI's systems are capable of generating $100 billion in profit?&lt;/p&gt;
&lt;p&gt;In October 2025 the process changed to being judged by an "independent expert panel". In &lt;a href="https://openai.com/index/next-chapter-of-microsoft-openai-partnership/"&gt;The next chapter of the Microsoft–OpenAI partnership&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The agreement preserves key elements that have fueled this successful partnership—meaning OpenAI remains Microsoft’s frontier model partner and Microsoft continues to have exclusive IP rights and Azure API exclusivity until Artificial General Intelligence (AGI). [...]&lt;/p&gt;
&lt;p&gt;Once AGI is declared by OpenAI, that declaration will now be verified by an independent expert panel. [...]&lt;/p&gt;
&lt;p&gt;Microsoft’s IP rights to research, defined as the confidential methods used in the development of models and systems, will remain until either the expert panel verifies AGI or through 2030, whichever is first.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI on February 27th, 2026 in &lt;a href="https://openai.com/index/continuing-microsoft-partnership/"&gt;Joint Statement from OpenAI and Microsoft&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AGI definition and processes are unchanged&lt;/strong&gt;. The contractual definition of AGI and the process for determining if it has been achieved remains the same.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI today, April 27th 2026 in &lt;a href="https://openai.com/index/next-phase-of-microsoft-partnership/"&gt;The next phase of the Microsoft OpenAI partnership&lt;/a&gt; (emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Microsoft will continue to have a license to OpenAI IP for models and products through 2032.  Microsoft’s license will now be non-exclusive.&lt;/li&gt;
&lt;li&gt;Microsoft will no longer pay a revenue share to OpenAI.&lt;/li&gt;
&lt;li&gt;Revenue share payments from OpenAI to Microsoft continue through 2030, &lt;strong&gt;independent of OpenAI’s technology progress&lt;/strong&gt;, at the same percentage but subject to a total cap.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;As far as I can tell "independent of OpenAI’s technology progress" is a declaration that the AGI clause is now dead. Here's The Verge coming to the same conclusion: &lt;a href="https://www.theverge.com/ai-artificial-intelligence/918981/openai-microsoft-renegotiate-contract"&gt;The AGI clause is dead&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My all-time favorite commentary on OpenAI's approach to AGI remains this 2023 hypothetical &lt;a href="https://www.bloomberg.com/opinion/articles/2023-11-20/who-controls-openai"&gt;by Matt Levine&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;And the investors wailed and gnashed their teeth but it’s true, that is what they agreed to, and they had no legal recourse. And OpenAI’s new CEO, and its nonprofit board, cut them a check for their capped return and said “bye” and went back to running OpenAI for the benefit of humanity. It turned out that a benign, carefully governed artificial superintelligence is really good for humanity, and OpenAI quickly solved all of humanity’s problems and ushered in an age of peace and abundance in which nobody wanted for anything or needed any Microsoft products. And capitalism came to an end.&lt;/p&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/microsoft"&gt;microsoft&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/computer-history"&gt;computer-history&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="openai"/><category term="ai"/><category term="microsoft"/><category term="computer-history"/></entry><entry><title>Quoting Romain Huet</title><link href="https://simonwillison.net/2026/Apr/25/romain-huet/#atom-tag" rel="alternate"/><published>2026-04-25T12:06:55+00:00</published><updated>2026-04-25T12:06:55+00:00</updated><id>https://simonwillison.net/2026/Apr/25/romain-huet/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/romainhuet/status/2047955381578838357"&gt;&lt;p&gt;Since GPT-5.4, we’ve unified Codex and the main model into a single system, so there’s no separate coding line anymore.&lt;/p&gt;
&lt;p&gt;GPT-5.5 takes this further, with strong gains in agentic coding, computer use, and any task on a computer.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/romainhuet/status/2047955381578838357"&gt;Romain Huet&lt;/a&gt;, confirming OpenAI won't release a GPT-5.5-Codex model&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="generative-ai"/><category term="gpt"/><category term="openai"/><category term="ai"/><category term="llms"/></entry><entry><title>GPT-5.5 prompting guide</title><link href="https://simonwillison.net/2026/Apr/25/gpt-5-5-prompting-guide/#atom-tag" rel="alternate"/><published>2026-04-25T04:13:36+00:00</published><updated>2026-04-25T04:13:36+00:00</updated><id>https://simonwillison.net/2026/Apr/25/gpt-5-5-prompting-guide/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/api/docs/guides/prompt-guidance?model=gpt-5.5"&gt;GPT-5.5 prompting guide&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Now that GPT-5.5 is &lt;a href="https://developers.openai.com/api/docs/models/gpt-5.5"&gt;available in the API&lt;/a&gt;, OpenAI have released a wealth of useful tips on how best to prompt the new model.&lt;/p&gt;
&lt;p&gt;Here's a neat trick they recommend for applications that might spend considerable time thinking before returning a user-visible response:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've already noticed their Codex app doing this, and it does make longer running tasks feel less like the model has crashed.&lt;/p&gt;
&lt;p&gt;OpenAI suggest running the following in Codex to upgrade your existing code using advice embedded in their &lt;code&gt;openai-docs&lt;/code&gt; skill:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;$openai-docs migrate this project to gpt-5.5&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The upgrade guide the coding agent will follow &lt;a href="https://github.com/openai/skills/blob/724cd511c96593f642bddf13187217aa155d2554/skills/.curated/openai-docs/references/upgrade-guide.md#model-string--light-prompt-rewrite"&gt;is this one&lt;/a&gt;, which even includes light instructions on how to rewrite prompts to better fit the model.&lt;/p&gt;
&lt;p&gt;Also relevant is the &lt;a href="https://developers.openai.com/api/docs/guides/latest-model"&gt;Using GPT-5.5 guide&lt;/a&gt;, which opens with this warning:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To get the most out of GPT-5.5, treat it as a new model family to tune for, not a drop-in replacement for &lt;code&gt;gpt-5.2&lt;/code&gt; or &lt;code&gt;gpt-5.4&lt;/code&gt;. Begin migration with a fresh baseline instead of carrying over every instruction from an older prompt stack. Start with the smallest prompt that preserves the product contract, then tune reasoning effort, verbosity, tool descriptions, and output format against representative examples.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Interesting to see OpenAI recommend starting from scratch rather than trusting that existing prompts optimized for previous models will continue to work effectively with GPT-5.5.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="codex"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="gpt"/><category term="openai"/><category term="ai"/><category term="llms"/></entry><entry><title>llm 0.31</title><link href="https://simonwillison.net/2026/Apr/24/llm/#atom-tag" rel="alternate"/><published>2026-04-24T23:35:07+00:00</published><updated>2026-04-24T23:35:07+00:00</updated><id>https://simonwillison.net/2026/Apr/24/llm/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm/releases/tag/0.31"&gt;llm 0.31&lt;/a&gt;&lt;/p&gt;
        &lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New GPT-5.5 OpenAI model: &lt;code&gt;llm -m gpt-5.5&lt;/code&gt;. &lt;a href="https://github.com/simonw/llm/issues/1418"&gt;#1418&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;New option to set the &lt;a href="https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_new_params_and_tools#1-verbosity-parameter"&gt;text verbosity level&lt;/a&gt; for GPT-5+ OpenAI models: &lt;code&gt;-o verbosity low&lt;/code&gt;. Values are &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;New option for setting the &lt;a href="https://developers.openai.com/api/docs/guides/images-vision#choose-an-image-detail-level"&gt;image detail level&lt;/a&gt; used for image attachments to OpenAI models: &lt;code&gt;-o image_detail low&lt;/code&gt; - values are &lt;code&gt;low&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;auto&lt;/code&gt;, and GPT-5.4 and 5.5 also accept &lt;code&gt;original&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Models listed in &lt;code&gt;extra-openai-models.yaml&lt;/code&gt; are now also registered as asynchronous. &lt;a href="https://github.com/simonw/llm/issues/1395"&gt;#1395&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="gpt"/><category term="openai"/><category term="llm"/></entry><entry><title>A pelican for GPT-5.5 via the semi-official Codex backdoor API</title><link href="https://simonwillison.net/2026/Apr/23/gpt-5-5/#atom-tag" rel="alternate"/><published>2026-04-23T19:59:47+00:00</published><updated>2026-04-23T19:59:47+00:00</updated><id>https://simonwillison.net/2026/Apr/23/gpt-5-5/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/"&gt;GPT-5.5 is out&lt;/a&gt;. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for!&lt;/p&gt;
&lt;p&gt;There's one notable omission from today's release - the API:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale. We'll bring GPT‑5.5 and GPT‑5.5 Pro to the API very soon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When I run my &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;pelican benchmark&lt;/a&gt; I always prefer to use an API, to avoid hidden system prompts in ChatGPT or other agent harnesses from impacting the results.&lt;/p&gt;
&lt;h4 id="the-openclaw-backdoor"&gt;The OpenClaw backdoor&lt;/h4&gt;
&lt;p&gt;One of the ongoing tension points in the AI world over the past few months has concerned how agent harnesses like OpenClaw and Pi interact with the APIs provided by the big providers.&lt;/p&gt;
&lt;p&gt;Both OpenAI and Anthropic offer popular monthly subscriptions which provide access to their models at a significant discount to their raw API.&lt;/p&gt;
&lt;p&gt;OpenClaw integrated directly with this mechanism, and was then &lt;a href="https://www.theverge.com/ai-artificial-intelligence/907074/anthropic-openclaw-claude-subscription-ban"&gt;blocked from doing so&lt;/a&gt; by Anthropic. This kicked off a whole thing. OpenAI - who recently hired OpenClaw creator Peter Steinberger - saw an opportunity for an easy karma win and announced that OpenClaw was welcome to continue integrating with OpenAI's subscriptions via the same mechanism used by their (open source) Codex CLI tool.&lt;/p&gt;
&lt;p&gt;Does this mean &lt;em&gt;anyone&lt;/em&gt; can write code that integrates with OpenAI's Codex-specific APIs to hook into those existing subscriptions?&lt;/p&gt;
&lt;p&gt;The other day &lt;a href="https://twitter.com/jeremyphoward/status/2046537816834965714"&gt;Jeremy Howard asked&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Anyone know whether OpenAI officially supports the use of the &lt;code&gt;/backend-api/codex/responses&lt;/code&gt; endpoint that Pi and Opencode (IIUC) uses?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It turned out that on March 30th OpenAI's Romain Huet &lt;a href="https://twitter.com/romainhuet/status/2038699202834841962"&gt;had tweeted&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We want people to be able to use Codex, and their ChatGPT subscription, wherever they like! That means in the app, in the terminal, but also in JetBrains, Xcode, OpenCode, Pi, and now Claude Code.&lt;/p&gt;
&lt;p&gt;That’s why Codex CLI and Codex app server are open source too! 🙂&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And Peter Steinberger &lt;a href="https://twitter.com/steipete/status/2046775849769148838"&gt;replied to Jeremy&lt;/a&gt; that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAI sub is officially supported.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="llm-openai-via-codex"&gt;llm-openai-via-codex&lt;/h4&gt;
&lt;p&gt;So... I had Claude Code reverse-engineer the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repo, figure out how authentication tokens were stored and build me &lt;a href="https://github.com/simonw/llm-openai-via-codex"&gt;llm-openai-via-codex&lt;/a&gt;, a new plugin for &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; which picks up your existing Codex subscription and uses it to run prompts!&lt;/p&gt;
&lt;p&gt;(With hindsight I wish I'd used GPT-5.4 or the GPT-5.5 preview, it would have been funnier. I genuinely considered rewriting the project from scratch using Codex and GPT-5.5 for the sake of the joke, but decided not to spend any more time on this!)&lt;/p&gt;
&lt;p&gt;Here's how to use it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install Codex CLI, buy an OpenAI plan, log in to Codex&lt;/li&gt;
&lt;li&gt;Install LLM: &lt;code&gt;uv tool install llm&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Install the new plugin: &lt;code&gt;llm install llm-openai-via-codex&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start prompting: &lt;code&gt;llm -m openai-codex/gpt-5.5 'Your prompt goes here'&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;All existing LLM features should also work - use &lt;code&gt;-a filepath.jpg/URL&lt;/code&gt; to attach an image, &lt;code&gt;llm chat -m openai-codex/gpt-5.5&lt;/code&gt; to start an ongoing chat, &lt;code&gt;llm logs&lt;/code&gt; to view logged conversations and &lt;code&gt;llm --tool ...&lt;/code&gt; to &lt;a href="https://llm.datasette.io/en/stable/tools.html"&gt;try it out with tool support&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="and-some-pelicans"&gt;And some pelicans&lt;/h4&gt;
&lt;p&gt;Let's generate a pelican!&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;llm install llm-openai-via-codex
llm -m openai-codex/gpt-5.5 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;Generate an SVG of a pelican riding a bicycle&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's &lt;a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634"&gt;what I got back&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican.png" alt="It is a bit mangled to be honest - good beak, pelican body shapes are slightly weird, legs do at least extend to the pedals, bicycle frame is not quite right." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I've seen better &lt;a href="https://simonwillison.net/2026/Mar/17/mini-and-nano/#pelicans"&gt;from GPT-5.4&lt;/a&gt;, so I tagged on &lt;code&gt;-o reasoning_effort xhigh&lt;/code&gt; and &lt;a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5"&gt;tried again&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;That one took almost four minutes to generate, but I think it's a much better effort.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican-xhigh.png" alt="Pelican has gradients now, body is much better put together, bicycle is nearly the right shape albeit with one extra bar between pedals and front wheel, clearly a better image overall." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;If you compare the SVG code (&lt;a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634#response"&gt;default&lt;/a&gt;, &lt;a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5#response"&gt;xhigh&lt;/a&gt;) the &lt;code&gt;xhigh&lt;/code&gt; one took a very different approach, which is much more CSS-heavy - as demonstrated by those gradients. &lt;code&gt;xhigh&lt;/code&gt; used 9,322 reasoning tokens where the default used just 39.&lt;/p&gt;
&lt;h4 id="a-few-more-notes-on-gpt-5-5"&gt;A few more notes on GPT-5.5&lt;/h4&gt;
&lt;p&gt;One of the most notable things about GPT-5.5 is the pricing. Once it goes live in the API it's &lt;a href="https://openai.com/index/introducing-gpt-5-5/#availability-and-pricing"&gt;going to be priced&lt;/a&gt; at &lt;em&gt;twice&lt;/em&gt; the cost of GPT-5.4 - $5 per 1M input tokens and $30 per 1M output tokens, where 5.4 is $2.5 and $15.&lt;/p&gt;
&lt;p&gt;GPT-5.5 Pro will be even more: $30 per 1M input tokens and $180 per 1M output tokens.&lt;/p&gt;
&lt;p&gt;GPT-5.4 will remain available. At half the price of 5.5 this feels like 5.4 is to 5.5 as Claude Sonnet is to Claude Opus.&lt;/p&gt;
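&lt;p&gt;To make those per-million prices concrete, here is a rough shell sketch. The &lt;code&gt;estimate_cost&lt;/code&gt; helper and the 10,000 input / 2,000 output token counts are made up for illustration. The rates are the ones quoted above, and a real bill may differ (cached input pricing, for example, is not modelled):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;# Hypothetical helper: dollars for one prompt, given token counts and $/1M rates
# Arguments: input tokens, output tokens, input rate, output rate
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_rate="$3" -v out_rate="$4" \
    'BEGIN { printf "%.4f\n", (in_tok * in_rate + out_tok * out_rate) / 1000000 }'
}

estimate_cost 10000 2000 2.5 15   # GPT-5.4
estimate_cost 10000 2000 5 30     # GPT-5.5
estimate_cost 10000 2000 30 180   # GPT-5.5 Pro&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With those made-up counts the three calls print 0.0550, 0.1100 and 0.6600 - the same prompt costs twice as much on 5.5 as on 5.4, and six times more again on 5.5 Pro.&lt;/p&gt;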
&lt;p&gt;Ethan Mollick has a &lt;a href="https://www.oneusefulthing.org/p/sign-of-the-future-gpt-55"&gt;detailed review of GPT-5.5&lt;/a&gt; where he put it (and GPT-5.5 Pro) through an array of interesting challenges. His verdict: the jagged frontier continues to hold, with GPT-5.5 excellent at some things and challenged by others in a way that remains difficult to predict.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-reasoning"&gt;llm-reasoning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt"&gt;gpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm"/><category term="openai"/><category term="pelican-riding-a-bicycle"/><category term="llm-reasoning"/><category term="ai"/><category term="llms"/><category term="llm-release"/><category term="codex"/><category term="generative-ai"/><category term="chatgpt"/><category term="gpt"/><category term="llm-pricing"/></entry><entry><title>llm-openai-via-codex 0.1a0</title><link href="https://simonwillison.net/2026/Apr/23/llm-openai-via-codex/#atom-tag" rel="alternate"/><published>2026-04-23T19:22:29+00:00</published><updated>2026-04-23T19:22:29+00:00</updated><id>https://simonwillison.net/2026/Apr/23/llm-openai-via-codex/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm-openai-via-codex/releases/tag/0.1a0"&gt;llm-openai-via-codex 0.1a0&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;Hijacks your Codex CLI credentials to make API calls with LLM, as described &lt;a href="https://simonwillison.net/2026/Apr/23/gpt-5-5/#llm-openai-via-codex"&gt;in my post about GPT-5.5&lt;/a&gt;.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="openai"/><category term="llm"/><category term="codex"/></entry><entry><title>Where's the raccoon with the ham radio? (ChatGPT Images 2.0)</title><link href="https://simonwillison.net/2026/Apr/21/gpt-image-2/#atom-tag" rel="alternate"/><published>2026-04-21T20:32:24+00:00</published><updated>2026-04-21T20:32:24+00:00</updated><id>https://simonwillison.net/2026/Apr/21/gpt-image-2/#atom-tag</id><summary type="html">
    &lt;p&gt;OpenAI &lt;a href="https://openai.com/index/introducing-chatgpt-images-2-0/"&gt;released ChatGPT Images 2.0 today&lt;/a&gt;, their latest image generation model. On &lt;a href="https://www.youtube.com/watch?v=sWkGomJ3TLI"&gt;the livestream&lt;/a&gt; Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test.&lt;/p&gt;
&lt;p&gt;My prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Do a where's Waldo style image but it's where is the raccoon holding a ham radio&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="gpt-image-1"&gt;gpt-image-1&lt;/h4&gt;
&lt;p&gt;First as a baseline here's what I got from the older gpt-image-1 using ChatGPT directly:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/chatgpt-image-1-ham-radio.png"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/image_crop_1402x1122_w1402_q0.3.jpg" alt="There's a lot going on, but I couldn't find a raccoon." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I wasn't able to spot the raccoon - I quickly realized that testing image generation models on Where's Waldo style images (Where's Wally in the UK) can be pretty frustrating!&lt;/p&gt;
&lt;p&gt;I tried &lt;a href="https://claude.ai/share/bd6e9b88-29a9-420b-8ac1-3ac5cebac215"&gt;getting Claude Opus 4.7&lt;/a&gt; with its new higher resolution inputs to solve it but it was convinced there was a raccoon it couldn't find thanks to the instruction card at the top left of the image:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Yes — there's at least one raccoon in the picture, but it's very well hidden&lt;/strong&gt;. In my careful sweep through zoomed-in sections, honestly, I couldn't definitively spot a raccoon holding a ham radio. [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="nano-banana-2-and-pro"&gt;Nano Banana 2 and Pro&lt;/h4&gt;
&lt;p&gt;Next I tried Google's Nano Banana 2, &lt;a href="https://gemini.google.com/share/3775db96c576"&gt;via Gemini&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/nano-banana-2-ham-radio.jpg"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/gemini-ham-radio-small.jpg" alt="Busy Where's Waldo-style illustration of a park festival with crowds of people, tents labeled &amp;quot;FOOD &amp;amp; DRINK&amp;quot;, &amp;quot;CRAFT FAIR&amp;quot;, &amp;quot;BOOK NOOK&amp;quot;, &amp;quot;MUSIC FEST&amp;quot;, and &amp;quot;AMATEUR RADIO CLUB - W6HAM&amp;quot; (featuring a raccoon in a red hat at the radio table), plus a Ferris wheel, carousel, gazebo with band, pond with boats, fountain, food trucks, and striped circus tents" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That one was pretty obvious, the raccoon is in the "Amateur Radio Club" booth in the center of the image!&lt;/p&gt;
&lt;p&gt;Claude said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Honestly, this one wasn't really hiding — he's the star of the booth. Feels like the illustrator took pity on us after that last impossible scene. The little "W6HAM" callsign pun on the booth sign is a nice touch too.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I also tried Nano Banana Pro &lt;a href="https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221sGU5A7mrngkfLfSEU84xaV1DhtOTnS--%22%5D,%22action%22:%22open%22,%22userId%22:%22106366615678321494423%22,%22resourceKeys%22:%7B%7D%7D&amp;amp;usp=sharing"&gt;in AI Studio&lt;/a&gt; and got this, by far the worst result from any model. Not sure what went wrong here!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/nano-banana-pro-ham-radio.jpg"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/nano-banana-pro-ham-radio-small.jpg" alt="The raccoon is larger than everyone else, right in the middle of the image with an ugly white border around it." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id="gpt-image-2"&gt;gpt-image-2&lt;/h4&gt;
&lt;p&gt;With the baseline established, let's try out the new model.&lt;/p&gt;
&lt;p&gt;I used an updated version of my &lt;a href="https://github.com/simonw/tools/blob/main/python/openai_image.py"&gt;openai_image.py&lt;/a&gt; script, which is a thin wrapper around the &lt;a href="https://github.com/openai/openai-python"&gt;OpenAI Python&lt;/a&gt; client library. Their client library hasn't yet been updated to include &lt;code&gt;gpt-image-2&lt;/code&gt; but thankfully it doesn't validate the model ID so you can use it anyway.&lt;/p&gt;
&lt;p&gt;Here's how I ran that:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;OPENAI_API_KEY=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;llm keys get openai&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  uv run https://tools.simonwillison.net/python/openai_image.py \
  -m gpt-image-2 \
  &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Do a where's Waldo style image but it's where is the raccoon holding a ham radio&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here's what I got back. I don't &lt;em&gt;think&lt;/em&gt; there's a raccoon in there - I couldn't spot one, and neither could Claude.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/gpt-image-2-default.png"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/gpt-image-2-default.jpg" alt="Lots of stuff, a ham radio booth, many many people, a lake, but maybe no raccoon?" style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/openai/openai-cookbook/blob/main/examples/multimodal/image-gen-models-prompting-guide.ipynb"&gt;OpenAI image generation cookbook&lt;/a&gt; has been updated with notes on &lt;code&gt;gpt-image-2&lt;/code&gt;, including the &lt;code&gt;outputQuality&lt;/code&gt; setting and available sizes.&lt;/p&gt;
&lt;p&gt;I tried setting &lt;code&gt;outputQuality&lt;/code&gt; to &lt;code&gt;high&lt;/code&gt; and the dimensions to &lt;code&gt;3840x2160&lt;/code&gt; (I believe that's the maximum) and got back a 17MB PNG, which I converted to a 5MB WEBP:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;OPENAI_API_KEY=&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;$(&lt;/span&gt;llm keys get openai&lt;span class="pl-pds"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  uv run &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; \
  -m gpt-image-2 &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Do a where's Waldo style image but it's where is the raccoon holding a ham radio&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  --quality high --size 3840x2160&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://static.simonwillison.net/static/2026/image-fc93bd-q100.webp"&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/image-fc93bd-q100.jpg" alt="Big complex image, lots of detail, good wording, there is indeed a raccoon with a ham radio." style="max-width: 100%;" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That's pretty great! There's a raccoon with a ham radio in there (bottom left, quite easy to spot).&lt;/p&gt;
&lt;p&gt;The image used 13,342 output tokens, which are charged at $30/million so a total cost of around &lt;a href="https://www.llm-prices.com/#ot=13342&amp;amp;ic=5&amp;amp;cic=1.25&amp;amp;oc=10&amp;amp;sel=gpt-image-2-image"&gt;40 cents&lt;/a&gt;.&lt;/p&gt;
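&lt;p&gt;Here's that arithmetic as a shell sketch, assuming the $30/million output rate is the only charge that matters (the handful of input tokens in the prompt is ignored). The &lt;code&gt;image_cost&lt;/code&gt; helper is my own illustration, not part of any OpenAI tooling:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;# Hypothetical helper: dollars for a number of output tokens at a $/1M rate
image_cost() {
  awk -v tokens="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", tokens * rate / 1000000 }'
}

image_cost 13342 30&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That prints 0.40, matching the 40 cents figure.&lt;/p&gt;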
&lt;h4 id="takeaways"&gt;Takeaways&lt;/h4&gt;
&lt;p&gt;I think this new ChatGPT image generation model takes the crown from Gemini, at least for the moment.&lt;/p&gt;
&lt;p&gt;Where's Waldo style images are an infuriating and somewhat foolish way to test these models, but they do help illustrate how good they are getting at complex illustrations combining both text and details.&lt;/p&gt;
&lt;h4 id="update-asking-models-to-solve-this-is-risky"&gt;Update: asking models to solve this is risky&lt;/h4&gt;
&lt;p&gt;rizaco &lt;a href="https://news.ycombinator.com/item?id=47852835#47853561"&gt;on Hacker News&lt;/a&gt; asked ChatGPT to draw a red circle around the raccoon in one of the images in which I had failed to find one. Here's an animated mix of their result and the original image:&lt;/p&gt;
&lt;p&gt;&lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/ham-radio-cheat.gif" alt="The circle appears around a raccoon with a ham radio who is definitely not there in the original image!" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Looks like we definitely can't trust these models to usefully solve their own puzzles!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/text-to-image"&gt;text-to-image&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nano-banana"&gt;nano-banana&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="openai"/><category term="ai"/><category term="llms"/><category term="text-to-image"/><category term="llm-release"/><category term="nano-banana"/><category term="generative-ai"/><category term="chatgpt"/></entry><entry><title>Trusted access for the next era of cyber defense</title><link href="https://simonwillison.net/2026/Apr/14/trusted-access-openai/#atom-tag" rel="alternate"/><published>2026-04-14T21:23:59+00:00</published><updated>2026-04-14T21:23:59+00:00</updated><id>https://simonwillison.net/2026/Apr/14/trusted-access-openai/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/scaling-trusted-access-for-cyber-defense/"&gt;Trusted access for the next era of cyber defense&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI's answer to &lt;a href="https://simonwillison.net/2026/Apr/7/project-glasswing/"&gt;Claude Mythos&lt;/a&gt; appears to be a new model called GPT-5.4-Cyber:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In preparation for increasingly more capable models from OpenAI over the next few months, we are fine-tuning our models specifically to enable defensive cybersecurity use cases, starting today with a variant of GPT‑5.4 trained to be cyber-permissive: GPT‑5.4‑Cyber.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They're also extending a program they launched in February (which I had missed) called &lt;a href="https://openai.com/index/trusted-access-for-cyber/"&gt;Trusted Access for Cyber&lt;/a&gt;, where users can verify their identity (via a photo of a government-issued ID processed by &lt;a href="https://withpersona.com/"&gt;Persona&lt;/a&gt;) to gain "reduced friction" access to OpenAI's models for cybersecurity work.&lt;/p&gt;
&lt;p&gt;Honestly, this OpenAI announcement is difficult to follow. Unsurprisingly they don't mention Anthropic at all, but much of the piece emphasizes their many years of existing cybersecurity work and their goal to "democratize access" to these tools, hence the emphasis on that self-service verification flow from February.&lt;/p&gt;
&lt;p&gt;If you want access to their best security tools you still need to go through an extra Google Form application process though, which doesn't feel particularly different to me from Anthropic's &lt;a href="https://www.anthropic.com/glasswing"&gt;Project Glasswing&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=47770770"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;&lt;/p&gt;



</summary><category term="security"/><category term="generative-ai"/><category term="ai-security-research"/><category term="openai"/><category term="ai"/><category term="llms"/><category term="anthropic"/></entry><entry><title>ChatGPT voice mode is a weaker model</title><link href="https://simonwillison.net/2026/Apr/10/voice-mode-is-weaker/#atom-tag" rel="alternate"/><published>2026-04-10T15:56:02+00:00</published><updated>2026-04-10T15:56:02+00:00</updated><id>https://simonwillison.net/2026/Apr/10/voice-mode-is-weaker/#atom-tag</id><summary type="html">
    &lt;p&gt;I think it's non-obvious to many people that the OpenAI voice mode runs on a much older, much weaker model - it feels like the AI that you can talk to should be the smartest AI but it really isn't.&lt;/p&gt;
&lt;p&gt;If you ask ChatGPT voice mode for its knowledge cutoff date it tells you April 2024 - it's a GPT-4o era model.&lt;/p&gt;
&lt;p&gt;This thought was inspired by &lt;a href="https://twitter.com/karpathy/status/2042334451611693415"&gt;this Andrej Karpathy tweet&lt;/a&gt; about the growing gap in understanding of AI capability based on the access points and domains people are using the models with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and &lt;em&gt;at the same time&lt;/em&gt;, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems.&lt;/p&gt;
&lt;p&gt;This part really works and has made dramatic strides because 2 properties:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge),  but also&lt;/li&gt;
&lt;li&gt;they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="andrej-karpathy"/><category term="generative-ai"/><category term="openai"/><category term="chatgpt"/><category term="ai"/><category term="llms"/></entry><entry><title>Quoting Chengpeng Mou</title><link href="https://simonwillison.net/2026/Apr/5/chengpeng-mou/#atom-tag" rel="alternate"/><published>2026-04-05T21:47:06+00:00</published><updated>2026-04-05T21:47:06+00:00</updated><id>https://simonwillison.net/2026/Apr/5/chengpeng-mou/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/cpmou2022/status/2040606209800290404"&gt;&lt;p&gt;From anonymized U.S. ChatGPT data, we are seeing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;~2M weekly messages on health insurance&lt;/li&gt;
&lt;li&gt;~600K weekly messages [classified as healthcare] from people living in “hospital deserts” (30 min drive to nearest hospital)&lt;/li&gt;
&lt;li&gt;7 out of 10 msgs happen outside clinic hours&lt;/li&gt;
&lt;/ul&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/cpmou2022/status/2040606209800290404"&gt;Chengpeng Mou&lt;/a&gt;, Head of Business Finance, OpenAI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="generative-ai"/><category term="openai"/><category term="chatgpt"/><category term="ai"/><category term="llms"/></entry><entry><title>Thoughts on OpenAI acquiring Astral and uv/ruff/ty</title><link href="https://simonwillison.net/2026/Mar/19/openai-acquiring-astral/#atom-tag" rel="alternate"/><published>2026-03-19T16:45:15+00:00</published><updated>2026-03-19T16:45:15+00:00</updated><id>https://simonwillison.net/2026/Mar/19/openai-acquiring-astral/#atom-tag</id><summary type="html">
    &lt;p&gt;The big news this morning: &lt;a href="https://astral.sh/blog/openai"&gt;Astral to join OpenAI&lt;/a&gt; (on the Astral blog) and &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;OpenAI to acquire Astral&lt;/a&gt; (the OpenAI announcement). Astral are the company behind &lt;a href="https://simonwillison.net/tags/uv/"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff/"&gt;ruff&lt;/a&gt;, and &lt;a href="https://simonwillison.net/tags/ty/"&gt;ty&lt;/a&gt; - three increasingly load-bearing open source projects in the Python ecosystem. I have thoughts!&lt;/p&gt;
&lt;h4 id="the-official-line-from-openai-and-astral"&gt;The official line from OpenAI and Astral&lt;/h4&gt;
&lt;p&gt;The Astral team will become part of the Codex team at OpenAI.&lt;/p&gt;
&lt;p&gt;Charlie Marsh &lt;a href="https://astral.sh/blog/openai"&gt;has this to say&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Open source is at the heart of that impact and the heart of that story; it sits at the center of everything we do. In line with our philosophy and &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;OpenAI's own announcement&lt;/a&gt;, OpenAI will continue supporting our open source tools after the deal closes. We'll keep building in the open, alongside our community -- and for the broader Python ecosystem -- just as we have from the start. [...]&lt;/p&gt;
&lt;p&gt;After joining the Codex team, we'll continue building our open source tools, explore ways they can work more seamlessly with Codex, and expand our reach to think more broadly about the future of software development.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI's message &lt;a href="https://openai.com/index/openai-to-acquire-astral/"&gt;has a slightly different focus&lt;/a&gt; (highlights mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As part of our developer-first philosophy, after closing OpenAI plans to support Astral’s open source products. &lt;strong&gt;By bringing Astral’s tooling and engineering expertise to OpenAI, we will accelerate our work on Codex&lt;/strong&gt; and expand what AI can do across the software development lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a slightly confusing message. The &lt;a href="https://github.com/openai/codex"&gt;Codex CLI&lt;/a&gt; is a Rust application, and Astral have some of the best Rust engineers in the industry - &lt;a href="https://github.com/burntsushi"&gt;BurntSushi&lt;/a&gt; alone (&lt;a href="https://github.com/rust-lang/regex"&gt;Rust regex&lt;/a&gt;, &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;ripgrep&lt;/a&gt;, &lt;a href="https://github.com/BurntSushi/jiff"&gt;jiff&lt;/a&gt;) may be worth the price of acquisition!&lt;/p&gt;
&lt;p&gt;So is this about the talent or about the product? I expect both, but I know from past experience that a product+talent acquisition can turn into a talent-only acquisition later on.&lt;/p&gt;
&lt;h4 id="uv-is-the-big-one"&gt;uv is the big one&lt;/h4&gt;
&lt;p&gt;Of Astral's projects the most impactful is &lt;a href="https://github.com/astral-sh/uv"&gt;uv&lt;/a&gt;. If you're not familiar with it, &lt;code&gt;uv&lt;/code&gt; is by far the most convincing solution to Python's environment management problems, best illustrated by &lt;a href="https://xkcd.com/1987/"&gt;this classic XKCD&lt;/a&gt;:&lt;/p&gt;
&lt;p style="text-align: center"&gt;&lt;img src="https://imgs.xkcd.com/comics/python_environment.png" alt="xkcd comic showing a tangled, chaotic flowchart of Python environment paths and installations. Nodes include &amp;quot;PIP&amp;quot;, &amp;quot;EASY_INSTALL&amp;quot;, &amp;quot;$PYTHONPATH&amp;quot;, &amp;quot;ANACONDA PYTHON&amp;quot;, &amp;quot;ANOTHER PIP??&amp;quot;, &amp;quot;HOMEBREW PYTHON (2.7)&amp;quot;, &amp;quot;OS PYTHON&amp;quot;, &amp;quot;HOMEBREW PYTHON (3.6)&amp;quot;, &amp;quot;PYTHON.ORG BINARY (2.6)&amp;quot;, and &amp;quot;(MISC FOLDERS OWNED BY ROOT)&amp;quot; connected by a mess of overlapping arrows. A stick figure with a &amp;quot;?&amp;quot; stands at the top left. Paths at the bottom include &amp;quot;/usr/local/Cellar&amp;quot;, &amp;quot;/usr/local/opt&amp;quot;, &amp;quot;/usr/local/lib/python3.6&amp;quot;, &amp;quot;/usr/local/lib/python2.7&amp;quot;, &amp;quot;/python/&amp;quot;, &amp;quot;/newenv/&amp;quot;, &amp;quot;$PATH&amp;quot;, &amp;quot;????&amp;quot;, and &amp;quot;/(A BUNCH OF PATHS WITH &amp;quot;FRAMEWORKS&amp;quot; IN THEM SOMEWHERE)/&amp;quot;. Caption reads: &amp;quot;MY PYTHON ENVIRONMENT HAS BECOME SO DEGRADED THAT MY LAPTOP HAS BEEN DECLARED A SUPERFUND SITE.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Switch from &lt;code&gt;python&lt;/code&gt; to &lt;code&gt;uv run&lt;/code&gt; and most of these problems go away. I've been using it extensively for the past couple of years and it's become an essential part of my workflow.&lt;/p&gt;
&lt;p&gt;I'm not alone in this. According to PyPI Stats &lt;a href="https://pypistats.org/packages/uv"&gt;uv was downloaded&lt;/a&gt; more than 126 million times last month! Since its release in February 2024 - just two years ago - it's become one of the most popular tools for running Python code.&lt;/p&gt;
&lt;h4 id="ruff-and-ty"&gt;Ruff and ty&lt;/h4&gt;
&lt;p&gt;Astral's two other big projects are &lt;a href="https://github.com/astral-sh/ruff"&gt;ruff&lt;/a&gt; - a Python linter and formatter - and &lt;a href="https://github.com/astral-sh/ty"&gt;ty&lt;/a&gt; - a fast Python type checker.&lt;/p&gt;
&lt;p&gt;These are popular tools that provide a great developer experience but they aren't load-bearing in the same way that &lt;code&gt;uv&lt;/code&gt; is.&lt;/p&gt;
&lt;p&gt;They do however resonate well with coding agent tools like Codex - giving an agent access to fast linting and type checking tools can help improve the quality of the code they generate.&lt;/p&gt;
&lt;p&gt;I'm not convinced that integrating them &lt;em&gt;into&lt;/em&gt; the coding agent itself as opposed to telling it when to run them will make a meaningful difference, but I may just not be imaginative enough here.&lt;/p&gt;
&lt;h4 id="what-of-pyx-"&gt;What of pyx?&lt;/h4&gt;
&lt;p&gt;Ever since &lt;code&gt;uv&lt;/code&gt; started to gain traction the Python community has been worrying about the strategic risk of a single VC-backed company owning a key piece of Python infrastructure. I &lt;a href="https://simonwillison.net/2024/Sep/8/uv-under-discussion-on-mastodon/"&gt;wrote about&lt;/a&gt; one of those conversations in detail back in September 2024.&lt;/p&gt;
&lt;p&gt;The conversation back then focused on what Astral's business plan could be, which started to take form &lt;a href="https://simonwillison.net/2025/Aug/13/pyx/"&gt;in August 2025&lt;/a&gt; when they announced &lt;a href="https://astral.sh/pyx"&gt;pyx&lt;/a&gt;, their private PyPI-style package registry for organizations.&lt;/p&gt;
&lt;p&gt;I'm less convinced that pyx makes sense within OpenAI, and it's notably absent from both the Astral and OpenAI announcement posts.&lt;/p&gt;
&lt;h4 id="competitive-dynamics"&gt;Competitive dynamics&lt;/h4&gt;
&lt;p&gt;An interesting aspect of this deal is how it might impact the competition between Anthropic and OpenAI.&lt;/p&gt;
&lt;p&gt;Both companies spent most of 2025 focused on improving the coding ability of their models, resulting in the &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;November 2025 inflection point&lt;/a&gt; when coding agents went from often-useful to almost-indispensable tools for software development.&lt;/p&gt;
&lt;p&gt;The competition between Anthropic's Claude Code and OpenAI's Codex is &lt;em&gt;fierce&lt;/em&gt;. Those $200/month subscriptions add up to billions of dollars a year in revenue, for companies that very much need that money.&lt;/p&gt;
&lt;p&gt;Anthropic &lt;a href="https://www.anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone"&gt;acquired the Bun JavaScript runtime&lt;/a&gt; in December 2025, an acquisition that looks somewhat similar in shape to Astral.&lt;/p&gt;
&lt;p&gt;Bun was already a core component of Claude Code and that acquisition looked to mainly be about ensuring that a crucial dependency stayed actively maintained. Claude Code's performance has increased significantly since then thanks to the efforts of Bun's Jarred Sumner.&lt;/p&gt;
&lt;p&gt;One bad version of this deal would be if OpenAI start using their ownership of &lt;code&gt;uv&lt;/code&gt; as leverage in their competition with Anthropic.&lt;/p&gt;
&lt;h4 id="astral-s-quiet-series-a-and-b"&gt;Astral's quiet series A and B&lt;/h4&gt;
&lt;p&gt;One detail that caught my eye from Astral's announcement, in the section thanking the team, investors, and community:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Second, to our investors, especially &lt;a href="https://www.accel.com/team/casey-aylward#bay-area"&gt;Casey Aylward&lt;/a&gt; from Accel, who led our Seed and Series A, and &lt;a href="https://a16z.com/author/jennifer-li/"&gt;Jennifer Li&lt;/a&gt; from Andreessen Horowitz, who led our Series B. As a first-time, technical, solo founder, you showed far more belief in me than I ever showed in myself, and I will never forget that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As far as I can tell neither the Series A nor the Series B were previously announced - I've only been able to find coverage of the original seed round &lt;a href="https://astral.sh/blog/announcing-astral-the-company-behind-ruff"&gt;from April 2023&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Those investors presumably now get to exchange their stake in Astral for a piece of OpenAI. I wonder how much influence they had on Astral's decision to sell.&lt;/p&gt;
&lt;h4 id="forking-as-a-credible-exit-"&gt;Forking as a credible exit?&lt;/h4&gt;
&lt;p&gt;Armin Ronacher built &lt;a href="https://til.simonwillison.net/python/rye"&gt;Rye&lt;/a&gt;, which was later taken over by Astral and effectively merged with uv. In &lt;a href="https://lucumr.pocoo.org/2024/8/21/harvest-season/"&gt;August 2024&lt;/a&gt; he wrote about the risk involved in a VC-backed company owning a key piece of open source infrastructure and said the following (highlight mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;However having seen the code and what uv is doing, &lt;strong&gt;even in the worst possible future this is a very forkable and maintainable thing&lt;/strong&gt;. I believe that even in case Astral shuts down or were to do something incredibly dodgy licensing wise, the community would be better off than before uv existed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Astral's own Douglas Creager &lt;a href="https://news.ycombinator.com/item?id=47438723#47439974"&gt;emphasized this angle on Hacker News today&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All I can say is that &lt;em&gt;right now&lt;/em&gt;, we're committed to maintaining our open-source tools with the same level of effort, care, and attention to detail as before. That does not change with this acquisition. No one can guarantee how motives, incentives, and decisions might change years down the line. But that's why we bake optionality into it with the tools being permissively licensed. That makes the worst-case scenarios have the shape of "fork and move on", and not "software disappears forever".&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I like and trust the Astral team and I'm optimistic that their projects will be well-maintained in their new home.&lt;/p&gt;
&lt;p&gt;OpenAI don't yet have much of a track record with respect to acquiring and maintaining open source projects. They've been on a bit of an acquisition spree over the past three months though, snapping up &lt;a href="https://openai.com/index/openai-to-acquire-promptfoo/"&gt;Promptfoo&lt;/a&gt; and &lt;a href="https://steipete.me/posts/2026/openclaw"&gt;OpenClaw&lt;/a&gt; (sort-of, they hired creator Peter Steinberger and are spinning OpenClaw off to a foundation), plus closed source LaTeX platform &lt;a href="https://openai.com/index/introducing-prism/"&gt;Crixet (now Prism)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If things do go south for &lt;code&gt;uv&lt;/code&gt; and the other Astral projects we'll get to see how credible the forking exit strategy turns out to be.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ty"&gt;ty&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/astral"&gt;astral&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruff"&gt;ruff&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/charlie-marsh"&gt;charlie-marsh&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ty"/><category term="uv"/><category term="openai"/><category term="astral"/><category term="ai"/><category term="ruff"/><category term="codex"/><category term="rust"/><category term="coding-agents"/><category term="python"/><category term="charlie-marsh"/></entry><entry><title>GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52</title><link href="https://simonwillison.net/2026/Mar/17/mini-and-nano/#atom-tag" rel="alternate"/><published>2026-03-17T19:39:17+00:00</published><updated>2026-03-17T19:39:17+00:00</updated><id>https://simonwillison.net/2026/Mar/17/mini-and-nano/#atom-tag</id><summary type="html">
    &lt;p&gt;OpenAI today: &lt;a href="https://openai.com/index/introducing-gpt-5-4-mini-and-nano/"&gt;Introducing GPT‑5.4 mini and nano&lt;/a&gt;. These models join GPT-5.4 which was released &lt;a href="https://openai.com/index/introducing-gpt-5-4/"&gt;two weeks ago&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;OpenAI's self-reported benchmarks show the new 5.4-nano out-performing their previous GPT-5 mini model when run at maximum reasoning effort. The new mini is also 2x faster than the previous mini.&lt;/p&gt;
&lt;p&gt;Here's how the pricing looks - all prices are per million tokens. &lt;code&gt;gpt-5.4-nano&lt;/code&gt; is notably even cheaper than Google's Gemini 3.1 Flash-Lite:&lt;/p&gt;
&lt;center&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Model&lt;/th&gt;
      &lt;th&gt;Input&lt;/th&gt;
      &lt;th&gt;Cached input&lt;/th&gt;
      &lt;th&gt;Output&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;gpt-5.4&lt;/td&gt;
      &lt;td&gt;$2.50&lt;/td&gt;
      &lt;td&gt;$0.25&lt;/td&gt;
      &lt;td&gt;$15.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gpt-5.4-mini&lt;/td&gt;
      &lt;td&gt;$0.75&lt;/td&gt;
      &lt;td&gt;$0.075&lt;/td&gt;
      &lt;td&gt;$4.50&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;gpt-5.4-nano&lt;/td&gt;
      &lt;td&gt;$0.20&lt;/td&gt;
      &lt;td&gt;$0.02&lt;/td&gt;
      &lt;td&gt;$1.25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;&lt;td colspan="4"&gt;&lt;center&gt;Other models for comparison&lt;/center&gt;&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude Opus 4.6&lt;/td&gt;
      &lt;td&gt;$5.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$25.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
      &lt;td&gt;$3.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$15.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
      &lt;td&gt;$2.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$12.00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
      &lt;td&gt;$1.00&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$5.00&lt;/td&gt;
    &lt;/tr&gt;
&lt;tr&gt;
      &lt;td&gt;Gemini 3.1 Flash-Lite&lt;/td&gt;
      &lt;td&gt;$0.25&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;$1.50&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/center&gt;
&lt;p&gt;I used GPT-5.4 nano to generate a description of this photo I took at the &lt;a href="https://www.niche-museums.com/118"&gt;John M. Mossman Lock Collection&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/IMG_2324.jpeg" alt="Description below" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;llm -m gpt-5.4-nano -a IMG_2324.jpeg 'describe image'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here's the output:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The image shows the interior of a museum gallery with a long display wall. White-painted brick walls are covered with many framed portraits arranged in neat rows. Below the portraits, there are multiple glass display cases with dark wooden frames and glass tops/fronts, containing various old historical objects and equipment. The room has a polished wooden floor, hanging ceiling light fixtures/cords, and a few visible pipes near the top of the wall. In the foreground, glass cases run along the length of the room, reflecting items from other sections of the gallery.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That took 2,751 input tokens and 112 output tokens, at a cost of &lt;a href="https://www.llm-prices.com/#it=2751&amp;amp;ot=112&amp;amp;sel=gpt-5.4-nano"&gt;0.069 cents&lt;/a&gt; (less than a tenth of a cent). That means describing every single photo in my 76,000 photo collection would cost around $52.44.&lt;/p&gt;
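&lt;p&gt;As a rough sketch of that arithmetic, assuming the &lt;code&gt;gpt-5.4-nano&lt;/code&gt; prices from the table above (the constant and function names here are purely illustrative, not part of any library):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Prices in USD per million tokens, copied from the pricing table above.
INPUT_PRICE_PER_MILLION = 0.20   # gpt-5.4-nano input
OUTPUT_PRICE_PER_MILLION = 1.25  # gpt-5.4-nano output


def prompt_cost_usd(input_tokens, output_tokens):
    """Return the cost of a single prompt in US dollars."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


# Token counts reported for the single photo description above.
per_photo = prompt_cost_usd(2751, 112)
print(f"Per photo: {per_photo * 100:.3f} cents")
print(f"76,000 photos: ${per_photo * 76_000:.2f}")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run unrounded, the total comes to about $52.46. The $52.44 figure comes from rounding the per-photo cost to 0.069 cents before multiplying, and either way it assumes every photo uses a similar number of tokens.&lt;/p&gt;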
&lt;p&gt;I released &lt;a href="https://llm.datasette.io/en/stable/changelog.html#v0-29"&gt;llm 0.29&lt;/a&gt; with support for the new models.&lt;/p&gt;
&lt;h4 id="pelicans"&gt;Pelicans&lt;/h4&gt;
&lt;p&gt;Then I had OpenAI Codex loop through all five reasoning effort levels and all three models and produce this combined SVG grid of pelicans riding bicycles (&lt;a href="https://gist.github.com/simonw/f16292d9a5b90b28054cff3ba497a3ca"&gt;generation transcripts here&lt;/a&gt;). I do like the gpt-5.4 xhigh one the best, it has a good bicycle (with nice spokes) and the pelican has a fish in its beak!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gpt-5.4-pelican-family.svg" alt="Described by Claude Opus 4.6: A 5x3 comparison grid of AI-generated cartoon illustrations of a pelican riding a bicycle. Columns are labeled &amp;quot;gpt-5.4-nano&amp;quot;, &amp;quot;gpt-5.4-mini&amp;quot;, and &amp;quot;gpt-5.4&amp;quot; across the top, and rows are labeled &amp;quot;none&amp;quot;, &amp;quot;low&amp;quot;, &amp;quot;medium&amp;quot;, &amp;quot;high&amp;quot;, and &amp;quot;xhigh&amp;quot; down the left side, representing quality/detail settings. In the &amp;quot;none&amp;quot; row, gpt-5.4-nano shows a chaotic white bird with misplaced arrows and tangled wheels on grass, gpt-5.4-mini shows a duck-like brown bird awkwardly straddling a motorcycle-like bike, and gpt-5.4 shows a stiff gray-and-white pelican sitting atop a blue tandem bicycle with extra legs. In the &amp;quot;low&amp;quot; row, nano shows a chubby round white bird pedaling with small feet on grass, mini shows a cleaner white bird riding a blue bicycle with motion lines, and gpt-5.4 shows a pelican with a blue cap riding confidently but with slightly awkward proportions. In the &amp;quot;medium&amp;quot; row, nano regresses to a strange bird standing over bowling balls on ice, mini shows two plump white birds merged onto one yellow-wheeled bicycle, and gpt-5.4 shows a more recognizable gray-and-white pelican on a red bicycle but with tangled extra legs. In the &amp;quot;high&amp;quot; row, nano shows multiple small pelicans crowded around a broken green bicycle on grass with a sun overhead, mini shows a tandem bicycle with two white pelicans and clear blue sky, and gpt-5.4 shows two pelicans stacked on a red tandem bike with the most realistic proportions yet. 
In the &amp;quot;xhigh&amp;quot; row, nano shows the most detailed scene with a pelican on a detailed bicycle with grass and a large sun but still somewhat jumbled anatomy, mini produces the cleanest single pelican on a yellow-accented bicycle with a light blue sky, and gpt-5.4 shows a well-rendered gray pelican on a teal bicycle with the best overall coherence. Generally, quality improves moving right across models and down through quality tiers, though &amp;quot;medium&amp;quot; is inconsistently worse than &amp;quot;low&amp;quot; for some models, and all images maintain a lighthearted cartoon style with pastel skies and simple backgrounds." style="max-width: 100%;" /&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="vision-llms"/><category term="llm"/><category term="openai"/><category term="pelican-riding-a-bicycle"/><category term="llm-pricing"/><category term="ai"/><category term="llms"/><category term="llm-release"/><category term="generative-ai"/></entry><entry><title>Use subagents and custom agents in Codex</title><link href="https://simonwillison.net/2026/Mar/16/codex-subagents/#atom-tag" rel="alternate"/><published>2026-03-16T23:03:56+00:00</published><updated>2026-03-16T23:03:56+00:00</updated><id>https://simonwillison.net/2026/Mar/16/codex-subagents/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/codex/subagents"&gt;Use subagents and custom agents in Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Subagents were announced in general availability today for OpenAI Codex, after several weeks of preview behind a feature flag.&lt;/p&gt;
&lt;p&gt;They're very similar to the Claude Code implementation, with default subagents for "explorer", "worker" and "default". It's unclear to me what the difference between "worker" and "default" is but based on their CSV example I think "worker" is intended for running large numbers of small tasks in parallel.&lt;/p&gt;
&lt;p&gt;Codex also lets you define custom agents as TOML files in &lt;code&gt;~/.codex/agents/&lt;/code&gt;. These can have custom instructions and be assigned to use specific models - including &lt;code&gt;gpt-5.3-codex-spark&lt;/code&gt; if you want &lt;a href="https://simonwillison.net/2026/Feb/12/codex-spark/"&gt;some raw speed&lt;/a&gt;. They can then be referenced by name, as demonstrated by this example prompt from the documentation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Investigate why the settings modal fails to save. Have browser_debugger reproduce it, code_mapper trace the responsible code path, and ui_fixer implement the smallest fix once the failure mode is clear.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The subagents pattern is widely supported in coding agents now. Here's documentation across a number of different platforms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/subagents/"&gt;OpenAI Codex subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/sub-agents"&gt;Claude Code subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://geminicli.com/docs/core/subagents/"&gt;Gemini CLI subagents&lt;/a&gt; (experimental)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.mistral.ai/mistral-vibe/agents-skills#agent-selection"&gt;Mistral Vibe subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opencode.ai/docs/agents/"&gt;OpenCode agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.visualstudio.com/docs/copilot/agents/subagents"&gt;Subagents in Visual Studio Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cursor.com/docs/subagents"&gt;Cursor Subagents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I added &lt;a href="https://simonwillison.net/guides/agentic-engineering-patterns/subagents/"&gt;a chapter on Subagents&lt;/a&gt; to my Agentic Engineering Patterns guide.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/OpenAIDevs/status/2033636701848174967"&gt;@OpenAIDevs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/parallel-agents"&gt;parallel-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;&lt;/p&gt;



</summary><category term="parallel-agents"/><category term="coding-agents"/><category term="generative-ai"/><category term="openai"/><category term="agentic-engineering"/><category term="ai"/><category term="llms"/><category term="codex"/></entry><entry><title>Codex for Open Source</title><link href="https://simonwillison.net/2026/Mar/7/codex-for-open-source/#atom-tag" rel="alternate"/><published>2026-03-07T18:13:39+00:00</published><updated>2026-03-07T18:13:39+00:00</updated><id>https://simonwillison.net/2026/Mar/7/codex-for-open-source/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/codex/community/codex-for-oss"&gt;Codex for Open Source&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Anthropic announced six months of free Claude Max for maintainers of popular open source projects (5,000+ stars or 1M+ NPM downloads) &lt;a href="https://simonwillison.net/2026/Feb/27/claude-max-oss-six-months/"&gt;on 27th February&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now OpenAI have launched their comparable offer: six months of ChatGPT Pro (same $200/month price as Claude Max) with Codex and "conditional access to Codex Security" for core maintainers.&lt;/p&gt;
&lt;p&gt;Unlike Anthropic they don't hint at the exact metrics they care about, but the &lt;a href="https://openai.com/form/codex-for-oss/"&gt;application form&lt;/a&gt; does ask for "information such as GitHub stars, monthly downloads, or why the project is important to the ecosystem."&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/openaidevs/status/2029998191043911955"&gt;@openaidevs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="codex"/><category term="generative-ai"/><category term="openai"/><category term="ai"/><category term="llms"/></entry><entry><title>Anthropic and the Pentagon</title><link href="https://simonwillison.net/2026/Mar/6/anthropic-and-the-pentagon/#atom-tag" rel="alternate"/><published>2026-03-06T17:26:50+00:00</published><updated>2026-03-06T17:26:50+00:00</updated><id>https://simonwillison.net/2026/Mar/6/anthropic-and-the-pentagon/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.schneier.com/blog/archives/2026/03/anthropic-and-the-pentagon.html"&gt;Anthropic and the Pentagon&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This piece by Bruce Schneier and Nathan E. Sanders is the most thoughtful and grounded coverage I've seen of the recent and ongoing Pentagon/OpenAI/Anthropic contract situation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI models are increasingly commodified. The top-tier offerings have about the same performance, and there is little to differentiate one from the other. The latest models from Anthropic, OpenAI and Google, in particular, tend to leapfrog each other with minor hops forward in quality every few months. [...]&lt;/p&gt;
&lt;p&gt;In this sort of market, branding matters a lot. Anthropic and its CEO, Dario Amodei, are positioning themselves as the moral and trustworthy AI provider. That has market value for both consumers and enterprise clients.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bruce-schneier"&gt;bruce-schneier&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="bruce-schneier"/><category term="anthropic"/><category term="generative-ai"/><category term="openai"/><category term="ai"/><category term="llms"/></entry><entry><title>Introducing GPT‑5.4</title><link href="https://simonwillison.net/2026/Mar/5/introducing-gpt54/#atom-tag" rel="alternate"/><published>2026-03-05T23:56:09+00:00</published><updated>2026-03-05T23:56:09+00:00</updated><id>https://simonwillison.net/2026/Mar/5/introducing-gpt54/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-4/"&gt;Introducing GPT‑5.4&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Two new API models: &lt;a href="https://developers.openai.com/api/docs/models/gpt-5.4"&gt;gpt-5.4&lt;/a&gt; and &lt;a href="https://developers.openai.com/api/docs/models/gpt-5.4-pro"&gt;gpt-5.4-pro&lt;/a&gt;, also available in ChatGPT and Codex CLI. August 31st 2025 knowledge cutoff, 1 million token context window. Priced &lt;a href="https://www.llm-prices.com/#sel=gpt-5.2%2Cgpt-5.2-pro%2Cgpt-5.4%2Cgpt-5.4-272k%2Cgpt-5.4-pro%2Cgpt-5.4-pro-272k"&gt;slightly higher&lt;/a&gt; than the GPT-5.2 family with a bump in price for both models if you go above 272,000 tokens.&lt;/p&gt;
&lt;p&gt;5.4 beats coding specialist GPT-5.3-Codex on all of the relevant benchmarks. I wonder if we'll get a 5.4 Codex or if that model line has now been merged into main?&lt;/p&gt;
&lt;p&gt;Given Claude's recent focus on business applications it's interesting to see OpenAI highlight this in their announcement of GPT-5.4:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We put a particular focus on improving GPT‑5.4’s ability to create and edit spreadsheets, presentations, and documents. On an internal benchmark of spreadsheet modeling tasks that a junior investment banking analyst might do, GPT‑5.4 achieves a mean score of &lt;strong&gt;87.3%&lt;/strong&gt;, compared to &lt;strong&gt;68.4%&lt;/strong&gt; for GPT‑5.2.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's a pelican on a bicycle &lt;a href="https://gist.github.com/simonw/7fe75b8dab6ec9c2b6bd8fd1a5a640a6"&gt;drawn by GPT-5.4&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="alt text by GPT-5.4: Illustration of a cartoon pelican riding a bicycle, with a light gray background, dark blue bike frame and wheels, orange beak and legs, and motion lines suggesting movement." src="https://static.simonwillison.net/static/2026/gpt-5.4-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;And &lt;a href="https://gist.github.com/simonw/688c0d5d93a5539b93d3f549a0b733ad"&gt;here's one&lt;/a&gt; by GPT-5.4 Pro, which took 4m45s and cost me &lt;a href="https://www.llm-prices.com/#it=16&amp;amp;ot=8593&amp;amp;sel=gpt-5.4-pro"&gt;$1.55&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Described by GPT-5.4: Illustration of a cartoon pelican riding a blue bicycle on pale green grass against a light gray background, with a large orange beak, gray-and-white body, and orange legs posed on the pedals." src="https://static.simonwillison.net/static/2026/gpt-5.4-pro-pelican.png" /&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;&lt;/p&gt;



</summary><category term="llm-release"/><category term="generative-ai"/><category term="openai"/><category term="ai"/><category term="llms"/><category term="pelican-riding-a-bicycle"/></entry><entry><title>Quoting Benedict Evans</title><link href="https://simonwillison.net/2026/Feb/26/benedict-evans/#atom-tag" rel="alternate"/><published>2026-02-26T03:44:56+00:00</published><updated>2026-02-26T03:44:56+00:00</updated><id>https://simonwillison.net/2026/Feb/26/benedict-evans/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x"&gt;&lt;p&gt;If people are only using this a couple of times a week at most, and can’t think of anything to do with it on the average day, it hasn’t changed their life. OpenAI itself admits the problem, talking about a ‘capability gap’ between what the models can do and what people do with them, which seems to me like a way to avoid saying that you don’t have clear product-market fit. &lt;/p&gt;
&lt;p&gt;Hence, OpenAI’s ad project is partly just about covering the cost of serving the 90% or more of users who don’t pay (and capturing an early lead with advertisers and early learning in how this might work), but more strategically, it’s also about making it possible to give those users the latest and most powerful (i.e. expensive) models, in the hope that this will deepen their engagement.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x"&gt;Benedict Evans&lt;/a&gt;, How will OpenAI compete?&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/benedict-evans"&gt;benedict-evans&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="chatgpt"/><category term="benedict-evans"/><category term="ai"/></entry><entry><title>How I think about Codex</title><link href="https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag" rel="alternate"/><published>2026-02-22T15:53:43+00:00</published><updated>2026-02-22T15:53:43+00:00</updated><id>https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.linkedin.com/pulse/how-i-think-codex-gabriel-chua-ukhic"&gt;How I think about Codex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Gabriel Chua (Developer Experience Engineer for APAC at OpenAI) provides his take on the confusing terminology behind the term "Codex", which can refer to a bunch of different things within the OpenAI ecosystem:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In plain terms, Codex is OpenAI’s software engineering agent, available through multiple interfaces, and an agent is a model plus instructions and tools, wrapped in a runtime that can execute tasks on your behalf. [...]&lt;/p&gt;
&lt;p&gt;At a high level, I see Codex as three parts working together:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Codex = Model + Harness + Surfaces&lt;/em&gt; [...]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model + Harness = the Agent&lt;/li&gt;
&lt;li&gt;Surfaces = how you interact with the Agent&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;He defines the harness as "the collection of instructions and tools", which is notably open source and lives in the &lt;a href="https://github.com/openai/codex"&gt;openai/codex&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;Gabriel also provides the first acknowledgment I've seen from an OpenAI insider that the Codex model family are directly trained for the Codex harness:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Codex models are trained in the presence of the harness. Tool use, execution loops, compaction, and iterative verification aren’t bolted on behaviors — they’re part of how the model learns to operate. The harness, in turn, is shaped around how the model plans, invokes tools, and recovers from failure.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="codex"/><category term="ai-assisted-programming"/><category term="generative-ai"/><category term="openai"/><category term="definitions"/><category term="llms"/></entry><entry><title>Quoting Thibault Sottiaux</title><link href="https://simonwillison.net/2026/Feb/21/thibault-sottiaux/#atom-tag" rel="alternate"/><published>2026-02-21T01:30:21+00:00</published><updated>2026-02-21T01:30:21+00:00</updated><id>https://simonwillison.net/2026/Feb/21/thibault-sottiaux/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/thsottiaux/status/2024947946849186064"&gt;&lt;p&gt;We’ve made GPT-5.3-Codex-Spark about 30% faster. It is now serving at over 1200 tokens per second.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/thsottiaux/status/2024947946849186064"&gt;Thibault Sottiaux&lt;/a&gt;, OpenAI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-performance"&gt;llm-performance&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="llms"/><category term="ai"/><category term="generative-ai"/><category term="llm-performance"/></entry><entry><title>SWE-bench February 2026 leaderboard update</title><link href="https://simonwillison.net/2026/Feb/19/swe-bench/#atom-tag" rel="alternate"/><published>2026-02-19T04:48:47+00:00</published><updated>2026-02-19T04:48:47+00:00</updated><id>https://simonwillison.net/2026/Feb/19/swe-bench/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.swebench.com/"&gt;SWE-bench February 2026 leaderboard update&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;SWE-bench is one of the benchmarks that the labs love to list in their model releases. The official leaderboard is infrequently updated but they just did a full run of it against the current generation of models, which is notable because it's always good to see benchmark results like this that &lt;em&gt;weren't&lt;/em&gt; self-reported by the labs.&lt;/p&gt;
&lt;p&gt;The fresh results are for their "Bash Only" benchmark, which runs their &lt;a href="https://github.com/SWE-agent/mini-swe-agent"&gt;mini-swe-agent&lt;/a&gt; agent (~9,000 lines of Python, &lt;a href="https://github.com/SWE-agent/mini-swe-agent/blob/v2.2.1/src/minisweagent/config/benchmarks/swebench.yaml"&gt;here are the prompts&lt;/a&gt; they use) against the &lt;a href="https://huggingface.co/datasets/princeton-nlp/SWE-bench"&gt;SWE-bench&lt;/a&gt; dataset of coding problems - 2,294 real-world examples pulled from 12 open source repos: &lt;a href="https://github.com/django/django"&gt;django/django&lt;/a&gt; (850), &lt;a href="https://github.com/sympy/sympy"&gt;sympy/sympy&lt;/a&gt; (386), &lt;a href="https://github.com/scikit-learn/scikit-learn"&gt;scikit-learn/scikit-learn&lt;/a&gt; (229), &lt;a href="https://github.com/sphinx-doc/sphinx"&gt;sphinx-doc/sphinx&lt;/a&gt; (187), &lt;a href="https://github.com/matplotlib/matplotlib"&gt;matplotlib/matplotlib&lt;/a&gt; (184), &lt;a href="https://github.com/pytest-dev/pytest"&gt;pytest-dev/pytest&lt;/a&gt; (119), &lt;a href="https://github.com/pydata/xarray"&gt;pydata/xarray&lt;/a&gt; (110), &lt;a href="https://github.com/astropy/astropy"&gt;astropy/astropy&lt;/a&gt; (95), &lt;a href="https://github.com/pylint-dev/pylint"&gt;pylint-dev/pylint&lt;/a&gt; (57), &lt;a href="https://github.com/psf/requests"&gt;psf/requests&lt;/a&gt; (44), &lt;a href="https://github.com/mwaskom/seaborn"&gt;mwaskom/seaborn&lt;/a&gt; (22), &lt;a href="https://github.com/pallets/flask"&gt;pallets/flask&lt;/a&gt; (11).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Correction&lt;/strong&gt;: &lt;em&gt;The Bash only benchmark runs against SWE-bench Verified, not original SWE-bench. Verified is a manually curated subset of 500 samples &lt;a href="https://openai.com/index/introducing-swe-bench-verified/"&gt;described here&lt;/a&gt;, funded by OpenAI. Here's &lt;a href="https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified"&gt;SWE-bench Verified&lt;/a&gt; on Hugging Face - since it's just 2.1MB of Parquet it's easy to browse &lt;a href="https://lite.datasette.io/?parquet=https%3A%2F%2Fhuggingface.co%2Fdatasets%2Fprinceton-nlp%2FSWE-bench_Verified%2Fresolve%2Fmain%2Fdata%2Ftest-00000-of-00001.parquet#/data/test-00000-of-00001?_facet=repo"&gt;using Datasette Lite&lt;/a&gt;, which cuts those numbers down to django/django (231), sympy/sympy (75), sphinx-doc/sphinx (44), matplotlib/matplotlib (34), scikit-learn/scikit-learn (32), astropy/astropy (22), pydata/xarray (22), pytest-dev/pytest (19), pylint-dev/pylint (10), psf/requests (8), mwaskom/seaborn (2), pallets/flask (1)&lt;/em&gt;.&lt;/p&gt;
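&lt;p&gt;As a quick sanity check on the per-repository counts quoted above, here is a small Python sketch that sums them. The dictionary and variable names are illustrative; the numbers are copied from the two lists in the text, so the totals should come out at 2,294 and 500 if the counts are right.&lt;/p&gt;

```python
# Per-repository problem counts copied from the text above.
swe_bench_full = {
    "django/django": 850,
    "sympy/sympy": 386,
    "scikit-learn/scikit-learn": 229,
    "sphinx-doc/sphinx": 187,
    "matplotlib/matplotlib": 184,
    "pytest-dev/pytest": 119,
    "pydata/xarray": 110,
    "astropy/astropy": 95,
    "pylint-dev/pylint": 57,
    "psf/requests": 44,
    "mwaskom/seaborn": 22,
    "pallets/flask": 11,
}
swe_bench_verified = {
    "django/django": 231,
    "sympy/sympy": 75,
    "sphinx-doc/sphinx": 44,
    "matplotlib/matplotlib": 34,
    "scikit-learn/scikit-learn": 32,
    "astropy/astropy": 22,
    "pydata/xarray": 22,
    "pytest-dev/pytest": 19,
    "pylint-dev/pylint": 10,
    "psf/requests": 8,
    "mwaskom/seaborn": 2,
    "pallets/flask": 1,
}

# Totals across all 12 repositories in each dataset.
full_total = sum(swe_bench_full.values())
verified_total = sum(swe_bench_verified.values())
print(full_total, verified_total)

# Fraction of the Verified subset that comes from Django alone.
django_share = swe_bench_verified["django/django"] / verified_total
print(f"Django share of Verified: {django_share:.1%}")
```

&lt;p&gt;Django accounts for a bit under half of the Verified subset, which is worth keeping in mind when reading the leaderboard.&lt;/p&gt;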
&lt;p&gt;Here's how the top ten models performed:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bar chart showing &amp;quot;% Resolved&amp;quot; by &amp;quot;Model&amp;quot;. Bars in descending order: Claude 4.5 Opus (high reasoning) 76.8%, Gemini 3 Flash (high reasoning) 75.8%, MiniMax M2.5 (high reasoning) 75.8%, Claude Opus 4.6 75.6%, GLM-5 (high reasoning) 72.8%, GPT-5.2 (high reasoning) 72.8%, Claude 4.5 Sonnet (high reasoning) 72.8%, Kimi K2.5 (high reasoning) 71.4%, DeepSeek V3.2 (high reasoning) 70.8%, Claude 4.5 Haiku (high reasoning) 70.0%, and a partially visible final bar at 66.6%." src="https://static.simonwillison.net/static/2026/swbench-feb-2026.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;It's interesting to see Claude Opus 4.5 beat Opus 4.6, though only by about a percentage point. 4.5 Opus is top, then Gemini 3 Flash, then MiniMax M2.5 - a 229B model released &lt;a href="https://www.minimax.io/news/minimax-m25"&gt;last week&lt;/a&gt; by Chinese lab MiniMax. GLM-5, Kimi K2.5 and DeepSeek V3.2 are three more Chinese models that make the top ten as well.&lt;/p&gt;
&lt;p&gt;OpenAI's GPT-5.2 is their highest performing model at position 6, but it's worth noting that their best coding model, GPT-5.3-Codex, is not represented - maybe because it's not yet available in the OpenAI API.&lt;/p&gt;
&lt;p&gt;This benchmark uses the same system prompt for every model, which is important for a fair comparison but does mean that the quality of the different harnesses or optimized prompts is not being measured here.&lt;/p&gt;
&lt;p&gt;The chart above is a screenshot from the SWE-bench website, but their charts don't include the actual percentage values visible on the bars. I successfully used Claude for Chrome to add these - &lt;a href="https://claude.ai/share/81a0c519-c727-4caa-b0d4-0d866375d0da"&gt;transcript here&lt;/a&gt;. My prompt sequence included:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Use claude in chrome to open https://www.swebench.com/&lt;/p&gt;
&lt;p&gt;Click on "Compare results" and then select "Select top 10"&lt;/p&gt;
&lt;p&gt;See those bar charts? I want them to display the percentage on each bar so I can take a better screenshot, modify the page like that&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm impressed at how well this worked - Claude injected custom JavaScript into the page to draw additional labels on top of the existing chart.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Claude AI conversation showing browser automation. A thinking step reads &amp;quot;Pivoted strategy to avoid recursion issues with chart labeling &amp;gt;&amp;quot; followed by the message &amp;quot;Good, the chart is back. Now let me carefully add the labels using an inline plugin on the chart instance to avoid the recursion issue.&amp;quot; A collapsed &amp;quot;Browser_evaluate&amp;quot; section shows a browser_evaluate tool call with JavaScript code using Chart.js canvas context to draw percentage labels on bars: meta.data.forEach((bar, index) =&amp;gt; { const value = dataset.data[index]; if (value !== undefined &amp;amp;&amp;amp; value !== null) { ctx.save(); ctx.textAlign = 'center'; ctx.textBaseline = 'bottom'; ctx.fillStyle = '#333'; ctx.font = 'bold 12px sans-serif'; ctx.fillText(value.toFixed(1) + '%', bar.x, bar.y - 5); A pending step reads &amp;quot;Let me take a screenshot to see if it worked.&amp;quot; followed by a completed &amp;quot;Done&amp;quot; step, and the message &amp;quot;Let me take a screenshot to check the result.&amp;quot;" src="https://static.simonwillison.net/static/2026/claude-chrome-draw-on-chart.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: If you look at the transcript, Claude claims to have switched to Playwright, which is confusing because I didn't think I had that configured.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/KLieret/status/2024176335782826336"&gt;@KLieret&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browser-agents"&gt;browser-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/benchmarks"&gt;benchmarks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/minimax"&gt;minimax&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;&lt;/p&gt;



</summary><category term="browser-agents"/><category term="anthropic"/><category term="claude"/><category term="openai"/><category term="benchmarks"/><category term="ai"/><category term="ai-in-china"/><category term="llms"/><category term="minimax"/><category term="coding-agents"/><category term="generative-ai"/><category term="django"/></entry><entry><title>Three months of OpenClaw</title><link href="https://simonwillison.net/2026/Feb/15/openclaw/#atom-tag" rel="alternate"/><published>2026-02-15T17:23:28+00:00</published><updated>2026-02-15T17:23:28+00:00</updated><id>https://simonwillison.net/2026/Feb/15/openclaw/#atom-tag</id><summary type="html">
    &lt;p&gt;It's wild that the first commit to OpenClaw was &lt;a href="https://github.com/openclaw/openclaw/commit/f6dd362d39b8e30bd79ef7560aab9575712ccc11"&gt;on November 25th 2025&lt;/a&gt;, and less than three months later it's hit 10,000 commits from 600 contributors, attracted 196,000 GitHub stars and sort-of been featured in an extremely vague &lt;a href="https://www.youtube.com/watch?v=n7I-D4YXbzg"&gt;Super Bowl commercial for AI.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Quoting AI.com founder &lt;a href="https://twitter.com/kris/status/2020663711015514399"&gt;Kris Marszalek&lt;/a&gt;, purchaser of the &lt;a href="https://www.theregister.com/2026/02/09/70m_aicom_domain_sale/"&gt;most expensive domain in history&lt;/a&gt; for $70m:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ai.com is the world’s first easy-to-use and secure implementation of OpenClaw, the open source agent framework that went viral two weeks ago; we made it easy to use without any technical skills, while hardening security to keep your data safe.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looks like vaporware to me - all you can do right now is reserve a handle - but it's still remarkable to see an open source project get to &lt;em&gt;that&lt;/em&gt; level of hype in such a short space of time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: OpenClaw creator Peter Steinberger &lt;a href="https://steipete.me/posts/2026/openclaw"&gt;just announced&lt;/a&gt; that he's joining OpenAI and plans to transfer ownership of OpenClaw to a new independent foundation.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openclaw"&gt;openclaw&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/domains"&gt;domains&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/peter-steinberger"&gt;peter-steinberger&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-agents"/><category term="openclaw"/><category term="ai"/><category term="open-source"/><category term="domains"/><category term="openai"/><category term="peter-steinberger"/></entry><entry><title>The evolution of OpenAI's mission statement</title><link href="https://simonwillison.net/2026/Feb/13/openai-mission-statement/#atom-tag" rel="alternate"/><published>2026-02-13T23:38:29+00:00</published><updated>2026-02-13T23:38:29+00:00</updated><id>https://simonwillison.net/2026/Feb/13/openai-mission-statement/#atom-tag</id><summary type="html">
    &lt;p&gt;As a USA &lt;a href="https://en.wikipedia.org/wiki/501(c)(3)_organization"&gt;501(c)(3)&lt;/a&gt; the OpenAI non-profit has to file a tax return each year with the IRS. One of the required fields on that tax return is to "Briefly describe the organization’s mission or most significant activities" - this has actual legal weight to it as the IRS can use it to evaluate if the organization is sticking to its mission and deserves to maintain its non-profit tax-exempt status.&lt;/p&gt;
&lt;p&gt;You can browse OpenAI's &lt;a href="https://projects.propublica.org/nonprofits/organizations/810861541"&gt;tax filings by year&lt;/a&gt; on ProPublica's excellent &lt;a href="https://projects.propublica.org/nonprofits/"&gt;Nonprofit Explorer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I went through and extracted that mission statement for 2016 through 2024, then had Claude Code &lt;a href="https://gisthost.github.io/?7a569df89f43f390bccc2c5517718b49/index.html"&gt;help me&lt;/a&gt; fake the commit dates to turn it into a git repository and share that as a Gist - which means that Gist's &lt;a href="https://gist.github.com/simonw/e36f0e5ef4a86881d145083f759bcf25/revisions"&gt;revisions page&lt;/a&gt; shows every edit they've made since they started filing their taxes!&lt;/p&gt;
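&lt;p&gt;A rough sketch of how commit dates can be backdated like this: git reads the &lt;code&gt;GIT_AUTHOR_DATE&lt;/code&gt; and &lt;code&gt;GIT_COMMITTER_DATE&lt;/code&gt; environment variables when it creates a commit. The helper names and arguments below are hypothetical, and this is not necessarily the approach taken in the linked transcript.&lt;/p&gt;

```python
import os
import subprocess


def backdated_env(iso_date):
    """Return a copy of the environment with both git commit dates overridden."""
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = iso_date
    env["GIT_COMMITTER_DATE"] = iso_date
    return env


def commit_revision(repo_dir, filename, text, iso_date, message):
    """Write one revision of a file and commit it with a faked date.

    Assumes repo_dir is already a git repository (created with git init)
    and that user.name and user.email are configured.
    """
    with open(os.path.join(repo_dir, filename), "w") as f:
        f.write(text)
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(
        ["git", "commit", "-m", message],
        cwd=repo_dir,
        env=backdated_env(iso_date),
        check=True,
    )
```

&lt;p&gt;Calling &lt;code&gt;commit_revision()&lt;/code&gt; once per filing year, oldest first, with a date such as &lt;code&gt;2017-01-01T12:00:00&lt;/code&gt;, should produce a history where each revision appears to have been made in the year of the filing.&lt;/p&gt;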
&lt;p&gt;It's really interesting seeing what they've changed over time.&lt;/p&gt;
&lt;p&gt;The original 2016 mission reads as follows (and yes, the apostrophe in "OpenAIs" is missing &lt;a href="https://projects.propublica.org/nonprofits/organizations/810861541/201703459349300445/full"&gt;in the original&lt;/a&gt;):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAIs goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. We think that artificial intelligence technology will help shape the 21st century, and we want to help the world build safe AI technology and ensure that AI's benefits are as widely and evenly distributed as possible. Were trying to build AI as part of a larger community, and we want to openly share our plans and capabilities along the way.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In 2018 they dropped the part about "trying to build AI as part of a larger community, and we want to openly share our plans and capabilities along the way."&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/mission-3.jpg" alt="Git diff showing the 2018 revision deleting the final two sentences: &amp;quot;Were trying to build AI as part of a larger community, and we want to openly share our plans and capabilities along the way.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;In 2020 they dropped the words "as a whole" from "benefit humanity as a whole". They're still "unconstrained by a need to generate financial return" though.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/mission-5.jpg" alt="Git diff showing the 2020 revision dropping &amp;quot;as a whole&amp;quot; from &amp;quot;benefit humanity as a whole&amp;quot; and changing &amp;quot;We think&amp;quot; to &amp;quot;OpenAI believes&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Some interesting changes in 2021. They're still unconstrained by a need to generate financial return, but here we have the first reference to "general-purpose artificial intelligence" (replacing "digital intelligence"). They're more confident too: it's not "most likely to benefit humanity", it's just "benefits humanity".&lt;/p&gt;
&lt;p&gt;They previously wanted to "help the world build safe AI technology", but now they're going to do that themselves: "the companys goal is to develop and responsibly deploy safe AI technology".&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/mission-6.jpg" alt="Git diff showing the 2021 revision replacing &amp;quot;goal is to advance digital intelligence&amp;quot; with &amp;quot;mission is to build general-purpose artificial intelligence&amp;quot;, changing &amp;quot;most likely to benefit&amp;quot; to just &amp;quot;benefits&amp;quot;, and replacing &amp;quot;help the world build safe AI technology&amp;quot; with &amp;quot;the companys goal is to develop and responsibly deploy safe AI technology&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;2022 only changed one significant word: they added "safely" to "build ... (AI) that safely benefits humanity". They're still unconstrained by those financial returns!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/mission-7.jpg" alt="Git diff showing the 2022 revision adding &amp;quot;(AI)&amp;quot; and the word &amp;quot;safely&amp;quot; so it now reads &amp;quot;that safely benefits humanity&amp;quot;, and changing &amp;quot;the companys&amp;quot; to &amp;quot;our&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;No changes in 2023... but then in 2024 they deleted almost the entire thing, reducing it to simply:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenAIs mission is to ensure that artificial general intelligence benefits all of humanity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They've expanded "humanity" to "all of humanity", but there's no mention of safety any more and I guess they can finally start focusing on that need to generate financial returns!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/mission-9.jpg" alt="Git diff showing the 2024 revision deleting the entire multi-sentence mission statement and replacing it with just &amp;quot;OpenAIs mission is to ensure that artificial general intelligence benefits all of humanity.&amp;quot;" style="max-width: 100%;" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I found loosely equivalent but much less interesting documents &lt;a href="https://simonwillison.net/2026/Feb/13/anthropic-public-benefit-mission/"&gt;from Anthropic&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/propublica"&gt;propublica&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai-ethics"/><category term="openai"/><category term="propublica"/><category term="ai"/></entry><entry><title>Introducing GPT‑5.3‑Codex‑Spark</title><link href="https://simonwillison.net/2026/Feb/12/codex-spark/#atom-tag" rel="alternate"/><published>2026-02-12T21:16:07+00:00</published><updated>2026-02-12T21:16:07+00:00</updated><id>https://simonwillison.net/2026/Feb/12/codex-spark/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex-spark/"&gt;Introducing GPT‑5.3‑Codex‑Spark&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI announced a partnership with Cerebras &lt;a href="https://openai.com/index/cerebras-partnership/"&gt;on January 14th&lt;/a&gt;. Four weeks later they're already launching the first integration, "an ultra-fast model for real-time coding in Codex".&lt;/p&gt;
&lt;p&gt;Despite being named GPT-5.3-Codex-Spark it's not purely an accelerated alternative to GPT-5.3-Codex - the blog post calls it "a smaller version of GPT‑5.3-Codex" and clarifies that "at launch, Codex-Spark has a 128k context window and is text-only."&lt;/p&gt;
&lt;p&gt;I had some preview access to this model and I can confirm that it's significantly faster than their other models.&lt;/p&gt;
&lt;p&gt;Here's what that speed looks like running in Codex CLI:&lt;/p&gt;
&lt;div style="max-width: 100%;"&gt;
    &lt;video 
        controls 
        preload="none"
        poster="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-medium-last.jpg"
        style="width: 100%; height: auto;"&gt;
        &lt;source src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-medium.mp4" type="video/mp4"&gt;
    &lt;/video&gt;
&lt;/div&gt;

&lt;p&gt;That was the "Generate an SVG of a pelican riding a bicycle" prompt - here's the rendered result:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Whimsical flat illustration of an orange duck merged with a bicycle, where the duck's body forms the seat and frame area while its head extends forward over the handlebars, set against a simple light blue sky and green grass background." src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-spark-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;Compare that to the speed of regular GPT-5.3 Codex medium:&lt;/p&gt;
&lt;div style="max-width: 100%;"&gt;
    &lt;video 
        controls 
        preload="none"
        poster="https://static.simonwillison.net/static/2026/gpt-5.3-codex-medium-last.jpg"
        style="width: 100%; height: auto;"&gt;
        &lt;source src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-medium.mp4" type="video/mp4"&gt;
    &lt;/video&gt;
&lt;/div&gt;

&lt;p&gt;Significantly slower, but the pelican is a lot better:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Whimsical flat illustration of a white pelican riding a dark blue bicycle at speed, with motion lines behind it, its long orange beak streaming back in the wind, set against a light blue sky and green grass background." src="https://static.simonwillison.net/static/2026/gpt-5.3-codex-pelican.png" /&gt;&lt;/p&gt;
&lt;p&gt;What's interesting about this model isn't the quality though, it's the &lt;em&gt;speed&lt;/em&gt;. When a model responds this fast you can stay in flow state and iterate with the model much more productively.&lt;/p&gt;
&lt;p&gt;I showed a demo of Cerebras running Llama 3.1 70B at 2,000 tokens/second against Val Town &lt;a href="https://simonwillison.net/2024/Oct/31/cerebras-coder/"&gt;back in October 2024&lt;/a&gt;. OpenAI claim 1,000 tokens/second for their new model, and I expect it will prove to be a ferociously useful partner for hands-on iterative coding sessions.&lt;/p&gt;
&lt;p&gt;It's not yet clear what the pricing will look like for this new model.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm-performance"&gt;llm-performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cerebras"&gt;cerebras&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="llm-performance"/><category term="openai"/><category term="cerebras"/><category term="pelican-riding-a-bicycle"/><category term="ai"/><category term="llms"/><category term="llm-release"/><category term="codex"/><category term="generative-ai"/></entry><entry><title>Skills in OpenAI API</title><link href="https://simonwillison.net/2026/Feb/11/skills-in-openai-api/#atom-tag" rel="alternate"/><published>2026-02-11T19:19:22+00:00</published><updated>2026-02-11T19:19:22+00:00</updated><id>https://simonwillison.net/2026/Feb/11/skills-in-openai-api/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://developers.openai.com/cookbook/examples/skills_in_api"&gt;Skills in OpenAI API&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI's adoption of Skills continues to gain ground. You can now use Skills directly in the OpenAI API with their &lt;a href="https://developers.openai.com/api/docs/guides/tools-shell/"&gt;shell tool&lt;/a&gt;. You can zip skills up and upload them first, but I think an even neater interface is the ability to send skills with the JSON request as inline base64-encoded zip data, as seen &lt;a href="https://github.com/simonw/research/blob/main/openai-api-skills/openai_inline_skills.py"&gt;in this script&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;r&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;OpenAI&lt;/span&gt;().&lt;span class="pl-c1"&gt;responses&lt;/span&gt;.&lt;span class="pl-c1"&gt;create&lt;/span&gt;(
    &lt;span class="pl-s1"&gt;model&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"gpt-5.2"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;tools&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;[
      {
        &lt;span class="pl-s"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;"shell"&lt;/span&gt;,
        &lt;span class="pl-s"&gt;"environment"&lt;/span&gt;: {
          &lt;span class="pl-s"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;"container_auto"&lt;/span&gt;,
          &lt;span class="pl-s"&gt;"skills"&lt;/span&gt;: [
            {
              &lt;span class="pl-s"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;"inline"&lt;/span&gt;,
              &lt;span class="pl-s"&gt;"name"&lt;/span&gt;: &lt;span class="pl-s"&gt;"wc"&lt;/span&gt;,
              &lt;span class="pl-s"&gt;"description"&lt;/span&gt;: &lt;span class="pl-s"&gt;"Count words in a file."&lt;/span&gt;,
              &lt;span class="pl-s"&gt;"source"&lt;/span&gt;: {
                &lt;span class="pl-s"&gt;"type"&lt;/span&gt;: &lt;span class="pl-s"&gt;"base64"&lt;/span&gt;,
                &lt;span class="pl-s"&gt;"media_type"&lt;/span&gt;: &lt;span class="pl-s"&gt;"application/zip"&lt;/span&gt;,
                &lt;span class="pl-s"&gt;"data"&lt;/span&gt;: &lt;span class="pl-s1"&gt;b64_encoded_zip_file&lt;/span&gt;,
              },
            }
          ],
        },
      }
    ],
    &lt;span class="pl-s1"&gt;input&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"Use the wc skill to count words in its own SKILL.md file."&lt;/span&gt;,
)
&lt;span class="pl-en"&gt;print&lt;/span&gt;(&lt;span class="pl-s1"&gt;r&lt;/span&gt;.&lt;span class="pl-c1"&gt;output_text&lt;/span&gt;)&lt;/pre&gt;
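&lt;p&gt;The excerpt above doesn't show where &lt;code&gt;b64_encoded_zip_file&lt;/code&gt; comes from. Here is a minimal sketch of one way to build it in memory with the Python standard library. The &lt;code&gt;SKILL.md&lt;/code&gt; contents, the helper name and the folder layout inside the zip are all assumptions for illustration; the linked script may do this differently.&lt;/p&gt;

```python
import base64
import io
import zipfile

# Hypothetical SKILL.md contents; the real skill in the linked script may differ.
SKILL_MD = """---
name: wc
description: Count words in a file.
---

Run wc -w FILE to count the words in FILE.
"""


def build_inline_skill_zip(skill_name, skill_md):
    """Zip a single SKILL.md in memory and return it as base64 text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        # Assumed layout: one top-level folder named after the skill.
        zf.writestr(f"{skill_name}/SKILL.md", skill_md)
    # The archive is complete once the with block has closed it.
    return base64.b64encode(buffer.getvalue()).decode("ascii")


b64_encoded_zip_file = build_inline_skill_zip("wc", SKILL_MD)
```

&lt;p&gt;The resulting string can then be passed as the &lt;code&gt;"data"&lt;/code&gt; value in the request shown above.&lt;/p&gt;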

&lt;p&gt;I built that example script after first having Claude Code for web use &lt;a href="https://simonwillison.net/2026/Feb/10/showboat-and-rodney/"&gt;Showboat&lt;/a&gt; to explore the API for me and create &lt;a href="https://github.com/simonw/research/blob/main/openai-api-skills/README.md"&gt;this report&lt;/a&gt;. My opening prompt for the research project was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Run uvx showboat --help - you will use this tool later&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Fetch https://developers.openai.com/cookbook/examples/skills_in_api.md to /tmp with curl, then read it&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Use the OpenAI API key you have in your environment variables&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Use showboat to build up a detailed demo of this, replaying the examples from the documents and then trying some experiments of your own&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/skills"&gt;skills&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/showboat"&gt;showboat&lt;/a&gt;&lt;/p&gt;



</summary><category term="skills"/><category term="generative-ai"/><category term="openai"/><category term="ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="showboat"/></entry></feed>