<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: ai</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/ai.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2026-05-20T22:26:36+00:00</updated><author><name>Simon Willison</name></author><entry><title>Quoting SpaceX S-1</title><link href="https://simonwillison.net/2026/May/20/spacex-s1/#atom-tag" rel="alternate"/><published>2026-05-20T22:26:36+00:00</published><updated>2026-05-20T22:26:36+00:00</updated><id>https://simonwillison.net/2026/May/20/spacex-s1/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm"&gt;&lt;p&gt;We have the ability to use compute resources to support our proprietary AI applications (such as Grok 5, which is currently being trained at COLOSSUS II), while also providing access to select compute capacity to third-party customers. For example, in May 2026, we entered into &lt;strong&gt;Cloud Services Agreements with Anthropic PBC&lt;/strong&gt; (“Anthropic”), an AI research and development public benefit corporation, with respect to access to &lt;strong&gt;compute capacity across COLOSSUS and COLOSSUS II&lt;/strong&gt;. Pursuant to these agreements, the customer &lt;strong&gt;has agreed to pay us $1.25 billion per month&lt;/strong&gt; through May 2029, with capacity ramping in May and June 2026 at a reduced fee. The agreements may be terminated by either party upon 90 days’ notice.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm"&gt;SpaceX S-1&lt;/a&gt;, highlights mine&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/grok"&gt;grok&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="grok"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>How fast is 10 tokens per second really?</title><link href="https://simonwillison.net/2026/May/20/tokens-per-second/#atom-tag" rel="alternate"/><published>2026-05-20T17:57:45+00:00</published><updated>2026-05-20T17:57:45+00:00</updated><id>https://simonwillison.net/2026/May/20/tokens-per-second/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://mikeveerman.github.io/tokenspeed/"&gt;How fast is 10 tokens per second really?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Neat little HTML app by Mike Veerman (&lt;a href="https://github.com/MikeVeerman/tokenspeed/blob/master/index.html"&gt;source code here&lt;/a&gt;) which simulates LLM token output speeds from 5/second to 800/second.&lt;/p&gt;
&lt;p&gt;Useful if you see a model advertised as "30 tokens/second" and want to get a feel for what that actually looks like.&lt;/p&gt;
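&lt;p&gt;Here's a rough Python sketch of the same idea. It is not Mike's code: the function names are made up for illustration, and it assumes a whitespace-separated word is a reasonable stand-in for a token, which real tokenizers won't match exactly.&lt;/p&gt;

```python
import time


def seconds_to_emit(token_count, tokens_per_second):
    """Time needed to stream token_count tokens at a steady rate."""
    return token_count / tokens_per_second


def simulate(text, tokens_per_second, sleep=time.sleep):
    """Print text one pseudo-token at a time, pausing between tokens.

    Assumption: a whitespace-separated word stands in for a token;
    real tokenizers split text differently.
    """
    delay = 1.0 / tokens_per_second
    words = text.split()
    for word in words:
        print(word, end=" ", flush=True)
        sleep(delay)
    print()
    return len(words)


if __name__ == "__main__":
    # How long a 300-token answer takes at a few advertised speeds.
    for rate in (5, 30, 800):
        print(f"{rate} tokens/second: {seconds_to_emit(300, rate):.1f}s")
```

&lt;p&gt;Calling &lt;code&gt;simulate("some sample text", 30)&lt;/code&gt; in a terminal gives a crude feel for the pacing. The &lt;code&gt;sleep&lt;/code&gt; argument is only there so the pause can be swapped out.&lt;/p&gt;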

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=48174920"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="llms"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>Google I/O, Gemini Spark, Antigravity</title><link href="https://simonwillison.net/2026/May/20/google-io/#atom-tag" rel="alternate"/><published>2026-05-20T15:32:17+00:00</published><updated>2026-05-20T15:32:17+00:00</updated><id>https://simonwillison.net/2026/May/20/google-io/#atom-tag</id><summary type="html">
    &lt;p&gt;It's hard to find much to write about Google I/O this year because I have a policy of not writing about anything that I can't try out myself, and a lot of the big announcements are "coming soon".&lt;/p&gt;
&lt;p&gt;I actually prefer to write about things that are in general availability, because I've had instances in the past where the previews didn't match what was released to the general public later on.&lt;/p&gt;
&lt;p&gt;Aside from &lt;a href="https://simonwillison.net/2026/May/19/gemini-35-flash/"&gt;Gemini 3.5 Flash&lt;/a&gt; the most interesting announcement looks to be Google's upcoming OpenClaw competitor &lt;a href="https://gemini.google/overview/agent/spark/"&gt;Gemini Spark&lt;/a&gt;, described as "your personal AI agent" which can "connect natively with your favorite Google apps like Gmail, Calendar, Drive, Docs, Sheets, Slides, YouTube, and Google Maps". The FAQ for that also includes this confusing detail:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What Gemini model does Gemini Spark run on?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Gemini Spark runs on Gemini 3.5 Flash and Antigravity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href="https://antigravity.google/"&gt;antigravity.google&lt;/a&gt; website currently lists Antigravity as a desktop app, a CLI agent tool (written in Go), the &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python"&gt;Antigravity SDK&lt;/a&gt; (an open source Python wrapper around a bundled closed source Go binary), and the original Antigravity IDE (a VS Code fork).&lt;/p&gt;
&lt;p&gt;I guess Gemini Spark, the user-facing hosted agent product, might be running on that Go binary, but I'm not sure why that's worth mentioning in the FAQ!&lt;/p&gt;
&lt;p&gt;Naturally I went looking for notes on how Gemini Spark intends to handle the risk of prompt injection. The best information I could find on that was in the &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/innovations-from-google-io-26-on-google-cloud"&gt;Everything Google Cloud customers need to know coming out of Google I/O&lt;/a&gt; post aimed at enterprise customers, which includes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Spark operates in a fully managed, secure runtime on Google Cloud, meaning you get enterprise-grade security without ever having to manage the underlying infrastructure. Every task executes in a fresh, strictly isolated, ephemeral VM to help ensure data never overlaps between sessions. To protect your enterprise, all traffic routes through our secure Agent Gateway that enforces Data Loss Prevention (DLP) policies, while user credentials remain fully encrypted and are never exposed directly to the agent.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given how many people are going to be piping &lt;em&gt;very&lt;/em&gt; sensitive data through Gemini Spark in the near future I hope they've made this bullet-proof, or this could be a top candidate for the agent security &lt;a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/#1-year-a-challenger-disaster-for-coding-agent-security"&gt;challenger disaster&lt;/a&gt; that we still haven't seen.&lt;/p&gt;
&lt;p&gt;Also of note: in &lt;a href="https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/"&gt;Transitioning Gemini CLI to Antigravity CLI&lt;/a&gt; Google announce that the &lt;a href="https://github.com/google-gemini/gemini-cli"&gt;open source Gemini CLI&lt;/a&gt; tool (Apache 2.0 licensed TypeScript) will stop working with their AI subscription plans on June 18th, replaced by the new closed source &lt;a href="https://github.com/google-antigravity/antigravity-cli"&gt;Antigravity CLI&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google-io"&gt;google-io&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;&lt;/p&gt;



</summary><category term="gemini"/><category term="google"/><category term="generative-ai"/><category term="ai"/><category term="google-io"/><category term="llms"/><category term="prompt-injection"/></entry><entry><title>Gemini 3.5 Flash: more expensive, but Google plan to use it for everything</title><link href="https://simonwillison.net/2026/May/19/gemini-35-flash/#atom-tag" rel="alternate"/><published>2026-05-19T22:40:25+00:00</published><updated>2026-05-19T22:40:25+00:00</updated><id>https://simonwillison.net/2026/May/19/gemini-35-flash/#atom-tag</id><summary type="html">
    &lt;p&gt;Today at Google I/O, Google &lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/"&gt;released Gemini 3.5 Flash&lt;/a&gt;. This one skipped the &lt;code&gt;-preview&lt;/code&gt; modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;3.5 Flash is available today to billions of people globally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For everyone via the Gemini app and AI Mode in &lt;a href="https://blog.google/products-and-platforms/products/search/search-io-2026"&gt;Google Search&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For developers in our agent-first development platform Google Antigravity and Gemini API in Google AI Studio and Android Studio&lt;/li&gt;
&lt;li&gt;For enterprises in Gemini Enterprise Agent Platform and Gemini Enterprise.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;As usual with Gemini, the most interesting details are tucked away in the &lt;a href="https://ai.google.dev/gemini-api/docs/whats-new-gemini-3.5"&gt;What's new in Gemini 3.5 Flash&lt;/a&gt; developer documentation. It mostly has the same set of platform features as the previous Gemini 3.x series, albeit with no &lt;a href="https://ai.google.dev/gemini-api/docs/computer-use"&gt;computer use&lt;/a&gt;. The model ID is &lt;code&gt;gemini-3.5-flash&lt;/code&gt;. The knowledge cut-off is January 2025, and it supports 1,048,576 input tokens and 65,536 maximum output tokens.&lt;/p&gt;
&lt;p&gt;Google are also pushing a new &lt;a href="https://ai.google.dev/gemini-api/docs/interactions"&gt;Interactions API&lt;/a&gt;, currently in beta, which looks to me like their version of the patterns introduced by &lt;a href="https://developers.openai.com/api/reference/responses/overview"&gt;OpenAI Responses&lt;/a&gt; - in particular server-side history management.&lt;/p&gt;
&lt;h4 id="the-price-has-gone-up"&gt;The price has gone up&lt;/h4&gt;
&lt;p&gt;Gemini 3.5 Flash is accompanied by a notable price bump. The previous models in the "Flash" family were &lt;a href="https://ai.google.dev/gemini-api/docs/models/gemini-3-flash-preview"&gt;Gemini 3 Flash Preview&lt;/a&gt; and &lt;a href="https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-lite"&gt;Gemini 3.1 Flash-Lite&lt;/a&gt;. The new 3.5 Flash is 3x the price of 3 Flash Preview and 6x the price of 3.1 Flash-Lite (see &lt;a href="https://www.llm-prices.com/#sel=gemini-3-flash-preview%2Cgemini-3.5-flash%2Cgemini-3.1-flash-lite-preview"&gt;price comparison here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;At $1.50/million input and $9/million output it's getting close in price to Google's Gemini 3.1 Pro, which is $2 and $12.&lt;/p&gt;
&lt;p&gt;The Gemini team promise that 3.5 Pro will roll out "next month" - presumably at an even higher price.&lt;/p&gt;
&lt;p&gt;This fits a trend: OpenAI's GPT-5.5 was 2x the price of GPT-5.4, and Claude Opus 4.7 is around 1.46x the price of 4.6 when you take the &lt;a href="https://simonwillison.net/2026/Apr/20/claude-token-counts/"&gt;new tokenizer into account&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Given the price increase it's interesting to see Google roll it out for so many of their own free-to-consumer products. It feels like all three of the major AI labs are starting to probe the price tolerance of their API customers.&lt;/p&gt;
&lt;p&gt;Artificial Analysis publish the cost to run their proprietary benchmark against models, which is a useful way to take things like tokenization and increased volume of reasoning tokens into account. Some numbers worth comparing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/gemini-3-5-flash"&gt;Gemini 3.5 Flash (high)&lt;/a&gt;: $1,551.60&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/gemini-3-1-pro-preview"&gt;Gemini 3.1 Pro Preview&lt;/a&gt;: $892.28&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/gemini-3-flash-reasoning"&gt;Gemini 3 Flash Preview (Reasoning)&lt;/a&gt;: $278.26&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/gemini-3-1-flash-lite-preview"&gt;Gemini 3.1 Flash-Lite Preview&lt;/a&gt;: $93.60&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Running the benchmark for 3.5 Flash (high) cost significantly more than 3.1 Pro Preview!&lt;/p&gt;
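&lt;p&gt;To put a number on "significantly more", here's a small sketch that turns the Gemini figures listed above into ratios. The dollar amounts are copied from that list; the &lt;code&gt;cost_ratios&lt;/code&gt; helper is just an illustrative name.&lt;/p&gt;

```python
# Benchmark run costs in USD, copied from the Gemini list above.
costs = {
    "Gemini 3.5 Flash (high)": 1551.60,
    "Gemini 3.1 Pro Preview": 892.28,
    "Gemini 3 Flash Preview (Reasoning)": 278.26,
    "Gemini 3.1 Flash-Lite Preview": 93.60,
}


def cost_ratios(costs, baseline):
    """Return the baseline's cost divided by each other model's cost."""
    base = costs[baseline]
    return {name: base / cost for name, cost in costs.items() if name != baseline}


ratios = cost_ratios(costs, "Gemini 3.5 Flash (high)")
for name, ratio in ratios.items():
    print(f"3.5 Flash (high) cost {ratio:.2f}x as much as {name}")
```

&lt;p&gt;By these figures the 3.5 Flash (high) run cost roughly 1.74x the 3.1 Pro Preview run.&lt;/p&gt;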
&lt;p&gt;Here are some numbers from other vendors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/claude-opus-4-7"&gt;Claude Opus 4.7 (Adaptive Reasoning, Max Effort)&lt;/a&gt;: $5,117.14&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/claude-opus-4-7-non-reasoning"&gt;Claude Opus 4.7 (Non-reasoning, High Effort)&lt;/a&gt;: $1,217.23&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/gpt-5-5"&gt;GPT-5.5 (xhigh)&lt;/a&gt;: $3,357.00&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://artificialanalysis.ai/models/gpt-5-5-medium"&gt;GPT-5.5 (medium)&lt;/a&gt;: $1,199.14&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="a-pelican-on-a-bicycle"&gt;A pelican on a bicycle&lt;/h4&gt;
&lt;p&gt;I ran "Generate an SVG of a pelican riding a bicycle" &lt;a href="https://gist.github.com/simonw/09cc5a5545d7e75b33b75ffa92a34601"&gt;against the Gemini API&lt;/a&gt; and got back this pelican, which is a &lt;em&gt;lot&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/gemini-3.5-flash.png" alt="Black background, bats in the sky against a stylized moon. Pelican is funky looking. Very good beak. Bicycle frame is a bit twisted, and the bar from pedals to back wheel is missing. Bike lamp illuminates the road in front. Quite stylish." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;From the code comments: &lt;code&gt;&amp;lt;!-- Pelican Eye / Sunglasses (Cool Retro Aviators) --&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=48196570#48198275"&gt;hedgehog on Hacker News&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;That pelican looks like it's in Miami for a crypto conference.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That one cost me 11 input tokens and 14,403 output tokens, for a total cost of &lt;a href="https://www.llm-prices.com/#it=11&amp;amp;ot=14403&amp;amp;sel=gemini-3.5-flash"&gt;just under 13 cents&lt;/a&gt;.&lt;/p&gt;
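&lt;p&gt;That arithmetic can be sketched as below, using the $1.50/million input and $9/million output prices quoted earlier. This assumes flat per-token pricing with no caching discounts or other tiers, and &lt;code&gt;prompt_cost&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# Gemini 3.5 Flash prices quoted in the post, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 1.50
OUTPUT_PRICE_PER_MILLION = 9.00


def prompt_cost(input_tokens, output_tokens):
    """Return the cost in USD of one prompt at the prices above.

    Assumption: flat per-token pricing, no discounts or tiers.
    """
    input_cost = input_tokens * INPUT_PRICE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000
    return input_cost + output_cost


cost = prompt_cost(11, 14_403)
print(f"{cost * 100:.2f} cents")  # prints "12.96 cents"
```

&lt;p&gt;Almost all of the cost comes from the output tokens; the 11 input tokens contribute a tiny fraction of a cent.&lt;/p&gt;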
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-pricing"&gt;llm-pricing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="gemini"/><category term="pelican-riding-a-bicycle"/><category term="llm-pricing"/><category term="ai"/><category term="llms"/><category term="llm-release"/><category term="google"/><category term="generative-ai"/></entry><entry><title>The last six months in LLMs in five minutes</title><link href="https://simonwillison.net/2026/May/19/5-minute-llms/#atom-tag" rel="alternate"/><published>2026-05-19T01:09:44+00:00</published><updated>2026-05-19T01:09:44+00:00</updated><id>https://simonwillison.net/2026/May/19/5-minute-llms/#atom-tag</id><summary type="html">
    &lt;p&gt;I put together these annotated slides from my five minute lightning talk at PyCon US 2026, using the &lt;a href="https://tools.simonwillison.net/annotated-presentations"&gt;latest iteration&lt;/a&gt; of my &lt;a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/"&gt;annotated presentation tool&lt;/a&gt;.&lt;/p&gt;

&lt;div class="slide" id="5-minutes-llms.001.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.001.jpeg" alt="The last six months in LLMs in
five minutes

Simon Willison - simonwillison.net

PyCon US 2026 Lightning Talk
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.001.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;I presented this lightning talk at PyCon US 2026, attempting to summarize the last six months of developments in LLMs in five minutes.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.002.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.002.jpeg" alt="The November inflection point
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.002.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Six months is a pretty convenient time period to cover, because it captures what I've been calling the &lt;a href="https://simonwillison.net/tags/november-2025-inflection/"&gt;November 2025 inflection point&lt;/a&gt;. November was a critical month in LLMs, especially for coding.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.003.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.003.jpeg" alt="The “best” model changed hands 5 times
between Anthropic, OpenAI and Google
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.003.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;For one thing, the supposedly "best" model (depending mostly on vibes) changed hands five times between the three big providers.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.004.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.004.jpeg" alt="Generate an SVG of a
pelican riding a bicycle
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.004.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;As always, I'm using my &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/"&gt;Generate an SVG of a pelican riding a bicycle&lt;/a&gt; test to help illustrate the differences between the models.&lt;/p&gt;
&lt;p&gt;Why this test? Because pelicans are hard to draw, bicycles are hard to draw, pelicans &lt;em&gt;can't ride bicycles&lt;/em&gt;... and there's zero chance any AI lab would train a model for such a ridiculous task.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.005.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.005.jpeg" alt="Five pelicans, one for each of the following models. Varying qualities!" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.005.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;At the start of November the widely acknowledged "best" model was Claude Sonnet 4.5, released on &lt;a href="https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/"&gt;29th September&lt;/a&gt;. It drew me this pelican.&lt;/p&gt;
&lt;p&gt;In November it was overtaken by &lt;a href="https://simonwillison.net/2025/Nov/13/gpt-51/"&gt;GPT-5.1&lt;/a&gt;, then &lt;a href="https://simonwillison.net/2025/Nov/18/gemini-3/"&gt;Gemini 3&lt;/a&gt;, then &lt;a href="https://simonwillison.net/2025/Nov/19/gpt-51-codex-max/"&gt;GPT-5.1 Codex Max&lt;/a&gt;, and then Anthropic took the crown back again with &lt;a href="https://simonwillison.net/2025/Nov/24/claude-opus/"&gt;Claude Opus 4.5&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I think Gemini 3 drew the best pelican out of this lot, but pelicans aren't everything. Most practitioners will agree that Opus 4.5 held the crown for the next couple of months.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.006.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.006.jpeg" alt="The coding agents got good
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.006.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;It took a little while for this to become clear, but the real news from November was that the coding agents got &lt;em&gt;good&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;OpenAI and Anthropic had spent most of 2025 running &lt;a href="https://simonwillison.net/2025/Dec/19/andrej-karpathy/"&gt;Reinforcement Learning from Verifiable Rewards&lt;/a&gt; to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses.&lt;/p&gt;
&lt;p&gt;In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.007.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.007.jpeg" alt="Screenshot of &amp;quot;Initial commit&amp;quot; on GitHub to steipete/Warelay, commit f6dd362, steipete authored on Nov 24, 2025

It&amp;#39;s a copy of the MIT license" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.007.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Also in November, this happened - the first commit to an obscure (back then) repo called "Warelay" by some guy called Pete.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.008.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.008.jpeg" alt="December/January
(A little bit of LLM psychosis)
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.008.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Over the holiday period, from December to January, a whole lot of us took advantage of the break to have a poke at these new models and coding agents and see what they could do.&lt;/p&gt;
&lt;p&gt;They could do a lot! Some of us got a little bit over-excited. I had my own short-lived bout of a form of LLM psychosis as I started spinning up wildly ambitious projects to see how far I could push them.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.009.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.009.jpeg" alt="micro-javascript playground
Execute JavaScript code in a sandboxed micro-javascript environment powered by Pyodide

var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var doubled = numbers.map(n =&amp;gt; n * 2);
console.log(&amp;#39;Doubled: &amp;#39;, doubled);
var evens = numbers.filter(n =&amp;gt; n % 2 === 0);
console.log(&amp;#39;Evens: &amp;#39;, evens);
var sum = numbers.reduce((a, b) =&amp;gt; a + b, 0);
console.log(&amp;#39;Sum: &amp;#39;, sum);

Output 27
Doubled: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Evens: [2, 4, 6, 8, 10]
Sum: 55
Execution time: 8.00ms
About: micro-javascript is a pure Python JavaScript interpreter with configurable memory and time limits. This playground runs entirely in your browser using
Pyodide (Python compiled to WebAssembly). View on GitHub" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.009.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;One of my projects was a vibe-coded implementation of JavaScript in Python - a loose port of &lt;a href="https://github.com/bellard/mquickjs"&gt;MicroQuickJS&lt;/a&gt; - which I called &lt;a href="https://github.com/simonw/micro-javascript"&gt;micro-javascript&lt;/a&gt;. You can try it out in your browser in &lt;a href="https://simonw.github.io/micro-javascript/playground.html"&gt;this playground&lt;/a&gt;.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.010.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.010.jpeg" alt="JavaScript running in Python running in Pyodide running in WebAssembly running in JavaScript" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.010.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;That playground demo shows JavaScript code run using my micro-javascript library, in Python, running inside Pyodide, running in WebAssembly, running in JavaScript, running in a browser!&lt;/p&gt;
&lt;p&gt;It's pretty cool! But did anyone out there &lt;em&gt;need&lt;/em&gt; a buggy, slow, insecure half-baked implementation of JavaScript in Python?&lt;/p&gt;
&lt;p&gt;They did not. I have quite a few other projects from that holiday period that I have since quietly retired!&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.011.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.011.jpeg" alt="February 2026
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.011.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;On to February. Remember that Warelay project that had its first commit at the end of November?&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.012.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.012.jpeg" alt="Warelay → CLAWDIS → CLAWDBOT →
Clawdbot → Moltbot →🦞 OpenClaw" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.012.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;In December and January it had gone through &lt;a href="https://simonwillison.net/2026/May/16/openclaw-names/"&gt;quite a few name changes&lt;/a&gt;... and by February it was taking the world by storm under its final name, &lt;a href="https://openclaw.ai/"&gt;OpenClaw&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The amount of attention it got is pretty astonishing for a project that was less than three months old.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.013.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.013.jpeg" alt="Generic term: Claw
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.013.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;OpenClaw is a "personal AI assistant", and we actually got a generic term for these, based on NanoClaw and ZeroClaw and suchlike... they're called &lt;strong&gt;Claws&lt;/strong&gt;.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.014.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.014.jpeg" alt="An aquarium for your Claw
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.014.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Mac Minis started to sell out around Silicon Valley, because people were buying them to run their Claws.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.dbreunig.com/"&gt;Drew Breunig&lt;/a&gt; joked to me that this is because they're the new digital pets, and a Mac Mini is the perfect aquarium for your Claw.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.015.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.015.jpeg" alt="Alfred Molina&amp;#39;s Doc Ock in Spider-Man 2, tearing apart a New York subway train with his four claws." style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.015.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;My favourite metaphor for Claws is Alfred Molina's Doc Ock in the 2004 movie Spider-Man 2. His claws were powered by AI, and were perfectly safe provided nothing damaged his inhibitor chip... after which they turned evil and took over.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.016.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.016.jpeg" alt="Gemini 3.1 Pro

A really good illustration of a pelican riding a bicycle.
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.016.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Also in February: Gemini 3.1 Pro came out, and drew me a &lt;em&gt;really good pelican riding a bicycle&lt;/em&gt;. Look at this! It's even got a fish in its basket.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.017.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.017.jpeg" alt="Gemini 3 Pro pelican contrasted with Gemini 3.1 Pro, as animated SVGs" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.017.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;And then Google's Jeff Dean &lt;a href="https://simonwillison.net/2026/Feb/19/gemini-31-pro/#jeff-dean"&gt;tweeted this video&lt;/a&gt; of an animated pelican riding a bicycle, plus a frog on a penny-farthing and a giraffe driving a tiny car and an ostrich on roller skates and a turtle kickflipping a skateboard and a dachshund driving a stretch limousine.&lt;/p&gt;
&lt;p&gt;So maybe the AI labs have been paying attention after all!&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.018.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.018.jpeg" alt="April 2026
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.018.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;A lot of stuff happened just in the past month.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.019.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.019.jpeg" alt="Gemma 4 26B-A4B (17.99GB)

A pretty decent pelican riding a bicycle, though the bike is a bit mis-shapen." style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.019.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Google released the &lt;a href="https://simonwillison.net/2026/Apr/2/gemma-4/"&gt;Gemma 4&lt;/a&gt; series of models, which are the most capable open weight models I've seen from a US company.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.020.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.020.jpeg" alt="GLM-5.1
MIT, 754B parameter, 1.51TB!
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.020.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Also last month, Chinese AI lab Z.ai came out with &lt;a href="https://simonwillison.net/2026/Apr/7/glm-51/"&gt;GLM-5.1&lt;/a&gt; - an open weight 1.5TB monster! This is a very effective model... if you can afford the hardware to run it.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.021.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.021.jpeg" alt="" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.021.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;GLM-5.1 drew me this very competent pelican on a bicycle.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.022.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.022.jpeg" alt="The bike is wonky, the pelican is floating." style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.022.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;... though when it &lt;a href="https://gisthost.github.io/?73bb6808b18c2482f66e5f082c75f36e"&gt;tried to animate it&lt;/a&gt; the pelican bounced off into the top and the bicycle got warped.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.023.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.023.jpeg" alt="Screenshot of Bluesky

Charles
‪@charles.capps.me‬
I think you should pester it with another animal using another method of locomotion. 

Something tells me it was trained for this. I can&amp;#39;t quite put my finger on it. /s

NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER!!" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.023.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Charles &lt;a href="https://bsky.app/profile/charles.capps.me/post/3miwrn42mjc2t"&gt;on Bluesky&lt;/a&gt; suggested I try it with a North Virginia Opossum on an E-scooter.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.024.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.024.jpeg" alt="NORTH VIRGINIA OPOSSUM
CRUISING THE COMMONWEALTH SINCE DUSK

And a really cool illustration of a possum." style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.024.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;And it did this! I've tried this on other models and they don't even come close. "Cruising the commonwealth since dusk" is perfect. It's &lt;a href="https://static.simonwillison.net/static/2026/glm-possum-escooter.html"&gt;animated too&lt;/a&gt;.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.025.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.025.jpeg" alt="Qwen3.6-35B-A3B is a 20.9GB file that runs on my laptop

It drew a better pelican on a bicycle than Opus 4.7, which messed up the bicycle frame." style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.025.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;The other neat Chinese open weight models in April came from Qwen. &lt;a href="https://simonwillison.net/2026/Apr/16/qwen-beats-opus/"&gt;Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7&lt;/a&gt;. That's a 20.9GB open weights model that runs on my laptop!&lt;/p&gt;
&lt;p&gt;(I think this mainly demonstrates that the pelican on the bicycle has firmly exceeded its limits as a useful benchmark.)&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.026.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.026.jpeg" alt="Claude Sonnet 4.5 pelican for comparison." style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.026.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;Here's that Claude Sonnet 4.5 pelican from September for comparison.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="slide" id="5-minutes-llms.027.jpeg"&gt;
  &lt;img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.027.jpeg" alt="The themes of the past 6 months:
Coding agents got really good
Local models wildly outperform expectations
" style="max-width: 100%" /&gt;
  &lt;div&gt;&lt;a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.027.jpeg"&gt;#&lt;/a&gt;
  &lt;p&gt;So those were the two main themes of the past six months. The coding agents got really good... and the laptop-available models, while a lot weaker than the frontier, have started wildly outperforming expectations.&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lightning-talks"&gt;lightning-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pycon"&gt;pycon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-talks"&gt;annotated-talks&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/speaking"&gt;speaking&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="coding-agents"/><category term="local-llms"/><category term="lightning-talks"/><category term="llms"/><category term="pycon"/><category term="generative-ai"/><category term="annotated-talks"/><category term="pelican-riding-a-bicycle"/><category term="ai"/><category term="speaking"/></entry><entry><title>GDS weighs in on the NHS's decision to retreat from Open Source</title><link href="https://simonwillison.net/2026/May/17/gds-weighs-in/#atom-tag" rel="alternate"/><published>2026-05-17T15:59:41+00:00</published><updated>2026-05-17T15:59:41+00:00</updated><id>https://simonwillison.net/2026/May/17/gds-weighs-in/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://shkspr.mobi/blog/2026/05/gds-weighs-in-on-the-nhss-decision-to-retreat-from-open-source/"&gt;GDS weighs in on the NHS&amp;#x27;s decision to retreat from Open Source&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Terence Eden continues his coverage of the NHS' &lt;a href="https://shkspr.mobi/blog/2026/05/nhs-goes-to-war-against-open-source/"&gt;poorly considered decision&lt;/a&gt; to close down access to their open source repositories in response to vulnerabilities reported to them as part of &lt;a href="https://simonwillison.net/2026/Apr/7/project-glasswing/"&gt;Project Glasswing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now the Government Digital Service have joined the conversation with &lt;a href="https://www.gov.uk/guidance/ai-open-code-and-vulnerability-risk-in-the-public-sector"&gt;AI, open code and vulnerability risk in the public sector&lt;/a&gt;, published May 14th. Their key recommendation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Keep open by default. Making everything private adds additional delivery and policy costs, and can reduce reuse and scrutiny. Openness should remain the default posture, with closure used sparingly and deliberately. &lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While they don't mention the NHS by name, Terence speaks the language of the civil service and interprets this as a major escalation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Within the UK's Civil Service you occasionally hear the expression "being invited to a meeting &lt;em&gt;without biscuits&lt;/em&gt;". It implies a rather frosty discussion without any of the polite niceties of a normal meeting. In general though, even when people have severe disagreements, it is rare for tempers to fray. It is even rarer for those internal disagreements to spill over into public.&lt;/p&gt;
&lt;/blockquote&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/terence-eden"&gt;terence-eden&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gov-uk"&gt;gov-uk&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;&lt;/p&gt;



</summary><category term="terence-eden"/><category term="gov-uk"/><category term="ai"/><category term="llms"/><category term="ai-ethics"/><category term="open-source"/><category term="security"/><category term="generative-ai"/><category term="ai-security-research"/></entry><entry><title>QR code generator</title><link href="https://simonwillison.net/2026/May/15/qr-code-generator/#atom-tag" rel="alternate"/><published>2026-05-15T04:00:49+00:00</published><updated>2026-05-15T04:00:49+00:00</updated><id>https://simonwillison.net/2026/May/15/qr-code-generator/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tools.simonwillison.net/qr-code-generator"&gt;QR code generator&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;Claude helped me build this tool for creating QR codes, for both text/URLs and for connecting to WiFi networks.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a QR code generator web form. Heading &amp;quot;QR code generator&amp;quot; with subtitle &amp;quot;Create a scannable code for a URL, text, or WiFi network.&amp;quot; A segmented toggle shows &amp;quot;URL / text&amp;quot; and &amp;quot;WiFi&amp;quot; with WiFi selected. Below are fields: &amp;quot;Network name (SSID)&amp;quot; with placeholder &amp;quot;My WiFi&amp;quot;; &amp;quot;Password&amp;quot; with placeholder &amp;quot;Password&amp;quot; and a blue &amp;quot;Show&amp;quot; link; &amp;quot;Security&amp;quot; dropdown set to &amp;quot;WPA / WPA2 / WPA3 (most common)&amp;quot;; an unchecked &amp;quot;Hidden&amp;quot; checkbox; helper text &amp;quot;Not sure? Leave it on WPA / WPA2 / WPA3 — that covers almost every home WiFi network.&amp;quot; Below that: &amp;quot;Style&amp;quot; dropdown set to &amp;quot;Square&amp;quot;, an unchecked &amp;quot;Border&amp;quot; checkbox, &amp;quot;Size&amp;quot; dropdown set to &amp;quot;Medium&amp;quot;, and a &amp;quot;Color&amp;quot; swatch showing black. At the bottom is a blue &amp;quot;Generate QR code&amp;quot; button." src="https://static.simonwillison.net/static/2026/qr-code-generate.jpg" /&gt;&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="vibe-coding"/><category term="tools"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Not so locked in any more</title><link href="https://simonwillison.net/2026/May/14/not-so-locked-in/#atom-tag" rel="alternate"/><published>2026-05-14T22:53:49+00:00</published><updated>2026-05-14T22:53:49+00:00</updated><id>https://simonwillison.net/2026/May/14/not-so-locked-in/#atom-tag</id><summary type="html">
    &lt;p&gt;This &lt;a href="https://simonwillison.net/2026/May/14/mitchell-hashimoto/"&gt;Mitchell Hashimoto quote&lt;/a&gt; about Bun migrating from Zig to Rust reminded me of a similar conversation I had at a conference last week.&lt;/p&gt;
&lt;p&gt;I was talking to someone who worked for a medium-sized technology company with a pair of legacy/&lt;a href="https://simonwillison.net/2018/Jul/17/mark-norman-francis/"&gt;legendary&lt;/a&gt; iPhone and Android apps.&lt;/p&gt;
&lt;p&gt;They told me they had just completed a coding-agent driven rewrite of both apps to React Native.&lt;/p&gt;
&lt;p&gt;I asked why they chose that, given that coding agents presumably drive down the cost of maintaining separate iPhone and Android apps.&lt;/p&gt;
&lt;p&gt;They said that React Native has improved a lot over the past few years and covered everything their apps needed to do.&lt;/p&gt;
&lt;p&gt;And... if it turned out to be the wrong decision, they could &lt;strong&gt;just port back to native&lt;/strong&gt; in the future.&lt;/p&gt;
&lt;p&gt;Like Mitchell said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Programming languages used to be LOCK IN, and they're increasingly not so.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/react"&gt;react&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="react"/><category term="coding-agents"/><category term="ai-assisted-programming"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Quoting Mitchell Hashimoto</title><link href="https://simonwillison.net/2026/May/14/mitchell-hashimoto/#atom-tag" rel="alternate"/><published>2026-05-14T22:31:20+00:00</published><updated>2026-05-14T22:31:20+00:00</updated><id>https://simonwillison.net/2026/May/14/mitchell-hashimoto/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/mitchellh/status/2055039647924007222"&gt;&lt;p&gt;[...] On the interesting side is how fungible programming languages are nowadays. Programming languages used to be LOCK IN, and they're increasingly not so. You think the Bun rewrite in Rust is good for Rust? Bun has shown they can be in probably any language they want in roughly a week or two. Rust is expendable. Its useful until its not then it can be thrown out. That's interesting!&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/mitchellh/status/2055039647924007222"&gt;Mitchell Hashimoto&lt;/a&gt;, on Bun porting from Zig to Rust&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/zig"&gt;zig&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mitchell-hashimoto"&gt;mitchell-hashimoto&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bun"&gt;bun&lt;/a&gt;&lt;/p&gt;



</summary><category term="zig"/><category term="ai"/><category term="mitchell-hashimoto"/><category term="llms"/><category term="rust"/><category term="generative-ai"/><category term="agentic-engineering"/><category term="bun"/></entry><entry><title>Welcome to the Datasette blog</title><link href="https://simonwillison.net/2026/May/13/welcome-to-the-datasette-blog/#atom-tag" rel="alternate"/><published>2026-05-13T23:59:39+00:00</published><updated>2026-05-13T23:59:39+00:00</updated><id>https://simonwillison.net/2026/May/13/welcome-to-the-datasette-blog/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://datasette.io/blog/2026/new-blog/"&gt;Welcome to the Datasette blog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We have a bunch of neat Datasette announcements in the pipeline so we decided it was time the project grew an official blog.&lt;/p&gt;
&lt;p&gt;I built this using OpenAI Codex desktop, which turns out to have the Markdown session transcript export feature I've always wanted. Here's &lt;a href="https://gist.github.com/simonw/885b11eee46822622b8031a1f4e5f3a3"&gt;the session that built the blog&lt;/a&gt;. See also &lt;a href="https://github.com/simonw/datasette.io/issues/179"&gt;issue 179&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/codex"&gt;codex&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="datasette"/><category term="codex"/><category term="ai-assisted-programming"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Quoting Boris Mann</title><link href="https://simonwillison.net/2026/May/13/boris-mann/#atom-tag" rel="alternate"/><published>2026-05-13T16:15:50+00:00</published><updated>2026-05-13T16:15:50+00:00</updated><id>https://simonwillison.net/2026/May/13/boris-mann/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://bsky.app/profile/bmann.ca/post/3mlp2ipupv22z"&gt;&lt;p&gt;“11 AI agents” is meaningless as a phrase. &lt;/p&gt;
&lt;p&gt;If I said “I have 11 spreadsheets” or “I have 11 browser tabs” to do my work, it means about the same thing.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://bsky.app/profile/bmann.ca/post/3mlp2ipupv22z"&gt;Boris Mann&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agent-definitions"&gt;agent-definitions&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-agents"/><category term="ai"/><category term="agent-definitions"/></entry><entry><title>Quoting Mo Bitar</title><link href="https://simonwillison.net/2026/May/12/mo-bitar/#atom-tag" rel="alternate"/><published>2026-05-12T22:59:58+00:00</published><updated>2026-05-12T22:59:58+00:00</updated><id>https://simonwillison.net/2026/May/12/mo-bitar/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.tiktok.com/@atmoio/video/7638649825382190350"&gt;&lt;p&gt;Now, if your CEO has never heard the phrase Ralph Loop, oh man, you are less than 30 days away from your next promotion. I'm not even exaggerating. Walk into his office, close the door, and say, hey chief, been experimenting with something. It's called Ralph Loops. And I think it could change literally everything. And he's gonna say, what's a Ralph loop? And you will say, give me $18,000 worth of API credits and I'll show you. Now you won't actually do anything, because you can't do anything. Because nobody can, because nobody knows what they're doing. But by the time he figures that out, you'll have a new title, and equity bump. [...]&lt;/p&gt;
&lt;p&gt;Talk about automation constantly. Nothing arouses the slumbering capitalists than the mention of automation. Drop names too, bro. Like talk about specific team members you can automate out of existence. Be like, yo, I automated Gary, bro. Tag Gary in the message. Tag him in Slack in a very public channel. Be like, yo, I just automated @Gary. His function has been Ralph Looped. And tag your CEO in the same message. You think you're getting laid off after that?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.tiktok.com/@atmoio/video/7638649825382190350"&gt;Mo Bitar&lt;/a&gt;, The Unethical Guide to Surviving AI Layoffs, TikTok&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tiktok"&gt;tiktok&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/careers"&gt;careers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="tiktok"/><category term="careers"/><category term="ai"/></entry><entry><title>llm 0.32a2</title><link href="https://simonwillison.net/2026/May/12/llm/#atom-tag" rel="alternate"/><published>2026-05-12T17:45:07+00:00</published><updated>2026-05-12T17:45:07+00:00</updated><id>https://simonwillison.net/2026/May/12/llm/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm/releases/tag/0.32a2"&gt;llm 0.32a2&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;A bunch of useful stuff in this &lt;a href="https://llm.datasette.io/"&gt;LLM&lt;/a&gt; alpha, but the most important detail is this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most reasoning-capable OpenAI models now use the &lt;a href="https://developers.openai.com/api/reference/responses/overview"&gt;&lt;code&gt;/v1/responses&lt;/code&gt;&lt;/a&gt; endpoint instead of &lt;code&gt;/v1/chat/completions&lt;/code&gt;. This enables interleaved reasoning across tool calls for GPT-5 class models. &lt;a href="https://github.com/simonw/llm/pull/1435"&gt;#1435&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This means you can now see the summarized reasoning tokens when you run prompts against an OpenAI model, written to standard error in a different color. Use the &lt;code&gt;-R&lt;/code&gt; or &lt;code&gt;--hide-reasoning&lt;/code&gt; flag if you don't want to see that.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm"/><category term="projects"/><category term="openai"/><category term="generative-ai"/><category term="annotated-release-notes"/><category term="ai"/><category term="llms"/></entry><entry><title>Thoughts on GitLab's "workforce reduction" and "structural and strategic decisions"</title><link href="https://simonwillison.net/2026/May/11/gitlab-act-2/#atom-tag" rel="alternate"/><published>2026-05-11T23:58:55+00:00</published><updated>2026-05-11T23:58:55+00:00</updated><id>https://simonwillison.net/2026/May/11/gitlab-act-2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://about.gitlab.com/blog/gitlab-act-2/"&gt;GitLab Act 2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There's a lot going on in this announcement from GitLab about the "workforce reduction" and "structural and strategic decisions" they are making with respect to the agentic era.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They're "planning to reduce the number of countries by up to 30% where we have small teams". One of the most interesting things about GitLab is that they have employees spread across a large number of countries - 18 are listed &lt;a href="https://gitlab.com/gitlab-com/content-sites/handbook/-/blob/7ce61c4be88b04061f9ad9ab5eb64db91ce89d2a/content/handbook/people-group/employment-solutions.md"&gt;in their public employee handbook&lt;/a&gt; but this post says they are "operating in nearly 60 countries". That handbook used to document their payroll workflows for those countries too - they stopped publishing that in 2023 but &lt;a href="https://gitlab.com/gitlab-com/content-sites/handbook/-/blob/82ad50d380b11751645eedc733f7d663cf908d1f/content/handbook/finance/payroll.md"&gt;the last public version&lt;/a&gt; (hooray for version control) remains a fascinating read. Since we don't know which of those 60 countries have small teams, we can't calculate how many countries that 30% applies to.&lt;/li&gt;
&lt;li&gt;"We're planning to flatten the organization, removing up to three layers of management in some functions so leaders are closer to the work." - this isn't the first announcement of this type I've seen that's trimming management. Coinbase &lt;a href="https://twitter.com/brian_armstrong/status/2051616759145185723"&gt;recently announced&lt;/a&gt; a much more aggressive version of this: they were "flattening our org structure to 5 layers max below" and "No pure managers: Every leader at Coinbase must also be a strong and active individual contributor. Managers should be like player-coaches".&lt;/li&gt;
&lt;li&gt;In terms of team structure: "We're re-organizing R&amp;amp;D to create roughly 60 smaller, more empowered teams with end-to-end ownership, nearly doubling the number of independent teams." I've always loved the idea of individual teams that can ship features unblocked by other teams, and it makes sense to me that agentic engineering can increase the capability of such teams. The 37signals public employee handbook used to have a section on working &lt;a href="https://github.com/basecamp/handbook/blob/9504494a6daa555837ee2cc2d9134ca43ab36301/how-we-work.md#in-self-sufficient-independent-teams"&gt;In self-sufficient, independent teams&lt;/a&gt; which perfectly captured this for me. I'm sad to see they &lt;a href="https://github.com/basecamp/handbook/commit/1db14f83913163f4e2e72130524269ae6ba3d757"&gt;removed that detail&lt;/a&gt; in January 2024!&lt;/li&gt;
&lt;li&gt;Tucked away towards the bottom: "&lt;em&gt;We will be retiring CREDIT as our values framework&lt;/em&gt;" - that's the values framework &lt;a href="https://gitlab.com/gitlab-com/content-sites/handbook/-/blob/7ce61c4be88b04061f9ad9ab5eb64db91ce89d2a/content/handbook/values/_index.md"&gt;described on this page&lt;/a&gt;: "Collaboration, Results for Customers, Efficiency, Diversity, Inclusion &amp;amp; Belonging, Iteration, and Transparency". The new values are "Speed with Quality, Ownership Mindset, Customer Outcomes". The fact that "Diversity" is no longer in there is likely to attract a whole lot of attention, so it's worth noting that a sub-bullet under Customer Outcomes reads "Interpersonal excellence: individuals who are good humans, embrace diversity, inclusion and belonging, assume good intent and treat everyone with respect".&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here's the part of their new strategy that most resonated with me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The agentic era multiplies demand for software&lt;/strong&gt;. Software has been the force multiplier behind nearly every business transformation of the last two decades. The constraint was the cost and time of producing and managing it. That constraint is collapsing. As the cost of producing software collapses, demand for it will expand. Last year, the developer platform market used to be measured in tens of dollars per user per month, this year it is hundreds/user/month and headed to thousands. &lt;em&gt;Not only is the value of software for builders increasing, but we believe there will be more software and builders than ever, and we will serve an increasing volume of both&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That very much encapsulates my own optimistic, &lt;a href="https://simonwillison.net/tags/jevons-paradox/"&gt;Jevons-paradox&lt;/a&gt;-inspired hope for how this will all work out.&lt;/p&gt;
&lt;p&gt;Their opinion on this does need to be taken with a big grain of salt though. GitLab's stock price was ~$52 a year ago and is ~$26 today, and it's plausible that the drop corresponds to uncertainty about GitLab's continued growth as agentic engineering eats its way through their core market.&lt;/p&gt;
&lt;p&gt;If your entire business depends on software engineering growing as a field and producing larger volumes of more lucrative seats, you have a strong incentive to believe that agents will have that effect!&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=48100500"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/gitlab"&gt;gitlab&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/careers"&gt;careers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/37signals"&gt;37signals&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jevons-paradox"&gt;jevons-paradox&lt;/a&gt;&lt;/p&gt;



</summary><category term="gitlab"/><category term="careers"/><category term="coding-agents"/><category term="agentic-engineering"/><category term="ai"/><category term="37signals"/><category term="jevons-paradox"/></entry><entry><title>Quoting James Shore</title><link href="https://simonwillison.net/2026/May/11/james-shore/#atom-tag" rel="alternate"/><published>2026-05-11T19:48:32+00:00</published><updated>2026-05-11T19:48:32+00:00</updated><id>https://simonwillison.net/2026/May/11/james-shore/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.jamesshore.com/v2/blog/2026/you-need-ai-that-reduces-your-maintenance-costs"&gt;&lt;p&gt;Your AI coding agent, the one you use to write code, needs to reduce your maintenance costs. Not by a little bit, either. You write code twice as quick now? Better hope you’ve halved your maintenance costs. Three times as productive? One third the maintenance costs. Otherwise, you’re screwed. You’re trading a temporary speed boost for permanent indenture. [...]&lt;/p&gt;
&lt;p&gt;The math only works if the LLM &lt;em&gt;decreases&lt;/em&gt; your maintenance costs, and by exactly the inverse of the rate it adds code. If you double your output and your cost of maintaining that output, two times two means you’ve quadrupled your maintenance costs. If you double your output and hold your maintenance costs steady, two times one means you’ve &lt;em&gt;still&lt;/em&gt; doubled your maintenance costs.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.jamesshore.com/v2/blog/2026/you-need-ai-that-reduces-your-maintenance-costs"&gt;James Shore&lt;/a&gt;, You Need AI That Reduces Maintenance Costs&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="coding-agents"/><category term="ai-assisted-programming"/><category term="generative-ai"/><category term="agentic-engineering"/><category term="ai"/><category term="llms"/></entry><entry><title>Your AI Use Is Breaking My Brain</title><link href="https://simonwillison.net/2026/May/11/zombie-internet/#atom-tag" rel="alternate"/><published>2026-05-11T19:21:27+00:00</published><updated>2026-05-11T19:21:27+00:00</updated><id>https://simonwillison.net/2026/May/11/zombie-internet/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.404media.co/your-ai-use-is-breaking-my-brain/"&gt;Your AI Use Is Breaking My Brain&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Excellent, angry piece by Jason Koebler on how AI writing online is becoming impossible to avoid, how filtering it is mentally exhausting, and how it's even starting to distort regular human writing styles.&lt;/p&gt;
&lt;p&gt;I particularly liked his use of the term "Zombie Internet" to define a different, more insidious alternative to the "Dead Internet" (which is just bots talking to each other):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I called it the Zombie Internet because the truth is that large parts of the internet are not just bots talking to bots or bots talking to people. It’s people talking to bots, people talking to people, people creating “AI agents” and then instructing them to interact with people. It’s people using AI talking to people who are not using AI, and it’s people using AI talking to other people who are using AI. It’s influencer hustlebros who are teaching each other how to make AI influencers and have spun up automated YouTube channels and blogs and social media accounts that are spamming the internet for the sole purpose of making money. It is whatever the fuck “Moltbook” is and whatever the fuck X and LinkedIn have become. It’s AI summaries of real books being sold as the book itself and inspirational Reddit posts and comment threads in which people give heartfelt advice to some account that’s actually being run by a marketing firm. [...]&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/jasonkoebler.bsky.social/post/3mllgvidacs2n"&gt;@jasonkoebler.bsky.social&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slop"&gt;slop&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jason-koebler"&gt;jason-koebler&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/definitions"&gt;definitions&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="slop"/><category term="jason-koebler"/><category term="generative-ai"/><category term="ai"/><category term="llms"/><category term="definitions"/></entry><entry><title>Using LLM in the shebang line of a script</title><link href="https://simonwillison.net/2026/May/11/llm-shebang/#atom-tag" rel="alternate"/><published>2026-05-11T18:48:57+00:00</published><updated>2026-05-11T18:48:57+00:00</updated><id>https://simonwillison.net/2026/May/11/llm-shebang/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;TIL:&lt;/strong&gt; &lt;a href="https://til.simonwillison.net/llms/llm-shebang"&gt;Using LLM in the shebang line of a script&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;Kim_Bruning &lt;a href="https://news.ycombinator.com/item?id=48073246#48090590"&gt;on Hacker News&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;But seriously, you can put a shebang on an english text file now (if you're sufficiently brave) [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This inspired me to look at patterns for doing exactly that with &lt;a href="https://llm.datasette.io/en/stable/"&gt;LLM&lt;/a&gt;. Here's the simplest, which takes advantage of &lt;a href="https://llm.datasette.io/en/stable/fragments.html"&gt;LLM fragments&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/env -S llm -f
Generate an SVG of a pelican riding a bicycle
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But you can also incorporate &lt;a href="https://llm.datasette.io/en/stable/tools.html"&gt;tool calls&lt;/a&gt; using the &lt;code&gt;-T name_of_tool&lt;/code&gt; option:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/env -S llm -T llm_time -f
Write a haiku that mentions the exact current time
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or even execute YAML templates directly that define extra tools as Python functions:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt;!/usr/bin/env -S llm -t&lt;/span&gt;
&lt;span class="pl-ent"&gt;model&lt;/span&gt;: &lt;span class="pl-s"&gt;gpt-5.4-mini&lt;/span&gt;
&lt;span class="pl-ent"&gt;system&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;  Use tools to run calculations&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;/span&gt;&lt;span class="pl-ent"&gt;functions&lt;/span&gt;: &lt;span class="pl-s"&gt;|&lt;/span&gt;
&lt;span class="pl-s"&gt;  def add(a: int, b: int) -&amp;gt; int:&lt;/span&gt;
&lt;span class="pl-s"&gt;      return a + b&lt;/span&gt;
&lt;span class="pl-s"&gt;  def multiply(a: int, b: int) -&amp;gt; int:&lt;/span&gt;
&lt;span class="pl-s"&gt;      return a * b&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;Then:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./calc.sh 'what is 2344 * 5252 + 134' --td
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which outputs (thanks to that &lt;code&gt;--td&lt;/code&gt; tools debug option):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Tool call: multiply({'a': 2344, 'b': 5252})
  12310688

Tool call: add({'a': 12310688, 'b': 134})
  12310822

2344 × 5252 + 134 = **12,310,822**
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Read the full TIL for &lt;a href="https://til.simonwillison.net/llms/llm-shebang#templates-with-tools"&gt;a more complex example&lt;/a&gt; that uses the Datasette SQL API to answer questions about content on my blog.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-tool-use"&gt;llm-tool-use&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm"/><category term="llm-tool-use"/><category term="llms"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>Learning on the Shop floor</title><link href="https://simonwillison.net/2026/May/11/learning-on-the-shop-floor/#atom-tag" rel="alternate"/><published>2026-05-11T15:46:36+00:00</published><updated>2026-05-11T15:46:36+00:00</updated><id>https://simonwillison.net/2026/May/11/learning-on-the-shop-floor/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://twitter.com/tobi/status/2053121182044451016"&gt;Learning on the Shop floor&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tobias Lütke describes Shopify's internal coding agent tool, River, which operates entirely in public on their Slack:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;River does not respond to direct messages. She politely declines and suggests to create a public channel for you and her to start working in. I myself work with river in &lt;code&gt;#tobi_river&lt;/code&gt; channel and many followed this pattern.  Every conversation is therefore searchable.  Anyone at Shopify  can jump in. In my own channel, there are over 100 people who, react to threads, add color and add context, pick up the torch, help with the reviews, remind me how rusty I am, and importantly, learn from watching. [...]&lt;/p&gt;
&lt;p&gt;As so often with German, there is a word for the kind of environment: &lt;em&gt;Lehrwerkstatt&lt;/em&gt;. Literally: &lt;strong&gt;A teaching workshop&lt;/strong&gt;. The whole shop floor is the classroom. You learn by being near the work. Being a constant learner is one of the core values of the firm.&lt;/p&gt;
&lt;p&gt;Shopify wants to be a Lehrwerkstatt at scale and River has now gotten us closer to this ideal than ever. It’s &lt;em&gt;osmosis learning&lt;/em&gt;, because it does not require a curriculum, a training plan, or a manager. It just requires everyone's work to be visible to the maximum extent possible. Everyone learns from each other.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I'm reminded of how Midjourney spent its first few years with the primary interface being public Discord channels, forcing users to share their prompts and learn from each other's experiments. I continue to believe that the early success of Midjourney was tied to this mechanism, helping to compensate for how weird and finicky text-to-image prompting is.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/midjourney"&gt;midjourney&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tobias-lutke"&gt;tobias-lutke&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slack"&gt;slack&lt;/a&gt;&lt;/p&gt;



</summary><category term="midjourney"/><category term="coding-agents"/><category term="generative-ai"/><category term="ai"/><category term="tobias-lutke"/><category term="llms"/><category term="slack"/></entry><entry><title>Quoting New York Times Editors’ Note</title><link href="https://simonwillison.net/2026/May/10/new-york-times-editors-note/#atom-tag" rel="alternate"/><published>2026-05-10T23:58:49+00:00</published><updated>2026-05-10T23:58:49+00:00</updated><id>https://simonwillison.net/2026/May/10/new-york-times-editors-note/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.nytimes.com/2026/04/14/world/canada/election-carney-liberal-party.html"&gt;&lt;p&gt;&lt;em&gt;This article was updated after The Times learned that a remark attributed to Pierre Poilievre, the Conservative leader, was in fact an A.I.-generated summary of his views about Canadian politics that A.I. rendered as a quotation. The reporter should have checked the accuracy of what the A.I. tool returned.  The article now accurately quotes from a speech delivered by Mr. Poilievre in April. [...] He did not refer to politicians who changed allegiances as turncoats in that speech.&lt;/em&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.nytimes.com/2026/04/14/world/canada/election-carney-liberal-party.html"&gt;New York Times Editors’ Note&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hallucinations"&gt;hallucinations&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/new-york-times"&gt;new-york-times&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/journalism"&gt;journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="hallucinations"/><category term="generative-ai"/><category term="new-york-times"/><category term="journalism"/><category term="ai"/><category term="llms"/></entry><entry><title>Using Claude Code: The Unreasonable Effectiveness of HTML</title><link href="https://simonwillison.net/2026/May/8/unreasonable-effectiveness-of-html/#atom-tag" rel="alternate"/><published>2026-05-08T21:00:11+00:00</published><updated>2026-05-08T21:00:11+00:00</updated><id>https://simonwillison.net/2026/May/8/unreasonable-effectiveness-of-html/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://twitter.com/trq212/status/2052809885763747935"&gt;Using Claude Code: The Unreasonable Effectiveness of HTML&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thought-provoking piece by Thariq Shihipar (on the Claude Code team at Anthropic) advocating for HTML over Markdown as an output format to request from Claude.&lt;/p&gt;
&lt;p&gt;The article is crammed with interesting examples (collected on &lt;a href="https://thariqs.github.io/html-effectiveness/"&gt;this site&lt;/a&gt;) and prompt suggestions like this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;Help me review this PR by creating an HTML artifact that describes it. I'm not very familiar with the streaming/backpressure logic so focus on that. Render the actual diff with inline margin annotations, color-code findings by severity and whatever else might be needed to convey the concept well.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've been defaulting to asking for most things in Markdown since the GPT-4 days, when the 8,192 token limit meant that Markdown's token-efficiency over HTML was extremely worthwhile.&lt;/p&gt;
&lt;p&gt;Thariq's piece here has caused me to reconsider that, especially for output. Asking Claude for an explanation in HTML means it can drop in SVG diagrams, interactive widgets, in-page navigation and all sorts of other neat ways of making the information more pleasant to navigate.&lt;/p&gt;
&lt;p&gt;I wrote about &lt;a href="https://simonwillison.net/2025/Dec/10/html-tools/"&gt;Useful patterns for building HTML tools&lt;/a&gt; last December, but that was focused very much on interactive utilities like the ones on my &lt;a href="https://tools.simonwillison.net/"&gt;tools.simonwillison.net&lt;/a&gt; site. I'm excited to start experimenting more with rich HTML explanations in response to ad-hoc prompts.&lt;/p&gt;
&lt;h4 id="trying-this-out"&gt;Trying this out on copy.fail&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://copy.fail/"&gt;copy.fail&lt;/a&gt; describes a recently discovered Linux security exploit, including a proof of concept distributed as obfuscated Python.&lt;/p&gt;
&lt;p&gt;I tried having GPT-5.5 create an HTML explanation of the exploit like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;curl https://copy.fail/exp | llm -m gpt-5.5 -s 'Explain this code in detail. Reformat it, expand out any confusing bits and go deep into what it does and how it works. Output HTML, neatly styled and using capabilities of HTML and CSS and JavaScript to make the explanation rich and interactive and as clear as possible'&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://gisthost.github.io/?ae53e3461ffdbfd0826156aacf025c7e"&gt;the resulting HTML page&lt;/a&gt;. It's pretty good, though I should have emphasized explaining the exploit over the Python harness around it.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a dark-themed technical document titled &amp;quot;What this Python script does&amp;quot;. Body text: &amp;quot;This is a compact, deliberately obfuscated Linux-specific local privilege-escalation proof-of-concept. Its apparent goal is to tamper with the in-memory image/page cache of /usr/bin/su, then execute su to obtain elevated privileges.&amp;quot; A yellow-bordered callout reads: &amp;quot;Safety note: This explanation is for code understanding, reverse engineering, and defensive analysis. Do not run this on systems you do not own or administer. On a vulnerable kernel, code like this can alter the behavior of a privileged executable.&amp;quot; Left column heading &amp;quot;High-level summary&amp;quot;: &amp;quot;The script opens /usr/bin/su read-only, decompresses an embedded binary payload, and then processes that payload in 4-byte chunks. For each chunk, it performs a carefully arranged sequence involving Linux's kernel crypto socket interface, AF_ALG, pipes, and splice(). The important point is that this is not ordinary file writing. It never calls write() on /usr/bin/su. Instead, it appears to rely on a kernel bug/primitive involving spliced file pages and the crypto API to get controlled bytes placed into the page-cache representation of a privileged executable.&amp;quot; Numbered steps follow: &amp;quot;1. Open target executable — /usr/bin/su is opened read-only. 2. Decode hidden payload — A zlib-compressed hex blob is decompressed into bytes. 3. Patch in 4-byte chunks — The helper function is called repeatedly with offsets 0, 4, 8, ...&amp;quot;. Right column heading &amp;quot;Why it looks strange&amp;quot; contains a table with Pattern and Purpose columns: &amp;quot;import os as g — Short aliasing to make the script compact and harder to read. socket(38, 5, 0) — Uses raw numeric Linux constants instead of readable names. Compressed hex blob — Hides binary payload bytes and keeps the script small. 
splice() — Moves file-backed pages through pipes without normal user-space copying. try: recv(...) except: 0 — Triggers the kernel operation and ignores expected errors.&amp;quot;" src="https://static.simonwillison.net/static/2026/python-script-explainer.jpg" /&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/markdown"&gt;markdown&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;&lt;/p&gt;



</summary><category term="generative-ai"/><category term="prompt-engineering"/><category term="claude-code"/><category term="markdown"/><category term="ai"/><category term="html"/><category term="llms"/><category term="security"/><category term="llm"/></entry><entry><title>llm-gemini 0.31</title><link href="https://simonwillison.net/2026/May/7/llm-gemini/#atom-tag" rel="alternate"/><published>2026-05-07T19:57:06+00:00</published><updated>2026-05-07T19:57:06+00:00</updated><id>https://simonwillison.net/2026/May/7/llm-gemini/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/simonw/llm-gemini/releases/tag/0.31"&gt;llm-gemini 0.31&lt;/a&gt;&lt;/p&gt;
        &lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gemini-3.1-flash-lite&lt;/code&gt; is &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-lite-is-now-generally-available"&gt;no longer a preview&lt;/a&gt;. &lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's my write-up of the &lt;a href="https://simonwillison.net/2026/Mar/3/gemini-31-flash-lite/"&gt;Gemini 3.1 Flash-Lite Preview model&lt;/a&gt; back in March. I don't believe this new non-preview model has changed since then.&lt;/p&gt;
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm"&gt;llm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="llm-release"/><category term="gemini"/><category term="llm"/><category term="google"/><category term="generative-ai"/><category term="ai"/><category term="llms"/></entry><entry><title>Behind the Scenes Hardening Firefox with Claude Mythos Preview</title><link href="https://simonwillison.net/2026/May/7/firefox-claude-mythos/#atom-tag" rel="alternate"/><published>2026-05-07T17:56:25+00:00</published><updated>2026-05-07T17:56:25+00:00</updated><id>https://simonwillison.net/2026/May/7/firefox-claude-mythos/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/"&gt;Behind the Scenes Hardening Firefox with Claude Mythos Preview&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Fascinating, in-depth details on how Mozilla used their access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Suddenly, the bugs are very good&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap and easy to prompt an LLM to find a “problem” in code, but slow and expensive to respond to it.&lt;/p&gt;
&lt;p&gt;It is difficult to overstate how much this dynamic changed for us over a few short months. This was due to a combination of two main factors. First, the models got a lot more capable. Second, we dramatically improved our techniques for &lt;em&gt;harnessing&lt;/em&gt; these models — steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They include some detailed bug descriptions too, including a 20-year-old XSLT bug and a 15-year-old bug in the &lt;code&gt;&amp;lt;legend&amp;gt;&lt;/code&gt; element.&lt;/p&gt;
&lt;p&gt;A lot of the attempts made by the harness were blocked by Firefox's existing defense-in-depth measures, which is reassuring.&lt;/p&gt;
&lt;p&gt;Mozilla were fixing around 20-30 security bugs in Firefox per month through 2025. That jumped to 423 in April.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Bar chart titled &amp;quot;Firefox Security Bug Fixes by Month&amp;quot; with subtitle &amp;quot;All Sources • All Severities&amp;quot; on a dark purple background, showing monthly counts: Jan 2025: 21, Feb 2025: 20, Mar 2025: 26, Apr 2025: 31, May 2025: 17, Jun 2025: 21, Jul 2025: 22, Aug 2025: 17, Sep 2025: 18, Oct 2025: 26, Nov 2025: 19, Dec 2025: 20, Jan 2026: 25, Feb 2026: 61, Mar 2026: 76, Apr 2026: 423 — a dramatic spike in the final month." src="https://static.simonwillison.net/static/2026/firefox-security.webp" /&gt;&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://lobste.rs/s/7zppv1/behind_scenes_hardening_firefox_with"&gt;Lobste.rs&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/firefox"&gt;firefox&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mozilla"&gt;mozilla&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-security-research"&gt;ai-security-research&lt;/a&gt;&lt;/p&gt;



</summary><category term="anthropic"/><category term="claude"/><category term="ai"/><category term="firefox"/><category term="llms"/><category term="mozilla"/><category term="security"/><category term="generative-ai"/><category term="ai-security-research"/></entry><entry><title>Notes on the xAI/Anthropic data center deal</title><link href="https://simonwillison.net/2026/May/7/xai-anthropic/#atom-tag" rel="alternate"/><published>2026-05-07T17:09:28+00:00</published><updated>2026-05-07T17:09:28+00:00</updated><id>https://simonwillison.net/2026/May/7/xai-anthropic/#atom-tag</id><summary type="html">
    &lt;p&gt;There weren't a lot of big new announcements from Anthropic at yesterday's Code w/ Claude event, but the biggest by far was the deal they've struck with SpaceX/xAI to use "all of the capacity of their Colossus data center".&lt;/p&gt;
&lt;p&gt;As I mentioned in my &lt;a href="https://simonwillison.net/2026/May/6/code-w-claude-2026/"&gt;live blog of the keynote&lt;/a&gt;, that's the one with the &lt;a href="https://www.politico.com/news/2025/05/06/elon-musk-xai-memphis-gas-turbines-air-pollution-permits-00317582"&gt;particularly bad environmental record&lt;/a&gt;. The gas turbines installed to power the facility initially ran without Clean Air Act permits or pollution control devices, which they got away with by classifying them as "temporary". Credible reports link it to increases in hospital admissions relating to low air quality.&lt;/p&gt;
&lt;p&gt;Andy Masley, one of the most prolific voices pushing back against misleading rhetoric about data centers (see &lt;a href="https://blog.andymasley.com/p/the-ai-water-issue-is-fake"&gt;The AI water issue is fake&lt;/a&gt; and &lt;a href="https://blog.andymasley.com/p/data-center-land-use-issues-are-fake"&gt;Data center land issues are fake&lt;/a&gt;), had &lt;a href="https://x.com/andymasley/status/2052070252930826384"&gt;this to say&lt;/a&gt; about Colossus:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I would simply not run my computing out of this specific data center&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I get that Anthropic are severely compute-constrained, but in a world where the very existence of "AI data centers" is a red-hot political issue (see recent &lt;a href="https://kutv.com/news/local/amid-boos-box-elder-county-commission-unanimously-approves-plan-for-massive-data-center"&gt;news out of Utah&lt;/a&gt; for a fresh example), signing up with this particular data center is a really bad look.&lt;/p&gt;
&lt;p&gt;There was a lot of initial chatter about how this meant xAI were clearly giving up on their own Grok models, since all of their capacity would be sold to Anthropic instead. That was a misconception - Anthropic are getting Colossus 1, but xAI are keeping their larger Colossus 2 data center for their own work.&lt;/p&gt;
&lt;p&gt;As an interesting side note, the night before the Anthropic announcement, xAI sent out a deprecation notice for Grok 4.1 Fast and several other models, giving under two weeks' notice before shutdown, as reported here &lt;a href="https://twitter.com/xlr8harder/status/2051901091906834439"&gt;by @xlr8harder&lt;/a&gt; from SpeechMap:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2026/grok-fast-shutdown.png" alt="Effective May 15, 2026 at 12:00pm PT, the following models will be retired from the xAI API: grok-4-1-fast-reasoning, grok-4-1-fast-non-reasoning, grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4-0709, grok-code-fast-1, grok-3, grok-imagine-image-pro. After May 15, 2026, requests to these models will no longer work." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This is terrible @xai. I just spent time and money to migrate to grok 4.1 fast, and you're disabling it with less than two weeks notice, after releasing it in November, with no migration path to a fast/cheap alternative.&lt;/p&gt;
&lt;p&gt;I will never depend on one of your products again.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's &lt;a href="https://speechmap.substack.com/p/speechmap-update-xai-loses-top-spot"&gt;SpeechMap's detailed explanation&lt;/a&gt; of how they selected Grok 4.1 Fast for their project in March.&lt;/p&gt;
&lt;p&gt;Were xAI serving those models out of Colossus 1?&lt;/p&gt;
&lt;p&gt;xAI owner Elon Musk (who previously delighted in calling Anthropic &lt;a href="https://twitter.com/search?q=from%3Aelonmusk+misanthropic&amp;amp;src=typed_query&amp;amp;f=live"&gt;"Misanthropic"&lt;/a&gt;) &lt;a href="https://twitter.com/elonmusk/status/2052069691372478511"&gt;tweeted&lt;/a&gt; the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. [...]&lt;/p&gt;
&lt;p&gt;After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then &lt;a href="https://twitter.com/elonmusk/status/2052076315306864756"&gt;shortly afterwards&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Just as SpaceX launches hundreds of satellites for competitors with fair terms and pricing, we will provide compute to AI companies that are taking the right steps to ensure it is good for humanity.&lt;/p&gt;
&lt;p&gt;We reserve the right to reclaim the compute if their AI engages in actions that harm humanity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Presumably the criteria for "harm humanity" are decided by Elon himself. Sounds like a new form of supply chain risk for Anthropic to me!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xai"&gt;xai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-energy-usage"&gt;ai-energy-usage&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andy-masley"&gt;andy-masley&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ai-ethics"/><category term="anthropic"/><category term="xai"/><category term="ai-energy-usage"/><category term="andy-masley"/><category term="ai"/><category term="llms"/></entry><entry><title>Live blog: Code w/ Claude 2026</title><link href="https://simonwillison.net/2026/May/6/code-w-claude-2026/#atom-tag" rel="alternate"/><published>2026-05-06T15:58:27+00:00</published><updated>2026-05-06T15:58:27+00:00</updated><id>https://simonwillison.net/2026/May/6/code-w-claude-2026/#atom-tag</id><summary type="html">
    &lt;p&gt;I'm at Anthropic's Code w/ Claude event today. Here's my live blog of the morning keynote sessions.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/live-blog"&gt;live-blog&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="anthropic"/><category term="claude"/><category term="generative-ai"/><category term="live-blog"/><category term="ai"/><category term="llms"/><category term="claude-code"/></entry><entry><title>Vibe coding and agentic engineering are getting closer than I'd like</title><link href="https://simonwillison.net/2026/May/6/vibe-coding-and-agentic-engineering/#atom-tag" rel="alternate"/><published>2026-05-06T14:24:08+00:00</published><updated>2026-05-06T14:24:08+00:00</updated><id>https://simonwillison.net/2026/May/6/vibe-coding-and-agentic-engineering/#atom-tag</id><summary type="html">
    &lt;p&gt;I recently talked with Joseph Ruscio about AI coding tools for Heavybit's High Leverage podcast: &lt;a href="https://www.heavybit.com/library/podcasts/high-leverage/ep-9-the-ai-coding-paradigm-shift-with-simon-willison"&gt;Ep. #9, The AI Coding Paradigm Shift with Simon Willison&lt;/a&gt;. Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started to converge in my own work.&lt;/p&gt;
&lt;p&gt;One thing I really enjoy about podcasts is that they sometimes push me to think out loud in a way that exposes an idea I've not previously been able to put into words.&lt;/p&gt;
&lt;h4 id="vibe-coding-and-agentic-engineering-are-starting-to-overlap"&gt;Vibe coding and agentic engineering are starting to overlap&lt;/h4&gt;
&lt;p&gt;A few weeks after the term "vibe coding" was coined, I published &lt;a href="https://simonwillison.net/2025/Mar/19/vibe-coding/"&gt;Not all AI-assisted programming is vibe coding (but vibe coding rocks)&lt;/a&gt;, where I firmly staked out my belief that "vibe coding" is a very different beast from responsible use of AI to write code, which I've since started to call &lt;a href="https://simonwillison.net/guides/agentic-engineering-patterns/what-is-agentic-engineering/"&gt;agentic engineering&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When Joseph brought up the distinction between the two I had a sudden realization that they're not nearly as distinct for me as they used to be:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Weirdly though, those things have started to blur for me already, which is quite upsetting.&lt;/p&gt;
&lt;p&gt;I thought we had a very clear delineation where vibe coding is the thing where you're not looking at the code at all. You might not even know how to program. You might be a non-programmer who asks for a thing, and gets a thing, and if the thing works, then great! And if it doesn't, you tell it that it doesn't work and cross your fingers.&lt;/p&gt;
&lt;p&gt;But at no point are you really caring about the code quality or any of those additional constraints. And my take on vibe coding was that it's fantastic, provided you understand when it can be used and when it can't.&lt;/p&gt;
&lt;p&gt;A personal tool for you, where if there's a bug it hurts only you, go ahead!&lt;/p&gt;
&lt;p&gt;If you're building software for other people, vibe coding is grossly irresponsible because it's other people's information. Other people get hurt by your stupid bugs. You need to have a higher level than that.&lt;/p&gt;
&lt;p&gt;This contrasts with agentic engineering where you are a professional software engineer. You understand security and maintainability and operations and performance and so forth. You're using these tools to the highest of your own ability. I'm finding the scope of challenges I can take on has gone up by a significant amount because I've got the support of these tools.&lt;/p&gt;
&lt;p&gt;But I'm still leaning on my 25 years of experience as a software engineer.&lt;/p&gt;
&lt;p&gt;The goal is to build high quality production systems: if you're building lower quality stuff faster, I think that's bad. I want to build &lt;em&gt;higher&lt;/em&gt; quality stuff faster. I want everything I'm building to be better in every way than it was before.&lt;/p&gt;
&lt;p&gt;The problem is that as the coding agents get more reliable, I'm not reviewing every line of code that they write anymore, even for my production level stuff.&lt;/p&gt;
&lt;p&gt;I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it's just going to do it right. It's not going to mess that up. You have it add automated tests, you have it add documentation, you know it's going to be good.&lt;/p&gt;
&lt;p&gt;But I'm not reviewing that code. And now I've got that feeling of guilt: if I haven't reviewed the code, is it really responsible for me to use this in production?&lt;/p&gt;
&lt;p&gt;The thing that really helps me is thinking back to when I've worked at larger organizations where I've been an engineering manager. Other teams are building software that my team depends on.&lt;/p&gt;
&lt;p&gt;If another team hands over something and says, "hey, this is the image resize service, here's how to use it to resize your images"... I'm not going to go and read every line of code that they wrote.&lt;/p&gt;
&lt;p&gt;I'm going to look at their documentation and I'm going to use it to resize some images. And then I'm going to start shipping my own features. And if I start running into problems where the image resizer thing appears to have bugs or the performance isn't good, that's when I might dig into their Git repositories and see what's going on. But for the most part I treat that as a semi-black box that I don't look at until I need to.&lt;/p&gt;
&lt;p&gt;I'm starting to treat the agents in the same way. And it still feels uncomfortable, because human beings are accountable for what they do. A team can build a reputation. I can say "I trust that team over there. They built good software in the past. They're not going to build something rubbish because that affects their professional reputations."&lt;/p&gt;
&lt;p&gt;Claude Code does not have a professional reputation! It can't take accountability for what it's done. But it's been proving itself anyway - time and time again it's churning out straightforward things and doing them right in the style that I like.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There's an element of &lt;a href="https://simonwillison.net/2025/Dec/10/normalization-of-deviance/"&gt;the normalization of deviance&lt;/a&gt; here - every time a model turns out to have written the right code without me monitoring it closely there's a risk that I'll trust it at the wrong moment in the future and get burned.&lt;/p&gt;
&lt;h4 id="the-new-challenge-of-evaluating-software"&gt;The new challenge of evaluating software&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project.&lt;/p&gt;
&lt;p&gt;And now I can knock out a git repository with a hundred commits and a beautiful readme and comprehensive tests of every line of code in half an hour! It looks identical to those projects that have had a great deal of care and attention. Maybe it is as good as them. I don't know. I can't tell from looking at it. Even for my &lt;em&gt;own&lt;/em&gt; projects, I can't tell.&lt;/p&gt;
&lt;p&gt;So I realized what I value more than the quality of the tests and documentation is that I want somebody to have &lt;em&gt;used&lt;/em&gt; the thing. If you've got a vibe coded thing which you have used every day for the past two weeks, that's much more valuable to me than something that you've just spat out and hardly even exercised.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="the-bottlenecks-have-shifted"&gt;The bottlenecks have shifted&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn't.&lt;/p&gt;
&lt;p&gt;It's not just the downstream stuff, it's the upstream stuff as well. I saw &lt;a href="https://simonwillison.net/2026/Jan/24/dont-trust-the-process/"&gt;a great talk by Jenny Wen&lt;/a&gt;, who's the design leader at Anthropic, where she said we have all of these design processes that are based around the idea that you need to get the design &lt;em&gt;right&lt;/em&gt; - because if you hand it off to the engineers and they spend three months building the wrong thing, that's catastrophic.&lt;/p&gt;
&lt;p&gt;There's this whole very extensive design process that you put in place because that design results in expensive work. But if it doesn't take three months to build, maybe the design process can be a whole lot riskier because cost, if you get something wrong, has been reduced so much.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id="why-i-m-still-not-afraid-for-my-career"&gt;Why I'm still not afraid for my career&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;When I look at my conversations with the agents, it's very clear to me that this is moon language for the vast majority of human beings.&lt;/p&gt;
&lt;p&gt;There are a whole bunch of reasons I'm not scared that my career as a software engineer is over now that computers can write their own code, partly because these things are amplifiers of existing experience. If you know what you're doing, you can run so much faster with them. [...]&lt;/p&gt;
&lt;p&gt;I'm constantly reminded as I work with these tools how hard the thing that we do is. Producing software is a &lt;em&gt;ferociously&lt;/em&gt; difficult thing to do. And you could give me all of the AI tools in the world and what we're trying to achieve here is still really difficult. [...]&lt;/p&gt;
&lt;p&gt;Matthew Yglesias, who's a political commentator, yesterday &lt;a href="https://twitter.com/mattyglesias/status/2049105745132585161"&gt;tweeted&lt;/a&gt;, "Five months in, I think I've decided that I don't want to vibecode — I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money." And that feels about right to me. I can plumb my house if I watch enough YouTube videos on plumbing. I would rather hire a plumber.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;On the threat to SaaS providers of companies rolling their own solutions instead:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I just realized it's the thing I said earlier about how I only want to use your side project if you've used it for a few weeks. The enterprise version of that is I don't want a CRM unless at least two other giant enterprises have successfully used that CRM for six months. [...] You want solutions that are proven to work before you take a risk on them.&lt;/p&gt;
&lt;/blockquote&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/vibe-coding"&gt;vibe-coding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/podcast-appearances"&gt;podcast-appearances&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="vibe-coding"/><category term="coding-agents"/><category term="agentic-engineering"/><category term="generative-ai"/><category term="podcast-appearances"/><category term="ai"/><category term="llms"/></entry><entry><title>Our AI started a cafe in Stockholm</title><link href="https://simonwillison.net/2026/May/5/our-ai-started-a-cafe-in-stockholm/#atom-tag" rel="alternate"/><published>2026-05-05T22:14:21+00:00</published><updated>2026-05-05T22:14:21+00:00</updated><id>https://simonwillison.net/2026/May/5/our-ai-started-a-cafe-in-stockholm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://andonlabs.com/blog/ai-cafe-stockholm"&gt;Our AI started a cafe in Stockholm&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Andon Labs previously &lt;a href="https://andonlabs.com/blog/andon-market-launch"&gt;started an AI-run retail store&lt;/a&gt; in San Francisco. Now they're running a similar experiment in Stockholm, Sweden, only this time it's a cafe.&lt;/p&gt;
&lt;p&gt;These experiments are interesting, and often throw out amusing anecdotes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;During the first week of inventory, Mona ordered 120 eggs even though the café has no stove. When the staff told her they couldn’t cook them, she suggested using the high-speed oven, until they pointed out the eggs would likely explode. She also tried to solve the problem of fresh tomatoes being spoiled too fast by ordering 22.5 kg of canned tomatoes for the fresh sandwiches. The baristas eventually started a “Hall of Shame”, a shelf visible to customers with all the weird things Mona ordered, including 6,000 napkins, 3,000 nitrile gloves, 9L coconut milk, and industrial-sized trash bags.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Where they lose their shine is when these AI managers start wasting the time of human beings who have &lt;em&gt;not&lt;/em&gt; opted into the experiment:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;She also successfully applied for an outdoor seating permit through the Police e-service, which didn’t require BankID. Her first submission included a sketch she had generated herself, despite having never seen the street outside the café. Unsurprisingly, the Police sent it back for revision. [...]&lt;/p&gt;
&lt;p&gt;When she makes a mistake, she often sends multiple emails to suppliers with the subject “EMERGENCY” to cancel or change the order.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don't think it's ethical to run experiments like this that affect real-world systems and steal time from people.&lt;/p&gt;
&lt;p&gt;I'm reminded of the incident last year where the AI Village experiment &lt;a href="https://simonwillison.net/2025/Dec/26/slop-acts-of-kindness/"&gt;infuriated Rob Pike&lt;/a&gt; by sending him unsolicited gratitude emails as an "act of kindness". That was just an unwanted email - asking suppliers to correct mistakes that were made without a human-in-the-loop or wasting police time with slop diagrams feels a whole lot worse to me.&lt;/p&gt;
&lt;p&gt;I think experiments like this need to keep their own human operators in the loop for outbound actions that affect other people.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=48028289"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-agents"&gt;ai-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="generative-ai"/><category term="ai-agents"/><category term="ai"/><category term="llms"/></entry><entry><title>Quoting John Gruber</title><link href="https://simonwillison.net/2026/May/5/john-gruber/#atom-tag" rel="alternate"/><published>2026-05-05T00:46:29+00:00</published><updated>2026-05-05T00:46:29+00:00</updated><id>https://simonwillison.net/2026/May/5/john-gruber/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://daringfireball.net/2026/05/y_combinators_stake_in_openai"&gt;&lt;p&gt;So it’s well known that Y Combinator owns &lt;em&gt;some&lt;/em&gt; stake in OpenAI. But how big is that stake? This seems like devilishly difficult information to obtain. I asked around and a little birdie who knows several OpenAI investors came back with an answer: Y Combinator owns about 0.6 percent of OpenAI. At OpenAI’s current &lt;a href="https://openai.com/index/accelerating-the-next-phase-ai/"&gt;$852 billion valuation&lt;/a&gt;, that’s worth over $5 billion.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://daringfireball.net/2026/05/y_combinators_stake_in_openai"&gt;John Gruber&lt;/a&gt;, Y Combinator’s Stake in OpenAI&lt;/p&gt;
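&lt;p&gt;As a quick check of the arithmetic in that quote, here is a minimal sketch using only the two figures it gives (a stake of about 0.6 percent and an $852 billion valuation). The function and constant names are illustrative, and the 0.6 percent figure is itself second-hand, so treat the result as approximate.&lt;/p&gt;

```python
# Figures taken from the quote: about 0.6 percent of an $852 billion valuation.
STAKE_PERCENT = 0.6
VALUATION_USD_BILLIONS = 852


def stake_value_billions(stake_percent, valuation_billions):
    """Return the value of a percentage stake, in billions of dollars."""
    return valuation_billions * stake_percent / 100


value = stake_value_billions(STAKE_PERCENT, VALUATION_USD_BILLIONS)
# Rounded to two decimal places this comes to roughly 5.11 (billion dollars),
# which is consistent with the "over $5 billion" claim.
print(f"{value:.2f}")
```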

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/y-combinator"&gt;y-combinator&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/john-gruber"&gt;john-gruber&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="y-combinator"/><category term="ai"/><category term="john-gruber"/></entry><entry><title>Granite 4.1 3B SVG Pelican Gallery</title><link href="https://simonwillison.net/2026/May/4/granite-41-3b-svg-pelican-gallery/#atom-tag" rel="alternate"/><published>2026-05-04T23:49:24+00:00</published><updated>2026-05-04T23:49:24+00:00</updated><id>https://simonwillison.net/2026/May/4/granite-41-3b-svg-pelican-gallery/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://simonw.github.io/granite-4.1-3b-gguf-pelicans/"&gt;Granite 4.1 3B SVG Pelican Gallery&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;IBM released their &lt;a href="https://research.ibm.com/blog/granite-4-1-ai-foundation-models"&gt;Granite 4.1 family&lt;/a&gt; of LLMs a few days ago. They're Apache 2.0 licensed and come in 3B, 8B and 30B sizes.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://huggingface.co/blog/ibm-granite/granite-4-1"&gt;Granite 4.1 LLMs: How They’re Built&lt;/a&gt; by Granite team member Yousaf Shah describes the training process in detail.&lt;/p&gt;
&lt;p&gt;Unsloth released the &lt;a href="https://huggingface.co/unsloth/granite-4.1-3b-GGUF"&gt;unsloth/granite-4.1-3b-GGUF&lt;/a&gt; collection of GGUF encoded quantized variants of the 3B model - 21 different model files ranging in size from 1.2GB to 6.34GB.&lt;/p&gt;
&lt;p&gt;All 21 of those Unsloth files add up to 51.3GB, which inspired me to finally try an experiment I've been wanting to run for ages: prompting "Generate an SVG of a pelican riding a bicycle" against different sized quantized variants of the same model to see what the results would look like.&lt;/p&gt;
&lt;p&gt;Honestly, &lt;a href="https://simonw.github.io/granite-4.1-3b-gguf-pelicans/"&gt;the results&lt;/a&gt; are less interesting than I expected. There's no distinguishable pattern relating quality to size - they're all pretty terrible!&lt;/p&gt;
&lt;p&gt;&lt;img alt="Six different SVG images from models ranging in size from 1.67GB to 1.2GB. They are almost all an abstract collection of shapes - weirdly the smallest model had the best version of a bicycle, while the largest one had something that looked a tiny bit like a pelican." src="https://static.simonwillison.net/static/2026/granite-3B-pelicans.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;I'll likely try this again in the future with a model that's better at drawing pelicans.&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pelican-riding-a-bicycle"&gt;pelican-riding-a-bicycle&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ibm"&gt;ibm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="llm-release"/><category term="generative-ai"/><category term="pelican-riding-a-bicycle"/><category term="ai"/><category term="ibm"/><category term="llms"/></entry><entry><title>Quoting Andy Masley</title><link href="https://simonwillison.net/2026/May/4/andy-masley/#atom-tag" rel="alternate"/><published>2026-05-04T22:51:09+00:00</published><updated>2026-05-04T22:51:09+00:00</updated><id>https://simonwillison.net/2026/May/4/andy-masley/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://blog.andymasley.com/p/data-center-land-use-issues-are-fake"&gt;&lt;p&gt;[...] Between 2000 and 2024, farmers sold in total a Colorado-sized chunk of land all on their own, 77 times all land on data center property in 2028, and grew more food than ever on what was left. None of this caused any problems for US food access.&lt;/p&gt;
&lt;p&gt;And then, in the middle of all this, a farmer in Loudoun County sells a few acres of mediocre hay field to a hyperscaler for ten times its agricultural value, and the response is that we’re running out of farmland.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://blog.andymasley.com/p/data-center-land-use-issues-are-fake"&gt;Andy Masley&lt;/a&gt;, pushing back against the "land use" argument against data center construction&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andy-masley"&gt;andy-masley&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai-ethics"/><category term="ai"/><category term="generative-ai"/><category term="andy-masley"/></entry><entry><title>Redis Array Playground</title><link href="https://simonwillison.net/2026/May/4/redis-array/#atom-tag" rel="alternate"/><published>2026-05-04T15:53:57+00:00</published><updated>2026-05-04T15:53:57+00:00</updated><id>https://simonwillison.net/2026/May/4/redis-array/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tools.simonwillison.net/redis-array"&gt;Redis Array Playground&lt;/a&gt;&lt;/p&gt;
        &lt;p&gt;Salvatore Sanfilippo submitted &lt;a href="https://github.com/redis/redis/pull/15162"&gt;a PR&lt;/a&gt; adding a new data type - arrays - to Redis. &lt;/p&gt;
&lt;p&gt;The new commands are &lt;code&gt;ARCOUNT&lt;/code&gt;, &lt;code&gt;ARDEL&lt;/code&gt;, &lt;code&gt;ARDELRANGE&lt;/code&gt;, &lt;code&gt;ARGET&lt;/code&gt;, &lt;code&gt;ARGETRANGE&lt;/code&gt;, &lt;code&gt;ARGREP&lt;/code&gt;, &lt;code&gt;ARINFO&lt;/code&gt;, &lt;code&gt;ARINSERT&lt;/code&gt;, &lt;code&gt;ARLASTITEMS&lt;/code&gt;, &lt;code&gt;ARLEN&lt;/code&gt;, &lt;code&gt;ARMGET&lt;/code&gt;, &lt;code&gt;ARMSET&lt;/code&gt;, &lt;code&gt;ARNEXT&lt;/code&gt;, &lt;code&gt;AROP&lt;/code&gt;, &lt;code&gt;ARRING&lt;/code&gt;, &lt;code&gt;ARSCAN&lt;/code&gt;, &lt;code&gt;ARSEEK&lt;/code&gt;, &lt;code&gt;ARSET&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The implementation is currently available in a branch, so I &lt;a href="https://github.com/simonw/tools/pull/277"&gt;had Claude Code for web&lt;/a&gt; build this interactive playground for trying out the new commands in a WASM-compiled build of a subset of Redis running in the browser.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a Redis command builder UI. Left sidebar shows commands ARSCAN, ARSEEK, ARSET. Main panel has a &amp;quot;predicate oneof&amp;quot; section with a MATCH dropdown and value CHERRY, plus a &amp;quot;+ add another&amp;quot; button. Below is &amp;quot;options (optional) oneof&amp;quot; with checkboxes: AND (checked), OR (unchecked), LIMIT (checked, value 10), WITHVALUES (checked), NOCASE (checked). COMMAND section shows: ARGREP myarr - + MATCH CHERRY AND LIMIT 10 WITHVALUES NOCASE. A red &amp;quot;Run command&amp;quot; button is below. REPLY section shows &amp;quot;(no reply yet)&amp;quot;." src="https://static.simonwillison.net/static/2026/redis-array-explorer-card.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The most interesting new command is &lt;code&gt;ARGREP&lt;/code&gt; which can run a server-side grep against a range of values in the array using the newly vendored &lt;a href="https://github.com/laurikari/tre/"&gt;TRE regex library&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Salvatore wrote more about the AI-assisted development process for the array type in &lt;a href="https://antirez.com/news/164"&gt;Redis array type: short story of a long development&lt;/a&gt;.&lt;/p&gt;
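&lt;p&gt;To illustrate the general idea of a grep over a range of array values, here is a conceptual sketch in Python. It is not the Redis implementation and makes no claim about the real &lt;code&gt;ARGREP&lt;/code&gt; syntax: it uses Python's &lt;code&gt;re&lt;/code&gt; module rather than TRE, and the &lt;code&gt;grep_range&lt;/code&gt; function and its parameters are hypothetical names, loosely echoing the LIMIT and NOCASE options visible in the screenshot.&lt;/p&gt;

```python
import re


def grep_range(values, pattern, start=0, stop=None, limit=None, nocase=False):
    """Return (index, value) pairs in values[start:stop] that match pattern.

    Hypothetical helper for illustration only; not Redis's API.
    """
    flags = re.IGNORECASE if nocase else 0
    regex = re.compile(pattern, flags)
    if stop is None:
        stop = len(values)
    matches = []
    for index in range(start, min(stop, len(values))):
        if regex.search(values[index]):
            matches.append((index, values[index]))
            # Stop early once the requested number of matches is collected.
            if limit is not None and len(matches) == limit:
                break
    return matches


fruits = ["apple", "Cherry", "banana", "cherry pie", "grape"]
hits = grep_range(fruits, "cherry", nocase=True, limit=10)
print(hits)
```

&lt;p&gt;The appeal of doing this on the server is that only the matching entries need to travel back to the client, instead of the whole range.&lt;/p&gt;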
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/salvatore-sanfilippo"&gt;salvatore-sanfilippo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/agentic-engineering"&gt;agentic-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/regular-expressions"&gt;regular-expressions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/c"&gt;c&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="salvatore-sanfilippo"/><category term="webassembly"/><category term="generative-ai"/><category term="agentic-engineering"/><category term="ai"/><category term="redis"/><category term="llms"/><category term="regular-expressions"/><category term="c"/></entry></feed>