Simon Willison’s Weblog

Entries Links Quotes Notes Guides Elsewhere

May 19, 2026

Release llm-gemini 0.32

New model gemini-3.5-flash for Gemini 3.5 Flash.

See also my notes on Gemini 3.5 Flash, and the pelican I drew using this upgrade to the plugin.

19th May 2026, 11:46 pm · gemini, llm

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the -preview modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products:

[... 610 words]

10:40 pm / google, ai, generative-ai, llms, gemini, llm-pricing, pelican-riding-a-bicycle, llm-release

Release datasette-llm-accountant 0.1a4

Fixed bug tracking chains of responses. Refs datasette-llm#7

19th May 2026, 8:45 pm · llm, datasette

Release llm-gemini 0.32a0

Compatible with llm>=0.32a0 alpha - adds the ability to stream reasoning tokens.

19th May 2026, 8:36 pm · gemini, llm

Release datasette-llm 0.1a8

Fix for bug where llm_prompt_context() hook did not fully collect chains of responses. #7

19th May 2026, 8:28 pm

The last six months in LLMs in five minutes

I put together these annotated slides from my five minute lightning talk at PyCon US 2026, using the latest iteration of my annotated presentation tool.

[... 2,061 words]

1:09 am / lightning-talks, pycon, speaking, ai, generative-ai, local-llms, llms, annotated-talks, pelican-riding-a-bicycle, coding-agents

May 18, 2026

Sighting 7:51 AM – 8:13 AM — Glaucous-winged Gull, Brown Pelican, Snowy Egret, Canada Goose, in Los Angeles River, CA, US

I'm heading home from PyCon US today so I went on a last morning walk to try and spot a pelican. I saw one! Didn't get a great photo of that, but I did see some goslings down by the swan boat lake.

18th May 2026

May 17, 2026

GDS weighs in on the NHS’s decision to retreat from Open Source. Terence Eden continues his coverage of the NHS' poorly considered decision to close down access to their open source repositories in response to vulnerabilities reported to them as part of Project Glasswing.

Now the Government Digital Service have joined the conversation with AI, open code and vulnerability risk in the public sector, published May 14th. Their key recommendation:

Keep open by default. Making everything private adds additional delivery and policy costs, and can reduce reuse and scrutiny. Openness should remain the default posture, with closure used sparingly and deliberately.

While they don't mention the NHS by name, Terence speaks the language of the civil service and interprets this as a major escalation:

Within the UK's Civil Service you occasionally hear the expression "being invited to a meeting without biscuits". It implies a rather frosty discussion without any of the polite niceties of a normal meeting. In general though, even when people have severe disagreements, it is rare for tempers to fray. It is even rarer for those internal disagreements to spill over into public.

# 3:59 pm / open-source, security, ai, generative-ai, llms, gov-uk, terence-eden, ai-ethics, ai-security-research

May 16, 2026

In preparation for a lightning talk I'm giving at PyCon US this afternoon I decided to figure out how many names OpenClaw has actually had since that first commit back in November.

Thanks to this first_line_history.py tool (code here) the answer, according to the Git history of the OpenClaw README, is:

Warelay → CLAWDIS → CLAWDBOT → Clawdbot → Moltbot →🦞 OpenClaw

Or in detail (the output from the tool):

2025-11-24T11:23:15+01:00 16dfc1a # Warelay — WhatsApp Relay CLI (Twilio)
2025-11-24T11:41:37+01:00 d4153da # 📡 Warelay — WhatsApp Relay CLI (Twilio)
2025-11-24T17:47:57+01:00 343ef9b # 📡 warelay — WhatsApp Relay CLI (Twilio)
2025-11-25T04:44:10+01:00 14b3c6f # 📡 warelay — WhatsApp Relay CLI
2025-11-25T12:48:40+01:00 4814021 # 📡 warelay — Send, receive, and auto-reply on WhatsApp—Twilio-backed or QR-linked.
2025-11-25T13:50:18+01:00 d51a3e9 # warelay 📡 - Send, receive, and auto-reply on WhatsApp via Twilio or QR-linked WhatsApp Web; webhook setup in one command
2025-11-25T13:51:13+01:00 4d2a8a8 # 📡 warelay — Send, receive, and auto-reply on WhatsApp—Twilio-backed or QR-linked.
2025-11-25T14:52:43+01:00 1ef7f4d # 📡 warelay — Send, receive, and auto-reply on WhatsApp.
2025-12-03T15:45:32+00:00 a27ee23 # 🦞 CLAWDIS — WhatsApp Gateway for AI Agents
2025-12-08T12:43:13+01:00 17fa2f4 # 🦞 CLAWDIS — WhatsApp & Telegram Gateway for AI Agents
2025-12-19T18:41:17+01:00 7710439 # 🦞 CLAWDIS — Personal AI Assistant
2026-01-04T14:32:47+00:00 246adaa # 🦞 CLAWDBOT — Personal AI Assistant
2026-01-10T05:14:09+01:00 cdb915d # 🦞 Clawdbot — Personal AI Assistant
2026-01-27T13:37:47-05:00 3fe4b25 # 🦞 Moltbot — Personal AI Assistant
2026-01-30T03:15:10+01:00 9a71607 # 🦞 OpenClaw — Personal AI Assistant

# 8:23 pm / openclaw, git, tools

[...] in the last 10 years I’ve learned to really love and respect CSS as a technology.

So I decided years ago that I wanted to react to “CSS is hard” by getting better at CSS and taking it seriously as a technology, instead of devaluing it. Doing that changed everything for me: I learned that so many of my frustrations (“centering is impossible”) had been addressed in CSS a long time ago, and that also what “centering” means is not always straightforward and it makes sense that there are many ways to do it. CSS is hard because it’s solving a hard problem!

— Julia Evans, Moving away from Tailwind, and learning to structure my CSS

# 4:45 pm / css, julia-evans

May 15, 2026

Release inaturalist-clumper 0.1

Part of the infrastructure I use for publishing my iNaturalist sightings on my blog. I've been running this in production for a few weeks now, inspiring some iterations on how it works, so I decided to ship a 0.1 release.

You can see an example of the output in this JSON file.

15th May 2026, 11:53 pm · projects, inaturalist

Sighting 7:42 AM – 7:47 AM — Western Gull, Rock Pigeon, in Los Angeles Area (custom), CA, US

I went for a bird walk in the morning before PyCon, and we spotted a local seagull enjoying a Starbucks.

15th May 2026

Tool QR code generator

Claude helped me build this tool for creating QR codes, for both text/URLs and for connecting to WiFi networks.

15th May 2026, 4 am · vibe-coding, tools, generative-ai, ai, llms

Release datasette-llm-limits 0.1a0

This plugin works in conjunction with datasette-llm and datasette-llm-accountant to let you configure a per-user (or global) spending limit for LLM usage inside of Datasette. Configuration looks something like this:

plugins:
  datasette-llm-limits:
    limits:
      per-user-daily:
        scope: actor
        window: rolling-24h
        amount_usd: 1.00

15th May 2026, 12:42 am · llm, datasette

May 14, 2026

This Mitchell Hashimoto quote about Bun migrating from Zig to Rust reminded me of a similar conversation I had at a conference last week.

I was talking to someone who worked for a medium sized technology company with a pair of legacy/legendary iPhone and Android apps.

They told me they had just completed a coding-agent driven rewrite of both apps to React Native.

I asked why they chose that, given that coding agents presumably drive down the cost of maintaining separate iPhone and Android apps.

They said that React Native has improved a lot over the past few years and covered everything their apps needed to do.

And... if it turned out to be the wrong decision, they could just port back to native in the future.

Like Mitchell said:

Programming languages used to be LOCK IN, and they're increasingly not so.

# 10:53 pm / react, coding-agents, ai-assisted-programming, generative-ai, ai, llms

[...] On the interesting side is how fungible programming languages are nowadays. Programming languages used to be LOCK IN, and they're increasingly not so. You think the Bun rewrite in Rust is good for Rust? Bun has shown they can be in probably any language they want in roughly a week or two. Rust is expendable. Its useful until its not then it can be thrown out. That's interesting!

— Mitchell Hashimoto, on Bun porting from Zig to Rust

# 10:31 pm / zig, ai, mitchell-hashimoto, llms, rust, generative-ai, agentic-engineering, bun

Release datasette-ip-rate-limit 0.1a0

The datasette.io site was being hammered by poorly-behaved crawlers, so I had Codex (GPT-5.5 xhigh) build a configurable rate limiting plugin to block IPs that were hammering specific areas of the site too quickly.

Here's the production configuration I'm using on that site for the new plugin:

  datasette-ip-rate-limit:
    header: Fly-Client-IP
    max_keys: 10000
    exempt_paths:
    - "/static/*"
    - "/-/turnstile*"
    rules:
    - name: demo-databases
      paths:
      - "/global-power-plants/*"
      - "/legislators/*"
      window_seconds: 60
      max_requests: 60
      block_seconds: 20

14th May 2026, 4:10 am · datasette, rate-limiting, codex

Sighting 5:20 PM — American Crow, in Stanford University, CA, US

14th May 2026

May 13, 2026

Welcome to the Datasette blog. We have a bunch of neat Datasette announcements in the pipeline so we decided it was time the project grew an official blog.

I built this using OpenAI Codex desktop, which turns out to have the Markdown session transcript export feature I've always wanted. Here's the session that built the blog. See also issue 179.

# 11:59 pm / ai, datasette, generative-ai, llms, ai-assisted-programming, codex

Sighting 10:41 AM – 10:44 AM — Surf Scoter, Western Gull, in Monterey Bay National Marine Sanctuary, CA, US, CA

13th May 2026

“11 AI agents” is meaningless as a phrase.

If I said “I have 11 spreadsheets” or “I have 11 browser tabs” to do my work, it means about the same thing.

— Boris Mann

# 4:15 pm / ai-agents, ai, agent-definitions

Tool CSP Allow-list Experiment

An experiment that shows that you can load an app in a CSP-protected sandboxed iframe (see previous note) and have a custom fetch() that intercepts CSP errors and passes them up to the parent window... which can then prompt the user to add that domain to an allow-list and then refresh the page.

I built this one with GPT-5.5 xhigh running in the Codex desktop app.

13th May 2026, 4:50 am · content-security-policy, iframes, security

Sighting 5:15 PM – 5:21 PM — Cedar Waxwing, California Scrub-Jay, in San Mateo County, CA, US

13th May 2026

May 12, 2026

Release datasette 1.0a29

New TokenRestrictions.abbreviated(datasette) utility method for creating "_r" dictionaries. #2695

Table headers and column options are now visible even if a table contains zero rows. #2701

Fixed bug with display of column actions dialog on Mobile Safari. #2708

Fixed bug where tests could crash with a segfault due to a race condition between Datasette.close() and Database.close(). #2709

That segfault bug was gnarly. I added a mechanism to Datasette recently that would automatically close connections at the end of each test, but it turned out that introduced a race condition where an in-flight query could sometimes be executing in a thread against a connection while it was being closed. I ended up solving that by having Codex CLI (with GPT-5.5 xhigh) create a minimal Dockerfile that recreated the bug.

12th May 2026, 11:41 pm · projects, datasette

Now, if your CEO has never heard the phrase Ralph Loop, oh man, you are less than 30 days away from your next promotion. I'm not even exaggerating. Walk into his office, close the door, and say, hey chief, been experimenting with something. It's called Ralph Loops. And I think it could change literally everything. And he's gonna say, what's a Ralph loop? And you will say, give me $18,000 worth of API credits and I'll show you. Now you won't actually do anything, because you can't do anything. Because nobody can, because nobody knows what they're doing. But by the time he figures that out, you'll have a new title, and equity bump. [...]

Talk about automation constantly. Nothing arouses the slumbering capitalists than the mention of automation. Drop names too, bro. Like talk about specific team members you can automate out of existence. Be like, yo, I automated Gary, bro. Tag Gary in the message. Tag him in Slack in a very public channel. Be like, yo, I just automated @Gary. His function has been Ralph Looped. And tag your CEO in the same message. You think you're getting laid off after that?

— Mo Bitar, The Unethical Guide to Surviving AI Layoffs, TikTok

# 10:59 pm / ai-ethics, tiktok, careers, ai

The thing about 90% of TDMs [Technical Decision Makers] is that they're motivated primarily by NOT GETTING FIRED. These aren't people who browser Lobsters or push to GH on the weekend. These are people that work 9 to 5, get paid, go home, and NEVER THINK ABOUT WORK AGAIN. So to achieve all that, they follow secular trends supported by analysts and broad public sentiment. Oh, Gartner said that "AI strategy" is most important? McKinsey said "context" needs to be managed? Well, "Context Engine for AI Apps" is going to be defensible. Buy it.

— Mitchell Hashimoto, in a conversation about the design of the Redis homepage

# 10:21 pm / marketing, mitchell-hashimoto, redis

Release llm 0.32a2

A bunch of useful stuff in this LLM alpha, but the most important detail is this one:

Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5 class models. #1435

This means you can now see the summarized reasoning tokens when you run prompts against an OpenAI model, displayed in a different color to standard error. Use the -R or --hide-reasoning flags if you don't want to see that.

12th May 2026, 5:45 pm · llm, projects, openai, generative-ai, annotated-release-notes, ai, llms

May 11, 2026

GitLab Act 2 (via) There's a lot going on in this announcement from GitLab about the "workforce reduction" and "structural and strategic decisions" they are making with respect to the agentic era.

They're "planning to reduce the number of countries by up to 30% where we have small teams". One of the most interesting things about GitLab is that they have employees spread across a large number of countries - 18 are listed in their public employee handbook but this post says they are "operating in nearly 60 countries". That handbook used to document their payroll workflows for those countries too - they stopped publishing that in 2023 but the last public version (hooray for version control) remains a fascinating read. Since we don't know which of those 60 countries have small teams, we can't calculate how many countries that 30% applies to.
"We're planning to flatten the organization, removing up to three layers of management in some functions so leaders are closer to the work." - this isn't the first announcement of this type I've seen that's trimming management. Coinbase recently announced a much more aggressive version of this: they were "flattening our org structure to 5 layers max below" and "No pure managers: Every leader at Coinbase must also be a strong and active individual contributor. Managers should be like player-coaches".
In terms of team structure: "We're re-organizing R&D to create roughly 60 smaller, more empowered teams with end-to-end ownership, nearly doubling the number of independent teams." I've always loved the idea of individual teams that can ship features unblocked by other teams, and it makes sense to me that agentic engineering can increase the capability of such teams. The 37signals public employee handbook used to have a section on working In self-sufficient, independent teams which perfectly captured this for me, I'm sad to see they removed that detail in January 2024!
Tucked away towards the bottom: "We will be retiring CREDIT as our values framework" - that's the values framework described on this page: "Collaboration, Results for Customers, Efficiency, Diversity, Inclusion & Belonging, Iteration, and Transparency". The new values are "Speed with Quality, Ownership Mindset, Customer Outcomes". The fact that "Diversity" is no longer in there is likely to attract a whole lot of attention, so it's worth noting that a sub-bullet under Customer Outcomes reads "Interpersonal excellence: individuals who are good humans, embrace diversity, inclusion and belonging, assume good intent and treat everyone with respect".

Here's the part of their new strategy that most resonated with me:

The agentic era multiplies demand for software. Software has been the force multiplier behind nearly every business transformation of the last two decades. The constraint was the cost and time of producing and managing it. That constraint is collapsing. As the cost of producing software collapses, demand for it will expand. Last year, the developer platform market used to be measured in tens of dollars per user per month, this year it is hundreds/user/month and headed to thousands. Not only is the value of software for builders increasing, but we believe there will be more software and builders than ever, and we will serve an increasing volume of both.

That very much encapsulates my own optimistic, Jevons-paradox-inspired hope for how this will all work out.

Their opinion on this does need to be taken with a big grain of salt though. GitLab's stock price was ~$52 a year ago and is ~$26 today, and it's plausible that the drop corresponds to uncertainty about GitLab's continued growth as agentic engineering eats its way through their core market.

If your entire business depends on software engineering growing as a field and producing larger volumes of more lucrative seats, you have a strong incentive to believe that agents will have that effect!

# 11:58 pm / 37signals, careers, ai, gitlab, coding-agents, jevons-paradox, agentic-engineering

Your AI coding agent, the one you use to write code, needs to reduce your maintenance costs. Not by a little bit, either. You write code twice as quick now? Better hope you’ve halved your maintenance costs. Three times as productive? One third the maintenance costs. Otherwise, you’re screwed. You’re trading a temporary speed boost for permanent indenture. [...]

The math only works if the LLM decreases your maintenance costs, and by exactly the inverse of the rate it adds code. If you double your output and your cost of maintaining that output, two times two means you’ve quadrupled your maintenance costs. If you double your output and hold your maintenance costs steady, two times one means you’ve still doubled your maintenance costs.

— James Shore, You Need AI That Reduces Maintenance Costs

# 7:48 pm / coding-agents, ai-assisted-programming, generative-ai, agentic-engineering, ai, llms

Your AI Use Is Breaking My Brain (via) Excellent, angry piece by Jason Koebler on how AI writing online is becoming impossible to avoid, filtering it is mentally exhausting and it's even starting to distort regular human writing styles.

I particularly liked his use of the term "Zombie Internet" to define a different, more insidious alternative to the "Dead Internet" (which is just bots talking to each other):

I called it the Zombie Internet because the truth is that large parts of the internet are not just bots talking to bots or bots talking to people. It’s people talking to bots, people talking to people, people creating “AI agents” and then instructing them to interact with people. It’s people using AI talking to people who are not using AI, and it’s people using AI talking to other people who are using AI. It’s influencer hustlebros who are teaching each other how to make AI influencers and have spun up automated YouTube channels and blogs and social media accounts that are spamming the internet for the sole purpose of making money. It is whatever the fuck “Moltbook” is and whatever the fuck X and LinkedIn have become. It’s AI summaries of real books being sold as the book itself and inspirational Reddit posts and comment threads in which people give heartfelt advice to some account that’s actually being run by a marketing firm. [...]

# 7:21 pm / definitions, ai, generative-ai, llms, slop, jason-koebler, ai-ethics

TIL Using LLM in the shebang line of a script

Kim_Bruning on Hacker News:

But seriously, you can put a shebang on an english text file now (if you're sufficiently brave) [...]

This inspired me to look at patterns for doing exactly that with LLM. Here's the simplest, which takes advantage of LLM fragments:

#!/usr/bin/env -S llm -f
Generate an SVG of a pelican riding a bicycle

But you can also incorporate tool calls using the -T name_of_tool option:

#!/usr/bin/env -S llm -T llm_time -f
Write a haiku that mentions the exact current time

Or even execute YAML templates directly that define extra tools as Python functions:

#!/usr/bin/env -S llm -t
model: gpt-5.4-mini
system: |
  Use tools to run calculations
functions: |
  def add(a: int, b: int) -> int:
      return a + b
  def multiply(a: int, b: int) -> int:
      return a * b

Then:

./calc.sh 'what is 2344 * 5252 + 134' --td

Which outputs (thanks to that --td tools debug option):

Tool call: multiply({'a': 2344, 'b': 5252})
  12310688

Tool call: add({'a': 12310688, 'b': 134})
  12310822

2344 × 5252 + 134 = **12,310,822**

Read the full TIL for a more complex example that uses the Datasette SQL API to answer questions about content on my blog.

11th May 2026, 6:48 pm · llm, llm-tool-use, llms, ai, generative-ai

Learning on the Shop floor. Tobias Lütke describes Shopify's internal coding agent tool, River, which operates entirely in public on their Slack:

River does not respond to direct messages. She politely declines and suggests to create a public channel for you and her to start working in. I myself work with river in #tobi_river channel and many followed this pattern. Every conversation is therefore searchable. Anyone at Shopify can jump in. In my own channel, there are over 100 people who, react to threads, add color and add context, pick up the torch, help with the reviews, remind me how rusty I am, and importantly, learn from watching. [...]

As so often with German, there is a word for the kind of environment: Lehrwerkstatt. Literally: A teaching workshop. The whole shop floor is the classroom. You learn by being near the work. Being a constant learner is one of the core values of the firm.

Shopify wants to be a Lehrwerkstatt at scale and River has now gotten us closer to this ideal than ever. It’s osmosis learning, because it does not require a curriculum, a training plan, or a manager. It just requires everyone's work to be visible to the maximum extent possible. Everyone learns from each other.

I'm reminded of how Midjourney spent its first few years with the primary interface being public Discord channels, forcing users to share their prompts and learn from each other's experiments. I continue to believe that the early success of Midjourney was tied to this mechanism, helping to compensate for how weird and finicky text-to-image prompting is.

# 3:46 pm / ai, slack, generative-ai, llms, midjourney, coding-agents, tobias-lutke

May 10, 2026

This article was updated after The Times learned that a remark attributed to Pierre Poilievre, the Conservative leader, was in fact an A.I.-generated summary of his views about Canadian politics that A.I. rendered as a quotation. The reporter should have checked the accuracy of what the A.I. tool returned. The article now accurately quotes from a speech delivered by Mr. Poilievre in April. [...] He did not refer to politicians who changed allegiances as turncoats in that speech.

— New York Times Editors’ Note

# 11:58 pm / ai-ethics, hallucinations, generative-ai, new-york-times, journalism, ai, llms

One could say in the first quarter-century of my life, that while I was always fascinated by programming, I could never overcome the guilt of not really knowing whether the tool I am building right now isn’t already superceded by some much better implementation someone else has already written 30 or 40 years ago; I could write a TSV-aware search and replace, or I could find out about awk and solve that entire class of problems in one fell swoop, for example. My central conceit is that this is a trap. You need to reinvent a couple of wheels to get to the edge of what we know about wheel-making, not a thousand wheels, and not zero; probably four or five is sufficient in most domains, maybe closer to twenty or thirty in the most epistemically rigorous and developed fields like mathematics or computer science. Each wheel you reinvent, and every directed question you ask along the way, will propel you faster to the true frontier than that same amount of time spend in idle study, or even five times that amount.

— Andrew Quinn, footnote on Replacing a 3 GB SQLite database with a 10 MB FST (finite state transducer) binary

# 2:59 pm / careers, sqlite

May 9, 2026

WebRTC is designed to degrade and drop my prompt during poor network conditions.

wtf my dude

WebRTC aggressively drops audio packets to keep latency low. If you’ve ever heard distorted audio on a conference call, that’s WebRTC baybee. The idea is that conference calls depend on rapid back-and-forth, so pausing to wait for audio is unacceptable.

…but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate. After all, I’m paying good money to boil the ocean, and a garbage prompt means a garbage response. It’s not like LLMs are particularly responsive anyway.

But I’m not allowed to wait. It’s impossible to even retransmit a WebRTC audio packet within a browser; we tried at Discord. The implementation is hard-coded for real-time latency or else.

— Luke Curley, OpenAI’s WebRTC Problem, in response to How OpenAI delivers low-latency voice AI at scale

# 1:03 am / webrtc, openai