| blogmark |
9477 |
2026-05-31 16:31:32+00:00 |
The solution might be cancelling my AI subscription - Hacker News |
I find this post by David Wilson very relatable. David lists 16+ projects he's spun up with AI tooling, and concludes:
> I didn't mean to build most of these things. Usually the Claude session started with something like "*write a quick script for X*", and one hour later the result is not a *quick script for X*, nor in the usual case is my problem solved, whatever the original itch happened to be.
>
> On that last point, this technology is **horrific** for attention. It's a thermonuclear ADHD amplifier and I have seen the same effect in every single one of my adult friends. Folk running 3 screens simultaneously working on totally unrelated "projects" they have little hope of maintaining, and such little commitment to the outcome that the time is obviously wasted.
This is a *very* real problem. I'm finding that coding agents can take me from a vague idea to a working solution, one with tests and documentation and that *looks* like a carefully considered project evolved over the course of many weeks... in less than an hour.
Even if the code is rock solid, there's a limit to how many projects like that I can sensibly care for - and if they're instantly abandoned, what value was there from creating them in the first place?
David doesn't think this is sustainable at all:
> I have no idea how to manage AI at present except by curtailing use, because a tool producing a cheap reward with minimal input and no friction can only be a liability, and achieving that realisation is probably the only real contribution of AI to date.
I'm hopeful that the critical skill to develop here is *discipline*. That’s not great news for me: I’ve been trying to figure that one out for decades!
Interestingly, the [Hacker News thread](https://news.ycombinator.com/item?id=48345896) has gathered a number of comments from people with ADHD who are finding agents help them achieve the focus they've been missing:
- "... for me (also ADHD) it's kind of the opposite. I'm finishing side projects for the first time ever because I can actually get them working before I get bored of them"
- "As someone with ADHD I feel like AI is a salve for my mind. I used to listen to intense EDM while working. Now I sit in silence and talk to my agents. I maintain inbox zero. I absorb and comment across all relevant projects, even outside my team. I literally feel like I have a support team for the first time."
- "For those of us prone to hyperfocus, working with AI can provide the kinds of stimulation we crave. I can hardly remember a time when I've felt more engaged with my work, more productive, and more badass." |
| quotation |
2207 |
2026-05-31 01:48:12+00:00 |
Anthropic defines “run-rate revenue” in two parts. Use the last 28 days of sales from customers charged on a consumption basis and multiply it by 13. Then, multiply the monthly subscription take by 12, and add the two together. - Karen Kwok for Reuters Breakingviews |
|
| blogmark |
9476 |
2026-05-30 21:36:24+00:00 |
How we contain Claude across products - |
A complaint I often have about sandboxing products is that they are rarely thoroughly *documented*, and in the absence of detailed documentation it's hard to know how much I can trust them.
Anthropic just published a fantastic overview of how their various sandbox techniques work across [Claude.ai](https://claude.ai/), Claude Code, and Cowork.
> We constrain where and how an agent can act with process sandboxes, VMs, filesystem boundaries, and egress controls. The goal is to set a hard boundary on what an agent can reach. For example, if credentials never enter the sandbox, they can't be exfiltrated, regardless of whether the cause is a user, a model finding a “creative” path, or an attacker.
Claude.ai uses gVisor. Claude Code, run locally, uses Seatbelt on macOS and Bubblewrap on Linux. Claude Cowork runs a full VM (Apple's Virtualization framework on macOS, HCS on Windows).
There's a lot in here, including some interesting stories of risks they missed such as the `api.anthropic.com/v1/files` exfiltration vector [covered here previously](https://simonwillison.net/2026/Jan/14/claude-cowork-exfiltrates-files/).
This reminded me it's time I took another look at Anthropic's open source [srt (Anthropic Sandbox Runtime)](https://github.com/anthropic-experimental/sandbox-runtime) tool - it's mature enough know that I'm ready to give it a proper go. |
| blogmark |
9475 |
2026-05-30 19:39:08+00:00 |
I Am Retiring from Tech to Live Offline - Hacker News |
I've seen a lot of posts on forums from people threatening to quit their careers over AI. This is *not* one of those: Chad Whitacre is taking concrete steps, starting with this typewritten, scanned letter
> I'm retiring from tech. Well, "retiring" is euphemistic. I'm stepping away from tech, and that includes Open Source. [...]
>
> AI was the last straw. Have you heard of that island off India where the indigenous population kills any outsiders fool-hardy enough to land? They are doing the rest of us a favor by preserving a way of life we may need again someday, or at the very least should not want to see completely extinguished. A reminder. Never forget your roots. Here in Pennsylvania we have the Amish performing a similar function. Significantly less hostile, though still set apart, they bear witness to what was normal for all of us a couple short centuries ago: horse and buggy, wood stoves and lanterns. My intent is to be AI Amish, which means Internet Amish. Not 1780, but 1980. Neo-Amish. I'm fine driving a car and flipping a lightswitch, by which I mean that they don't make me into something I hate, which AI and [struck through: social media] [handwritten above: doomscrolling] do.
I'll admit that at first I wasn't entirely sure if this was serious. Then I found this earlier post by Chad from Feb 19 2026, [Spitting Out the Agentic Kool-Aid](https://openpath.quest/2026/spitting-out-the-agentic-kool-aid/):
> I figured I’d better taste the Kool-Aid in order to form an opinion, so I dove into Claude Code with Opus 4.5 on a side project. I spent three 12+ hour days with it. I was intoxicated. My family was weirded out. [...]
>
> It weirded me out too, when I unplugged for a long weekend. Something felt off. It was like I had another “person” in my head, sharing my inner monologue—but the “person” was a computer system owned by a budding megacorp.
>
> [...] I am now also committing myself to disembarking from the titantic of technological accelerationism.
>
> All efforts to address the problems of invasive technology are worthwhile, even those that are only partially effective. For my part, I have started trying to return more fully to a pre-screen, analog life.
It's accompanied by [a video version of the essay](https://www.youtube.com/watch?v=DCC76jmmzkc) which I found touching and sincere.
Chad has been trying to solve the open source sustainability problem [for *years*](https://simonwillison.net/2024/Jan/23/the-open-source-sustainability-crisis/) - I talked with him about this at PyCon 2025 in Cleveland. That's a very tough nut to crack, and the disruption caused by AI looks to be making it even harder.
I'm glad that the [Open Source
Endowment](https://endowment.dev/) will continue without him. I'm very much going to miss his online voice. |
| quotation |
2206 |
2026-05-30 17:29:55+00:00 |
My take on AI is, essentially, everybody who’s against it is too against it and everybody who’s for it is too for it. - Daniel Jalkut |
|
| entry |
9309 |
2026-05-28 23:59:50+00:00 |
Claude Opus 4.8: "a modest but tangible improvement" |
<p>Anthropic shipped <a href="https://www.anthropic.com/news/claude-opus-4-8">Claude Opus 4.8</a> today. My favourite thing about it is this note in the release announcement:</p>
<blockquote>
<p>Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost.</p>
</blockquote>
<p>It's so refreshing to see an AI lab honestly describe a release as a minor incremental improvement over the previous model!</p>
<p>Honesty seems to be a theme. Here's my other favorite note from that announcement:</p>
<blockquote>
<p>One of the most prominent improvements in Opus 4.8 is its <em>honesty</em>. We train all our models to be honest---for instance, to avoid making claims that they can't support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in <a href="https://www.anthropic.com/claude-opus-4-8-system-card">our evaluations</a>, which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.</p>
</blockquote>
<p>That linked system card includes the following:</p>
<blockquote>
<p>Claude Opus 4.8 had the lowest incorrect-rate of the six models on every benchmark—the most direct measure of factual hallucination. It achieved this mainly by abstaining on questions about which it was uncertain rather than by answering more questions correctly.</p>
</blockquote>
<h4 id="model-characteristics">Model characteristics</h4>
<p>Not much has changed since 4.7.</p>
<p>It's priced the same as Opus 4.5/4.6/4.7 - $5/million input and $25 per million output. "Fast mode" is twice that price, which is a significant reduction from their previous models - fast mode on 4.6/4.7 remains at $30/$150. Note that <a href="https://platform.claude.com/docs/en/build-with-claude/fast-mode">fast mode</a> is only available to organizations that are part of the research preview, "Contact your account manager to request access".</p>
<p>Both the reliable knowledge cutoff and the training data cutoff are January 2026, the same as for 4.7.</p>
<p>The context window is still 1,000,000 tokens, and the max output is 128,000 tokens.</p>
<p>The <a href="https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-8">What's new in Claude Opus 4.8</a> document has some of the more interesting details. These caught my eye:</p>
<blockquote>
<p><strong>Mid-conversation system messages</strong>. Claude Opus 4.8 accepts <code>role: "system"</code> messages immediately after a user turn in the <code>messages</code> array (subject to <a href="https://platform.claude.com/docs/en/build-with-claude/mid-conversation-system-messages#limitations">placement rules</a>). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves <a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching">prompt cache</a> hits on the earlier turns and reduces input cost on agentic loops.</p>
</blockquote>
<p>See also <a href="https://github.com/anthropics/anthropic-sdk-python/commit/2b826760101664ef89db42132932f53ba97c894d#diff-a947c9c02eab58e8ddbe799a11832d533836d242e07c7251997f8543f0981f2f">this update</a> to the Anthropic Python SDK. Being able to steer the system prompt mid-conversation sounds really powerful. I was worried this would be incompatible with the abstraction provided by my own <a href="https://llm.datasette.io/en/stable/python-api.html#system-prompts">LLM library</a>, which expects a single system prompt per conversation... but it turns out my recent <a href="https://simonwillison.net/2026/Apr/29/llm/">redesign</a> should handle that <a href="https://github.com/simonw/llm-anthropic/issues/73">just fine</a>.</p>
<blockquote>
<p><strong>Lower prompt cache minimum</strong>. The minimum cacheable prompt length on Claude Opus 4.8 is 1,024 tokens, lower than on Claude Opus 4.7.</p>
</blockquote>
<p>I checked and 4.7's minimum <a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching#cache-limitations">was 4,096</a>.</p>
<h4 id="and-some-pelicans">And some pelicans</h4>
<p>Here are <a href="https://tools.simonwillison.net/markdown-svg-renderer#url=https%3A%2F%2Fgist.github.com%2Fsimonw%2Ffea4f7546626d627862dc241a4e3a86a">pelicans riding bicycles</a> for all five thinking levels, <code>low</code>, <code>medium</code>, <code>high</code>, <code>xhigh</code>, and <code>max</code>:</p>
<div style="display: grid; grid-template-columns: repeat(6, 1fr); gap: 1rem; max-width: 900px; margin: 0 auto;">
<figure style="grid-column: span 2; margin: 0; text-align: center;">
<img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-low.png" alt="Flat-style cartoon illustration of a white duck with an orange beak and legs riding a black bicycle, its feet on the pedals, against a blue sky and green grass background." style="width: 100%; height: auto; border: 1px solid #ccc;" />
<figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;">
<a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response">low</a>
</figcaption>
</figure>
<figure style="grid-column: span 2; margin: 0; text-align: center;">
<img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-medium.png" alt="Flat-style illustration of a white egret or heron with an orange beak and legs riding a black bicycle, against a blue sky and green grass background." style="width: 100%; height: auto; border: 1px solid #ccc;" />
<figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;">
<a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-1">medium</a>
</figcaption>
</figure>
<figure style="grid-column: span 2; margin: 0; text-align: center;">
<img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-high.png" alt="Cartoon illustration of a white duck with an orange beak riding a black bicycle, against a light blue sky with a pale yellow sun in the upper left and a green ground line at the bottom." style="width: 100%; height: auto; border: 1px solid #ccc;" />
<figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;">
<a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-2">high</a>
</figcaption>
</figure>
<figure style="grid-column: span 3; margin: 0; text-align: center;">
<img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-xhigh.png" alt="Cartoon illustration of a white pelican with an orange beak riding a black bicycle, its orange legs extending down to the pedals, against a blue sky with a yellow sun and green ground." style="width: 100%; height: auto; border: 1px solid #ccc;" />
<figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;">
<a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-3">xhigh</a>
</figcaption>
</figure>
<figure style="grid-column: span 3; margin: 0; text-align: center;">
<img src="https://static.simonwillison.net/static/2026/claude-opus-4.8-max.png" alt="Cartoon illustration of a white pelican with an orange beak riding a red bicycle on green grass, against a light blue sky with a fluffy white cloud and a yellow sun." style="width: 100%; height: auto; border: 1px solid #ccc;" />
<figcaption style="margin-top: 0.5rem; font-family: system-ui, sans-serif; font-weight: bold;"><a href="https://gist.github.com/simonw/fea4f7546626d627862dc241a4e3a86a#response-4">max</a></figcaption>
</figure>
</div>
<p>This time I ran them using the <a href="https://llm.datasette.io/en/stable/usage.html">LLM CLI</a>, exported the logs to Markdown and then had Claude Opus 4.8 <a href="https://github.com/simonw/tools/commit/71e4944766b577a327ff048cc63b739ba4cbade9">build me</a> an HTML tool that could render that Markdown with the <code>svg</code> fenced code blocks displayed as SVGs on the page.</p>
<p>(I later had GPT-5.5 xhigh in Codex <a href="https://gist.github.com/simonw/bb5a267f8144dfe4e92e50a014e49e98">update that code</a> to remove any XSS holes. I'm sure Claude could have done that if I'd asked, but GPT-5.5 is my code security blanket at the moment.)</p>
<p>The max one was clearly the best, but it did take 25 input, 17,167 output tokens for a total cost of <a href="https://www.llm-prices.com/#it=25&ot=17167&ic=5&oc=25&sel=claude-opus-4-5">43 cents</a>!</p> |
| blogmark |
9473 |
2026-05-27 23:44:37+00:00 |
sqlite AGENTS.md - Alex Garcia on the Datasette Discord |
SQLite gained an AGENTS.md file [five days ago](https://github.com/sqlite/sqlite/commit/a1e5778889252d2609a59fd9b819d70392c5789e) - but it's not intended for their own development, it's presumably aimed at people who are pointing agents at the SQLite codebase. It includes:
> SQLite does not accept pull requests without prior agreement and/or accompanying legal paperwork that places the pull request in the public domain. However, the human SQLite developers will review a concise and well-written pull request as a proof-of-concept prior to reimplementing the changes themselves.
>
> SQLite does not accept agentic code. However the project will accept agentic bug reports that include a reproducible test case. Patches or pull requests demonstrating a possible fix, for documentation purposes, are welcomed.
The [most recent commit](https://github.com/sqlite/sqlite/commit/db7fe319ed5a18dbc732ab8eacea557f41cd910f) to that file removed "(currently)" from "SQLite does not (currently) accept agentic code", with the commit message "Strengthen the statement about not accepting agentic code".
Meanwhile the SQLite forum was being flooded with so many AI-generated bug reports - of varying quality - that they've now [split those off](https://sqlite.org/forum/forumpost/2e7a8d6ba4b46d8315e80fd4a1e2feb40948dff5b7b11d5ba9cea5cb40aa252b) into a [new SQLite Bug Forum](https://sqlite.org/bugs/forum). D. Richard Hipp is resolving issues on there with a flurry of commits to the codebase. |
| entry |
9308 |
2026-05-27 16:38:35+00:00 |
I think Anthropic and OpenAI have found product-market fit |
<p>Anthropic are <a href="https://techcrunch.com/2026/05/20/anthropic-says-its-about-to-have-its-first-profitable-quarter/">strongly rumored</a> to be about to have their first profitable quarter. Stories <a href="https://www.theinformation.com/newsletters/applied-ai/uber-cto-shows-claude-code-can-blow-ai-budgets">are circulating</a> of companies surprised at how expensive their LLM bills are becoming from usage by their staff. I think this is because OpenAI and Anthropic have both found product-market fit.</p>
<ul>
<li><a href="https://simonwillison.net/2026/May/27/product-market-fit/#enterprise-customers-are-now-paying-api-prices">Enterprise customers are now paying API prices</a></li>
<li><a href="https://simonwillison.net/2026/May/27/product-market-fit/#i-think-they-ve-found-product-market-fit">I think they've found product-market fit</a></li>
<li><a href="https://simonwillison.net/2026/May/27/product-market-fit/#and-they-re-ramping-up">And they're ramping up</a></li>
<li><a href="https://simonwillison.net/2026/May/27/product-market-fit/#the-ai-failure-stories-around-this-are-pretty-thin">The AI-failure stories around this are pretty thin</a></li>
<li><a href="https://simonwillison.net/2026/May/27/product-market-fit/#we-also-know-the-labs-are-spending-a-lot">We also know the labs are spending a lot</a></li>
<li><a href="https://simonwillison.net/2026/May/27/product-market-fit/#api-revenue-is-becoming-less-important">API revenue is becoming less important</a></li>
<li><a href="https://simonwillison.net/2026/May/27/product-market-fit/#april-is-a-new-inflection-point">April is a new inflection point</a></li>
</ul>
<h4 id="enterprise-customers-are-now-paying-api-prices">Enterprise customers are now paying API prices</h4>
<p>I currently subscribe to the $100/month Max plan from Anthropic and the $100/month Pro plan from OpenAI. If you are a heavy user of coding agents these plans are a fantastic deal. I just ran the <a href="https://github.com/ryoppippi/ccusage">ccusage</a> tool on my laptop to get an estimate of how much I would have spent if I were to pay for API tokens in the past 30 days and got:</p>
<ul>
<li>$1,199.79 for Anthropic Claude Code</li>
<li>$980.37 for OpenAI Codex</li>
</ul>
<p>That's $2,180.16 worth of tokens for $200 - not bad at all! I'm a moderately heavy user of these tools, but I'm certainly not running agents every hour of the day and night.</p>
<p>I had assumed that companies making extensive use of agents were getting similar discounts. It turns out I <em>could not have been more wrong</em> about that.</p>
<p>I haven't been able to track down the exact date, but at some point in the last six months Anthropic switched their Enterprise plan (originally <a href="https://www.anthropic.com/news/claude-code-on-team-and-enterprise">"Claude seats include enough usage for a typical workday" back in August 2025</a>) to $20/seat/month plus API pricing for usage. This story about the change <a href="https://www.theinformation.com/articles/anthropic-changes-pricing-bill-firms-based-ai-use-amid-compute-crunch">from The Information</a> is dated Apr 14, 2026, but cites an Anthropic spokesperson claiming that the pricing change occurred in November 2025. Existing customers are finding out about the change as they renew their contracts.</p>
<p>OpenAI made a similar pricing change in April. The <a href="https://help.openai.com/en/articles/20001106-codex-rate-card">Codex rate card</a> (<a href="https://web.archive.org/web/20260519062438/https://help.openai.com/en/articles/20001106-codex-rate-card">Internet Archive copy</a>) currently says:</p>
<blockquote>
<p><strong>Note</strong>: On April 2, 2026, we updated Codex pricing to align with API token usage, instead of per-message pricing. This change was applicable to new and existing Plus, Pro, ChatGPT Business and new ChatGPT Enterprise plans.</p>
<p>On April 23, 2026, we made this update for all existing ChatGPT Enterprise plans as well, inclusive of Edu, Health, Gov, and ChatGPT for Teachers.</p>
</blockquote>
<p>It's a little harder to decode as they quote prices in "credits", but as far as I can tell those credit costs are an exact match for the API token costs listed for those models.</p>
<p>All of which is to say that as of April 2026 the "Enterprise" cost for both OpenAI Codex and Anthropic Claude Code/Cowork is the same as the listed API price.</p>
<p>GPT-5.5 (released April 23rd) is 2x the API price of GPT-5.4. Opus 4.7 (April 16th) is <a href="https://simonwillison.net/2026/Apr/20/claude-token-counts/">around 1.4x</a> the price of Opus 4.6 when you take their new tokenizer into account.</p>
<p>So April saw both leading model companies release new frontier models with a higher API price, <em>and</em> both companies now have measures to lock their enterprise customers (who tend to sign year-long deals) at those API prices, not the previous extreme discounts.</p>
<h4 id="i-think-they-ve-found-product-market-fit">I think they've found product-market fit</h4>
<p>Why these sudden aggressive moves on pricing? Both Anthropic and OpenAI are planning to IPO, but I suspect there's a more important factor here: I think they've finally found product-market fit, with the coding/general-purpose agent products embodied by Claude Code/Cowork and Codex.</p>
<p>Tools like ChatGPT are wildly popular, but that wild popularity has been difficult to turn into revenue. In February <a href="https://finance.yahoo.com/news/chatgpt-almost-1-billion-weekly-212157499.html">OpenAI boasted</a> more than 900 million weekly active users for ChatGPT, but only 50 million - 5.6% of that - were paying consumer subscribers.</p>
<p>Charging $10-$20/month per user is an OK business, but you'd need 1-2 billion subscribers sticking around for four years to cover <a href="https://openai.com/global-affairs/seizing-the-ai-opportunity/">$1 trillion in infrastructure</a>.</p>
<p>Companies spending $200+/month/user will get you there a whole lot faster - and as noted above, as a power-user I'm at ~$1,000/month in API costs per vendor already.</p>
<p>Coding agents really did change everything. These are tools which burn <em>vastly</em> more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals. Right now that's still mostly software engineers, but a coding agent is a tool that can automate anything you can do by typing commands into a computer... so they are clearly applicable to a much wider set of skilled knowledge workers.</p>
<p>As I've <a href="https://simonwillison.net/tags/november-2025-inflection/">discussed on this site at length</a>, the models released in November 2025 elevated agents to being genuinely useful. We've had six months to get used to that idea now - it's no wonder companies are beginning to spend real money on this technology.</p>
<p>You could argue that ChatGPT achieved product-market fit when it became the <a href="https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/">fastest-growing consumer app in history</a> back in February 2023... but it certainly wasn't making any actual money back then. Coding agents plus enterprise pricing marks the point when these companies start making <em>very</em> real revenue. Maybe even enough to start covering their costs!</p>
<h4 id="and-they-re-ramping-up">And they're ramping up</h4>
<p>As further evidence that enterprise agents represent product-market fit for these companies, consider their open job listings.</p>
<p>OpenAI have <a href="https://openai.com/careers/search/">703 open jobs</a> right now, of which I'd categorize 229 (32.6%) as relating to enterprise sales and support - account executives, "Go To Market", "Forward Deployed Engineers" and the like.</p>
<p>Anthropic have <a href="https://www.anthropic.com/careers/jobs">390 open jobs</a>, 105 (26.9%) of which look enterprisey to me.</p>
<p>It's pleasingly ironic that these AI labs have picked a business model with such a heavy demand on human labor - enterprise sales contracts don't close themselves without a whole lot of humans in the mix!</p>
<p><small>(I ran this analysis by scraping their job sites with Claude Code, then having it use Datasette's <a href="https://docs.datasette.io/en/latest/json_api.html">JSON API</a> to pipe that data into Datasette Cloud where I used <a href="https://agent.datasette.io/">Datasette Agent</a> for the analysis, <a href="https://gist.github.com/simonw/5632d208d76b3c8b34f1fdbaf69eb1b8#agent-4">exported here</a>. Dogfood!)</small></p>
<h4 id="the-ai-failure-stories-around-this-are-pretty-thin">The AI-failure stories around this are pretty thin</h4>
<p>I started digging into this in response to <a href="https://news.ycombinator.com/item?id=48287025#48287219">a growing volume</a> of stories claiming that large companies were sounding the alarm because their AI usage costs had grown so large.</p>
<p>The most widely cited of these stories appear quite overblown to me.</p>
<p>The most discussed has been Uber, based on <a href="https://www.theinformation.com/newsletters/applied-ai/uber-cto-shows-claude-code-can-blow-ai-budgets">this report</a> where CTO Praveen Neppalli Naga indicated that Uber had "maxed out its full year AI budget just a few months into 2026", mostly thanks to Claude Code.</p>
<p>Given that Claude Code only got <em>really</em> good in November it's entirely unsurprising to me that a budget set in 2025 may have failed to predict demand for that tool in 2026!</p>
<p>That Uber story was further fueled by comments made by Uber's COO, Andrew Macdonald, on the Rapid Response podcast. I tracked down <a href="https://www.youtube.com/watch?v=y_mQ6xLcKyc&t=1616s">the segment</a> and there really isn't much there. Here's what Andrew said:</p>
<blockquote>
<p>But then you sometimes go and talk to your senior engineering leaders and you're saying, OK, how many projects that were on the cutting room floor got moved above the line because of the productivity gains because 25% of our code commits were via Claude Code last quarter?</p>
<p>That link is not there yet, right? I think maybe implicitly there's more that is getting shipped. But it's very hard to draw a line between one of those stats and, OK, now we're actually producing like 25% more useful consumer features, right? And that line is hard to draw.</p>
<p>[...] And so if you're not actually able to draw a direct line to how much useful features and functionality you're shipping to your users, that trade becomes harder to justify.</p>
</blockquote>
<p>Somehow this fragment turned into headlines like <a href="https://www.businessinsider.com/uber-coo-andrew-macdonald-ai-token-spending-harder-justify-2026-5">Uber's COO says it's getting harder to justify the money spent on AI tokenmaxxing</a>, because the market for stories about AI failures remains enormous.</p>
<p><em><strong>Update 29th May 2026</strong>: I edited the above quote to add that last paragraph ending in "becomes harder to justify" on <a href="https://x.com/MadisonMills22/status/2060343512936186240">the suggestion of Madison Mills</a> - previously my quoted section stopped at "hard to draw". Here's the <a href="https://gist.github.com/simonw/59096a338c82f6f95e40e3d7c7b5bad9">full unedited transcript</a> from MacWhisper.</em></p>
<p>The other popular story around this is <a href="https://www.theverge.com/tech/930447/microsoft-claude-code-discontinued-notepad">Microsoft starts canceling Claude Code licenses</a>, ostensibly to encourage their engineers to dogfood their own Copilot CLI agent instead - but The Verge reporter Tom Warren says "sources tell me the decision is also a financial one", triggered by the June 30th end of Microsoft's financial year.</p>
<p>I think both of these stories support my "product-market fit" hypothesis. The best advice I ever heard on pricing a product was that your customer should <em>suck air through their teeth</em> and then say yes. Uber's budget overrun and Microsoft's seat cancellations look like that effect playing out in practice.</p>
<h4 id="we-also-know-the-labs-are-spending-a-lot">We also know the labs are spending a lot</h4>
<p>The big AI labs spend billions of dollars on both training and inference. Credible figures are hard to come by, but we did get one huge hint as to the figures involved from, oddly enough, the recent <a href="https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm">SpaceX S-1</a>:</p>
<blockquote>
<p>[...] in May 2026, we entered into <strong>Cloud Services Agreements with Anthropic PBC</strong> (“Anthropic”), an AI research and development public benefit corporation, with respect to access to <strong>compute capacity across COLOSSUS and COLOSSUS II</strong>. Pursuant to these agreements, the customer <strong>has agreed to pay us $1.25 billion per month</strong> through May 2029 [...]</p>
</blockquote>
<p>The <a href="https://www.anthropic.com/news/higher-limits-spacex">Anthropic announcement</a> said that this deal meant they could "increase our usage limits for Claude Code and the Claude API", heavily implying that Colossus is being used for inference, not model training.</p>
<p>Anthropic already have vast amounts of compute from other providers. The fact that they're willing to spend $1.25 billion per month for extra capacity from just <em>one</em> of their vendors hints at how big these inference budgets have become.</p>
<h4 id="api-revenue-is-becoming-less-important">API revenue is becoming less important</h4>
<p>Over the past two years my impression has been that OpenAI made more of their income from subscription revenue while Anthropic made more from their API.</p>
<p>Anthropic's API revenue was historically quite dependent on a small number of large API customers - <a href="https://venturebeat.com/ai/anthropic-revenue-tied-to-two-customers-as-ai-pricing-war-threatens-margins">this VentureBeat story from August 2025</a> quotes "sources familiar with the matter" suggesting that just Cursor and GitHub Copilot were responsible for $1.2 billion of the company's then-$4 billion revenue.</p>
<p>Today Anthropic are rumored to hit <a href="https://www.wsj.com/tech/ai/mind-blowing-growth-is-about-to-propel-anthropic-into-its-first-profitable-quarter-7edbf2f4">$10.9 billion in the second quarter</a>, potentially even operating at a profit for the first time.</p>
<p>This pivot-to-Enterprise suggests that the labs have realized that the real money lies in cutting out the middlemen. Anthropic's Claude Code directly competes with Cursor and Copilot. No wonder Cursor are <a href="https://cursor.com/blog/composer-2">investing in their own models</a>!</p>
<h4 id="april-is-a-new-inflection-point">April is a new inflection point</h4>
<p>I've called November 2025 the <a href="https://simonwillison.net/tags/november-2025-inflection/">November inflection point</a> because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got <em>good</em> - good enough that we've spent the last six months adapting to agent systems that can reliably get useful work done.</p>
<p>I think April 2026 is a new inflection point where the revenue implications of this have started to land, to the benefit of the frontier AI labs and with material impacts on the budgets of large companies.</p>
<p>We'll know for sure how real this moment is when the S-1 documents for the upcoming Anthropic and OpenAI IPOs give us some real, audited numbers to get our teeth into.</p> |
| quotation |
2205 |
2026-05-27 06:41:43+00:00 |
PICARD: Data, shields up
DATA: Brilliant! Shields can reduce damage we sustain. Not immunity. Not hubris. Just prudence. It's not precaution—it's strategy.
[camera shakes]
WORF: HULL BREACHES ON NINE DECKS
DATA: Here's what happened: you told me to raise shields, and I didn't - Kyle Ferrana |
|
| blogmark |
9472 |
2026-05-26 23:48:45+00:00 |
The pressure - Lobste.rs |
Daniel Stenberg on the unprecedented level of pressure the `curl` team are facing right now thanks to the deluge of (credible) AI-assisted security issues being reported.
> The rate of incoming security reports is 4-5 times higher than it was in 2024 and double the speed of 2025 -- meaning that **on average we now get more than one report per day**. The quality is way higher than ever before. The reports are typically *very* detailed and long. [...]
>
> For the first time in my life, my wife voiced concerns about my work hours and my imbalanced work/life situation. I work more than I’ve done before, but the flood keeps coming. [...]
>
> This is a never-before seen or experienced pressure on the curl project and its security team members. An avalanche of high priority work that trumps all other things in the project that is primarily mental because we certainly *could* ignore them all if we wanted, but we feel a responsibility, we have a conscience and we are proud about our work.
The good news is that `curl` is a very solid piece of software, so the vulnerabilities people are finding tend not to be of high severity:
> What is also a good trend: almost no one finds *terrible* vulnerabilities. All vulnerabilities found the last few years in curl have *all* been deemed severity LOW or MEDIUM. I'm not saying there won't be any more HIGH ever, but at least they are rare. The [most recent severity high curl CVE](https://curl.se/docs/CVE-2023-38545.html) was published in October 2023. |
| blogmark |
9471 |
2026-05-26 15:36:48+00:00 |
Microsoft Copilot Cowork Exfiltrates Files - Hacker News |
The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data.
In this case Microsoft Copilot Cowork (yes, that's [a real product name](https://www.microsoft.com/en-us/microsoft-365/blog/2026/03/09/copilot-cowork-a-new-way-of-getting-work-done/)) was allowing agents to send emails to the user's own inbox without approval... but those messages were then displayed in a way that could leak data to an attacker via rendered images:
> Because these messages can contain external images that trigger network requests to external websites, data can be exfiltrated when a user opens a compromised message sent by the agent.
Since OneDrive can create pre-authenticated download links, a successful prompt injection could cause those links to be leaked, allowing files to be downloaded by the attacker. |
| quotation |
2204 |
2026-05-26 15:02:30+00:00 |
A lot of the emails I get from founders are now written in a hard-hitting journalistic style. I know they're written by AI, because no founder ever wrote this way before. And once you realize something is written by AI, it's hard not to ignore it.
I have never knowingly finished reading an email signed by a human but written by AI. It feels like being lied to, and who would stand for that?
[[...](https://twitter.com/paulg/status/2058863028523659390)] It makes me think less of the author. It means they can't write well unaided (or feel they can't), and that they're trying to trick me.
It's not impressive to use AI to write stuff for you; any teenager can do that. - Paul Graham |
|
| quotation |
2203 |
2026-05-26 02:28:54+00:00 |
I cannot believe I'm saying this, but getting the literal Pope to canonize your product's specific technical limitations as a spiritual treatise is the single greatest act of vendor lobbying I have ever seen. - Corey Quinn |
|
| entry |
9307 |
2026-05-25 23:58:17+00:00 |
Notes on Pope Leo XIV's encyclical on AI |
<p>Dropped this morning by the Vatican: <a href="https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html">Magnifica Humanitas of His Holiness Pope Leo XIV on Safeguarding the Human Person in the Time of Artificial Intelligence</a>. This is a <em>very interesting</em> document. It's some of the clearest writing I've seen on the ethics of integrating AI into modern society.</p>
<p>Pope Leo XIV chose the name Leo in honor of Pope Leo XIII, who is known for his 1891 <em><a href="https://en.wikipedia.org/wiki/Rerum_novarum">Rerum novarum</a></em> encyclical on "Rights and Duties of Capital and Labor".</p>
<p><a href="https://www.vaticannews.va/en/church/news/2025-05/leo-xiii-s-times-and-our-own.html">This story</a> on Vatican News further clarifies the significance of that decision:</p>
<blockquote>
<p>Meeting with the College of Cardinals for their first formal encounter after his election, Pope Leo XIV explained part of the reason for the choice of his papal name. "There are different reasons for this," he said, before going on to explain that he chose the name Leo "mainly because Pope Leo XIII, in his historic encyclical <em><a href="https://www.vatican.va/content/leo-xiii/en/encyclicals/documents/hf_l-xiii_enc_15051891_rerum-novarum.html">Rerum novarum</a></em> addressed the social question in the context of the first great industrial revolution."</p>
<p>"In our own day," he continued, "the Church offers to everyone the treasury of her social teaching in response to another industrial revolution and to developments in the field of artificial intelligence that pose new challenges for the defence of human dignity, justice, and labour."</p>
</blockquote>
<p>And now we get Pope Leo XIV's own encyclical on the AI revolution. There's a lot in here, but the writing style is very approachable, including to non-Catholics.</p>
<h4 id="a-few-of-my-highlights">A few of my highlights</h4>
<p><small>(I listened to most of the encyclical on a walk with our dog, my first time trying the <a href="https://apps.apple.com/us/app/elevenreader-read-books-aloud/id6479373050">ElevenReader iPhone app</a>. It worked very well: I pasted in a URL to the document and it read it to me in a very high quality voice, highlighting each paragraph as it went.)</small></p>
<p>Here are some of my highlights. In each case below <strong>emphasis</strong> is mine.</p>
<p>Here's a useful description of the interpretability problem for LLMs in section 98:</p>
<blockquote>
<p>First, any statement regarding AI risks becoming quickly outdated, given the remarkable pace at which these systems are developing. Second, all of us, including those who design them, possess only a limited understanding of their actual functioning. Indeed, <strong>current AI systems are more “cultivated” than “built,” for developers do not directly design every detail, but instead create a framework within which the intelligence “grows.”</strong> As a result, fundamental scientific aspects — such as the internal representations and computational processes of these systems — remain, at present, unknown.</p>
</blockquote>
<p>I liked section 83's description of the relationship between development and dignity:</p>
<blockquote>
<p>For individuals as well as for nations, development is both a duty and a right. Minimum conditions are required for enabling every person and people to flourish in accord with their dignity, without being kept in a state of dependence or excluded from access to necessary goods. Development is truly human when it places people at the center instead of the accumulation of wealth, and when it concerns peoples as well as individuals. Justice demands the recognition of the rights of society and the rights of peoples, and includes a responsibility toward future generations. <strong>Development is not truly human if it increases consumption for some while shifting costs and burdens onto others, or relegates entire regions to subordinate roles, preventing them from realizing their full potential</strong>.</p>
</blockquote>
<p>Baked in cultural biases and sycophancy get a mention in section 100:</p>
<blockquote>
<p>In personal use, three aspects in particular deserve careful consideration: the ease with which results are obtained, the impression of objectivity and the simulation of human communication. The speed and simplicity with which information, complex analyses, media content and practical assistance can be accessed undoubtedly makes life easier. Yet they can also encourage excessive reliance and the search for ready-made answers, and weaken personal creativity and judgment. <strong>The apparent objectivity of the responses and suggestions these systems provide can lead us to overlook the fact that they reflect the cultural assumptions of those who designed and trained them, with all their strengths and limitations</strong>. The artificial imitation of positive human communication — words of advice, empathy, friendship and even love — can be engaging and at times genuinely helpful. <strong>However, for less discerning users, it can also be misleading, creating the illusion of a relationship with a real personal subject</strong>. When words are simulated, they do not build genuine relationships, but only their appearance. The artificial imitation of care or support can become particularly risky when it enters contexts where real relationships and emotional bonds are lacking.</p>
</blockquote>
<p>101 touches on the environmental impact:</p>
<blockquote>
<p>Current AI systems require enormous amounts of energy and water, significantly influencing carbon dioxide emissions, and place heavy demands on natural resources. <strong>As their complexity increases, especially in the case of large language models, the need for computing power and storage capacity grows too, which requires an extensive network of machines, cables, data centers and energy-intensive infrastructure</strong>. For this reason, it is essential to develop more sustainable technological solutions that reduce environmental impact and help protect our common home.</p>
</blockquote>
<p>102 covers the risks of algorithmic systems making decisions that impact people's lives without "compassion, mercy, forgiveness":</p>
<blockquote>
<p>The use of AI is never a purely technical matter: <strong>when it enters processes that affect people’s lives, it touches on rights, opportunities, status and freedom</strong>. Important and sensitive decisions — concerning employment, credit, access to public services or even a person’s reputation — <strong>risk being fully delegated to automated systems that do not know “compassion, mercy, forgiveness, and above all, the hope that people are able to change,”</strong> and can therefore give rise to new forms of exclusion.</p>
</blockquote>
<p>105 emphasizes the need for human accountability in how these systems are applied:</p>
<blockquote>
<p>For AI to respect human dignity and truly serve the common good, responsibility must be clearly defined at every stage: <strong>from those who design and develop these systems to those who use them and rely on them for concrete decisions</strong>. In many cases, however, the internal processes leading to a result remain opaque, making it harder to assign responsibility and correct errors. <strong>This is where accountability becomes crucial: the possibility of identifying who must “account” for decisions, justify them, monitor them, and, when necessary, challenge them and remedy any harm caused</strong>.</p>
</blockquote>
<p>And 108 touches on the way AI amplifies the power of those with resources:</p>
<blockquote>
<p>In fact, as with every major technological shift, <strong>AI tends to amplify the power of those who already possess economic resources, expertise and access to data</strong>. In light of the common good and the universal destination of goods, this raises serious concerns, since small but highly influential groups can shape information and consumption patterns, influence democratic processes and steer economic dynamics to their own advantage, undermining social justice and solidarity among peoples. For this reason, it is essential that the use of AI, especially when it touches on public goods and fundamental rights, be guided by clear criteria and effective oversight, grounded in participation and subsidiarity.</p>
</blockquote>
<p>That same section explicitly calls out data as something that should be thought of more as a public good:</p>
<blockquote>
<p>[...] Moreover, <strong>ownership of data cannot be left solely in private hands</strong> but must be appropriately regulated. <strong>Data is the product of many contributors and should not be treated as something to be sold off or entrusted to a select few</strong>. It is necessary to think creatively in order to manage data as a common or shared good, in a spirit of participation, as <a href="https://www.vatican.va/content/john-paul-ii/en.html">Saint John Paul II</a> already suggested regarding collective goods.</p>
</blockquote>
<p>Given that Palantir is named after a <em>Lord of the Rings</em> reference, I can't help but wonder if the J.R.R. Tolkien quote from <em>The Return of the King</em> (section 213) was the Pope throwing a little shade at Peter Thiel.</p>
<blockquote>
<p>The twentieth-century Catholic author J.R.R. Tolkien, in the words of a protagonist in one of his novels, described our responsibility in this way: “It is not our part to master all the tides of the world, but to do what is in us for the succour of those years wherein we are set, uprooting the evil in the fields that we know, so that those who live after may have clean earth to till.” The civilization of love will not arise from a single or spectacular gesture, but from the sum total of small and steadfast acts of fidelity that serve as a bulwark against dehumanization. For this reason, it is worthwhile pausing to reflect on some aspects of how we, each in our own way, can cooperate in building the civilization of love.</p>
</blockquote>
<h4 id="another-2026-prediction-down">Another 2026 prediction down</h4>
<p>On 6th January this year I joined the <a href="https://oxide-and-friends.transistor.fm/episodes/predictions-2026">Oxide and Friends 2026 predictions</a> podcast episode to talk about predictions for 2026, 2029 and 2032. I <a href="https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/">wrote mine up here</a>, with hindsight they weren't nearly ambitious enough - it's already undeniable that LLMs write good code, we've made huge advances in sandboxing and New Zealand kākāpō have indeed <a href="https://news.mongabay.com/short-article/2026/03/critically-endangered-kakapo-parrot-has-standout-breeding-season/">had a truly excellent breeding season</a>.</p>
<p>There's one segment from the episode that I didn't bother to include in my write-up, but that I can't resist providing as a lightly-edited transcript here:</p>
<blockquote>
<p><strong>Bryan Cantrill:</strong> <a href="https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=37m13s">37:13</a></p>
<p>I think that AI has created some real public perception problems for itself. And I think that you are gonna have one of the frontier model companies, this year, have a white paper explaining how the proliferation of AI will mean prosperity for everybody. They will be trying to make some economic argument - because this is gonna be a 2026 election issue, how we think of these things and how they are regulated and it's a big mess. There's more heat than light in this debate.</p>
<p><strong>Simon Willison:</strong> <a href="https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m5s">38:05</a></p>
<p>I'd like to tag something on to that one: I think that only works if they can sort of wash that through existing trusted experts. Sam Altman and Dario are constantly publishing essays about this stuff and nobody believes a word they say. Get Barack Obama's signature on one of these position papers and <em>maybe</em> you've got something people might start to trust a little bit.</p>
<p><strong>Adam Leventhal:</strong> <a href="https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m27s">38:27</a></p>
<p>Otherwise, it's just like "leaded gas is good for you", says Exxon.</p>
<p><strong>Bryan Cantrill:</strong> <a href="https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m31s">38:31</a></p>
<p>I mean, yeah. God. Obama... let's go with that, that's a great one because if it's like Bill Clinton everyone's gonna kind of roll their eyes, so it's gotta be someone who's got real credibility saying that this is gonna be broad-based... I'd say if they get that person to do it, it's gonna be revealed that that's also a bit crooked.</p>
<p><strong>Simon Willison:</strong> <a href="https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m57s">38:57</a></p>
<p>How about the Pope?</p>
<p><strong>Bryan Cantrill:</strong> <a href="https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=39m1s">39:01</a></p>
<p>The Pope is very into this stuff! That's a great prediction. We've hit pay dirt. The Pope weighing in on LLMs and their economic impact on the world.</p>
<p>Simon, I'm giving you full credit if the Pope weighs in believing that this is gonna be economic devastation.</p>
</blockquote>
<p>My prediction here looks a whole lot less insightful given the Leo XIV/Leo XIII relationship, which I was unaware of when we recorded the episode!</p> |
| blogmark |
9470 |
2026-05-25 20:22:56+00:00 |
Magnifica Humanitas of His Holiness Pope Leo XIV on Safeguarding the Human Person in the Time of Artificial Intelligence - Hacker News |
This is a *very interesting* document.
Pope Leo XIV chose the name Leo in honor of Pope Leo XIII, who is known for his 1891 *[Rerum novarum](https://en.wikipedia.org/wiki/Rerum_novarum)* encyclical on "Rights and Duties of Capital and Labor".
[This story](https://www.vaticannews.va/en/church/news/2025-05/leo-xiii-s-times-and-our-own.html) on Vatican News further clarifies the significance of that decision:
> Meeting with the College of Cardinals for their first formal encounter after his election, Pope Leo XIV explained part of the reason for the choice of his papal name. "There are different reasons for this," he said, before going on to explain that he chose the name Leo "mainly because Pope Leo XIII, in his historic encyclical *[Rerum novarum](https://www.vatican.va/content/leo-xiii/en/encyclicals/documents/hf_l-xiii_enc_15051891_rerum-novarum.html)* addressed the social question in the context of the first great industrial revolution."
>
> "In our own day," he continued, "the Church offers to everyone the treasury of her social teaching in response to another industrial revolution and to developments in the field of artificial intelligence that pose new challenges for the defence of human dignity, justice, and labour."
And now we get Pope Leo XIV's own encyclical on the AI revolution. I'm still working my way through it. There's a lot in here, but the writing style is very approachable, including to non-Catholics.
I can't resist including this lightly edited segment of the transcript of our [Oxide and Friends 2026 predictions](https://simonwillison.net/2026/Jan/8/llm-predictions-for-2026/) podcast episode from the 6th of January this year:
> **Bryan Cantrill:** [37:13](https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=37m13s)
>
> I think that AI has created some real public perception problems for itself. And I think that you are gonna have one of the frontier model companies, this year, have a white paper explaining how the proliferation of AI will mean prosperity for everybody. They will be trying to make some economic argument - because this is gonna be a 2026 election issue, how we think of these things and how they are regulated and it's a big mess. There's more heat than light in this debate.
>
> **Simon Willison:** [38:05](https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m5s)
>
> I'd like to tag something on to that one: I think that only works if they can sort of wash that through existing trusted experts. Sam Altman and Dario are constantly publishing essays about this stuff and nobody believes a word they say. Get Barack Obama's signature on one of these position papers and *maybe* you've got something people might start to trust a little bit.
>
> **Adam Leventhal:** [38:27](https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m27s)
>
> Otherwise, it's just like "leaded gas is good for you", says Exxon.
>
> **Bryan Cantrill:** [38:31](https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m31s)
>
> I mean, yeah. God. Obama... let's go with that, that's a great one because if it's like Bill Clinton everyone's gonna kind of roll their eyes, so it's gotta be someone who's got real credibility saying that this is gonna be broad-based... I'd say if they get that person to do it, it's gonna be revealed that that's also a bit crooked.
>
> **Simon Willison:** [38:57](https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=38m57s)
>
> How about the Pope?
>
> **Bryan Cantrill:** [39:01](https://oxide-and-friends.transistor.fm/episodes/predictions-2026/transcript#t=39m1s)
>
> The Pope is very into this stuff! That's a great prediction. We've hit pay dirt. The Pope weighing in on LLMs and their economic impact on the world.
>
> Simon, I'm giving you full credit if the Pope weighs in believing that this is gonna be economic devastation.
(My prediction here looks a whole lot less insightful given the Leo XIV/Leo XIII relationship, which I was unaware of when we recorded the episode!) |
| quotation |
2202 |
2026-05-24 18:46:53+00:00 |
The most frustrating failure mode right now is that people submit issues that are not in their own voice. They contain an observed problem somewhere, but it has been thrown into a clanker and the clanker reworded it and made a huge mess of it. Typically, it was prompted so badly that the conclusions produced are more often than not inaccurate but always full of confidence. The result is complete guesswork on root causes, fake-minimal repros, suggested implementation strategies, analogies to adjacent but often the wrong code, and long lists of error classes that might or might not matter. [...]
So at least personally, I increasingly want issue reports to be condensed to what the human actually observed:
1. I ran this command.
2. I expected this to happen.
3. This happened instead.
4. Here is the exact error or log. - Armin Ronacher |
|
| blogmark |
9469 |
2026-05-23 20:24:48+00:00 |
On the <dl> - Hacker News |
I learned a few new-to-me things about the `<dl>` element from this article by Ben Meyer:
1. A `<dt>` can be followed by *multiple* `<dd>`
2. You can optionally group the `<dt>` and `<dd>` elements in a `<div>` for styling - but only a `<div>`.
3. You can label them using ARIA.
4. They've been called "description lists", not "definition lists", since [an HTML5 draft in 2008](https://www.w3.org/TR/2008/WD-html5-20080122/#the-dl).
So this is valid:
<pre><span class="pl-kos"><</span><span class="pl-ent">h2</span> <span class="pl-c1">id</span>="<span class="pl-s">credits</span>"<span class="pl-kos">></span>Credits<span class="pl-kos"></</span><span class="pl-ent">h2</span><span class="pl-kos">></span>
<span class="pl-kos"><</span><span class="pl-ent">dl</span> <span class="pl-c1">aria-labelledby</span>="<span class="pl-s">credits</span>"<span class="pl-kos">></span>
<span class="pl-kos"><</span><span class="pl-ent">div</span><span class="pl-kos">></span>
<span class="pl-kos"><</span><span class="pl-ent">dt</span><span class="pl-kos">></span>Author<span class="pl-kos"></</span><span class="pl-ent">dt</span><span class="pl-kos">></span>
<span class="pl-kos"><</span><span class="pl-ent">dd</span><span class="pl-kos">></span>Jeffrey Zeldman<span class="pl-kos"></</span><span class="pl-ent">dd</span><span class="pl-kos">></span>
<span class="pl-kos"><</span><span class="pl-ent">dd</span><span class="pl-kos">></span>Ethan Marcotte<span class="pl-kos"></</span><span class="pl-ent">dd</span><span class="pl-kos">></span>
<span class="pl-kos"></</span><span class="pl-ent">div</span><span class="pl-kos">></span>
<span class="pl-kos"></</span><span class="pl-ent">dl</span><span class="pl-kos">></span></pre>
Here's a useful note from Adrian Roselli on [screen reader support for description lists](https://adrianroselli.com/2025/01/updated-brief-note-on-description-list-support.html). |
| blogmark |
9468 |
2026-05-22 22:01:31+00:00 |
The memory shortage is causing a repricing of consumer electronics - Hacker News |
David Oks provides the clearest explanation I've seen yet of why consumer products that use memory are likely to get significantly more expensive over the next few years.
The short version is that memory manufacturers - of which there are just three remaining large companies - have a fixed capacity in terms of how many wafers they can process at any one time. This fixed wafer capacity is then split between DDR - used in desktops and servers, LPDDR - used in mobile phones and low-energy devices, and HBM - used with GPUs.
Until recently, HBM got just 2% of that wafer allocation. The enormous growth in AI data centers has pushed that up to an expected 20% by the end of 2026, and "a single gigabyte of HBM consumes more than three times the wafer capacity that a gigabyte of DDR or LPDDR does".
Memory companies have learned from the extinction of their rivals that you should always under-provision rather than over-provision your fabricator capacity. The profit margins and demand for HBM (high-bandwidth memory) will constrain the production of consumer-device RAM for several years.
This is already being felt in the sub-$100 smartphone market, which is particularly important to markets like Africa and South Asia.
(The original title of the piece was "AI is killing the cheap smartphone" but I'm using the Hacker News rephrased title, which I think does more justice to the content.) |
| blogmark |
9467 |
2026-05-22 04:48:32+00:00 |
FTC to Require Cox Media Group, Two Other Firms to Pay Nearly $1 Million to Settle Charges They Deceived Customers About “Active Listening” AI-Powered Marketing Service - @nydiatisdale |
Back in 2024 Cox Media Group were caught trying to sell advertisers packages based on "active listening", with [this deck](https://www.documentcloud.org/documents/25051283-cmg-pitch-deck-on-voice-data-advertising-active-listening/) which claimed:
> - Smart devices capture real-time intent data by listening to our conversations
> - Advertisers can pair this voice-data with behavioral data to target in-market consumers
I wrote about this [in September 2024](https://simonwillison.net/2024/Sep/2/facebook-cmg/). My theory:
> I think **active listening** is the term that the team came up with for “something that sounds fancy but really just means the way ad targeting platforms work already”. Then they got over-excited about the new metaphor and added that first couple of slides that talk about “voice data”, without really understanding how the tech works or what kind of a shitstorm that could kick off when people who DID understand technology started paying attention to their marketing.
This FTC press release appears to confirm that's pretty much what happened:
> CMG, MindSift and 1010 Digital Works claimed their “Active Listening” branded marketing service listened in on consumers’ conversations overheard by smart devices, in real time, to target advertising [...]
>
> According to the complaints, this service did not, in fact, listen in on consumers’ conversations or use voice data at all—nor did the service accurately place ads in customers’ desired locations. Instead, the service the companies provided consisted of reselling—at a significant markup—email lists obtained from other data brokers.
The FTC also clarify that hiding an "opt-in" to using voice data in terms of service would not be acceptable, as tricks like that do not constitute "adequate consent":
> The FTC also alleged that all three companies deceived potential customers by claiming that consumers had opted into the Active Listening service. The company, however, did not seek or obtain consumers’ consent, according to the complaints. Instead, the companies claimed that consumers had “opted in” by agreeing to the terms of service that people have to accept when downloading and using apps. Clicking through mandatory terms of service does not constitute “opt-in consent” for such an invasive service or for use of consumers’ voice data from inside their homes. If the Active Listening service had functioned as advertised, this collection and use of consumers’ voice data without adequate consent would itself violate Section 5 of the FTC Act.
Attempting to myth bust [the conspiracy theory](https://simonwillison.net/tags/microphone-ads-conspiracy/) that our mobile devices target ads to us based on spying through the microphones continues to be my least rewarding niche online hobby. It's nice to have a new piece of ammunition. |
| entry |
9306 |
2026-05-21 19:52:19+00:00 |
Datasette Agent |
<p>We just <a href="https://datasette.io/blog/2026/datasette-agent/">announced the first release of Datasette Agent</a>, a new extensible AI assistant for Datasette. I've been working on my <a href="https://llm.datasette.io/">LLM</a> Python library for just over three years now, and Datasette Agent represents the moment that LLM and <a href="https://datasette.io/">Datasette</a> finally come together. I'm really excited about it!</p>
<p>Datasette Agent provides a conversational interface for asking questions of the data you have stored in Datasette. Add the <a href="https://github.com/datasette/datasette-agent-charts">datasette-agent-charts</a> plugin and it can generate charts of your data as well.</p>
<h4 id="the-demo">The demo</h4>
<p>The <a href="">announcement post</a> (on the new Datasette project blog) includes this <a href="https://www.youtube.com/watch?v=AFZKp6hbFjI">demo video</a>:</p>
<iframe style="margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/AFZKp6hbFjI" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"> </iframe>
<p>I recorded the video against the new <a href="https://agent.datasette.io/">agent.datasette.io</a> live demo instance, which runs Datasette Agent against example databases including the classic <a href="https://datasette.io/global-power-plants">global-power-plants</a> by <a href="https://www.wri.org/research/global-database-power-plants">WRI</a>, and a copy of the <a href="https://datasette.simonwillison.net/">Datasette backup</a> of my blog.</p>
<p>The live demo runs on <a href="https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-lite">Gemini 3.1 Flash-Lite</a> - it's cheap, fast and has no trouble writing SQLite queries.</p>
<p>A question I asked in the demo was:</p>
<blockquote>
<p>when did Simon most recently see a pelican?</p>
</blockquote>
<p>Which ran <a href="https://datasette.simonwillison.net/simonwillisonblog?sql=SELECT+title%2C+commentary%2C+created+FROM+blog_beat+WHERE+beat_type+%3D+%27sighting%27+AND+%28title+LIKE+%27%25pelican%25%27+OR+commentary+LIKE+%27%25pelican%25%27%29+ORDER+BY+created+DESC+LIMIT+5">this SQL query</a>:</p>
<div class="highlight highlight-source-sql"><pre><span class="pl-k">SELECT</span>
title, commentary, created
<span class="pl-k">FROM</span>
blog_beat
<span class="pl-k">WHERE</span>
beat_type <span class="pl-k">=</span> <span class="pl-s"><span class="pl-pds">'</span>sighting<span class="pl-pds">'</span></span>
<span class="pl-k">AND</span> (title <span class="pl-k">LIKE</span> <span class="pl-s"><span class="pl-pds">'</span>%pelican%<span class="pl-pds">'</span></span> <span class="pl-k">OR</span> commentary <span class="pl-k">LIKE</span> <span class="pl-s"><span class="pl-pds">'</span>%pelican%<span class="pl-pds">'</span></span>)
<span class="pl-k">ORDER BY</span>
created <span class="pl-k">DESC</span>
<span class="pl-k">LIMIT</span>
<span class="pl-c1">5</span></pre></div>
<p>And replied:</p>
<blockquote>
<p>The most recent sighting of a pelican by Simon was recorded on <strong>May 20, 2026</strong>.</p>
<p>The observation included a California Brown Pelican, along with a Common Loon, Canada Goose, Striped Shore Crab, and a California Sea Lion.</p>
</blockquote>
<p>Here's <a href="https://simonwillison.net/2026/May/20/sighting-363395265/">that sighting on my blog</a>, and the <a href="https://gist.github.com/simonw/a46d17b69659a4866adb1d868280091d">Markdown export</a> of the full conversation transcript.</p>
<h4 id="the-plugins">The plugins</h4>
<p>My favorite feature of Datasette Agent is that, like the rest of Datasette, it's extensible using plugins.</p>
<p>We've shipped three plugins so far:</p>
<ul>
<li>
<a href="https://github.com/datasette/datasette-agent-charts">datasette-agent-charts</a>, shown in the video, adds charts to Datasette Agent, powered by <a href="https://observablehq.com/plot/">Observable Plot</a>.</li>
<li>
<a href="https://github.com/datasette/datasette-agent-openai-imagegen">datasette-agent-openai-imagegen</a> adds an image generation tool to Datasette Agent using <a href="https://openai.com/index/introducing-chatgpt-images-2-0/">ChatGPT Images 2.0</a>.</li>
<li>
<a href="https://github.com/datasette/datasette-agent-sprites">datasette-agent-sprites</a> provides tools for executing code in a <a href="https://sprites.dev/">Fly Sprites</a> persistent sandbox.</li>
</ul>
<p>Building plugins is <em>really fun</em>. I have a bunch more prototypes that aren't quite alpha-quality yet.</p>
<p>Claude Code and OpenAI Codex are both proving excellent at writing plugins - just point them at a checkout of the <a href="https://github.com/datasette/datasette-agent">datasette-agent repo</a> for reference and tell them what you want to build!</p>
<h4 id="running-it-against-local-models">Running it against local models</h4>
<p>I've also been having fun running the new plugin against local models. Here's a <code>uv</code> one-liner to run the plugin against <a href="https://huggingface.co/google/gemma-4-26B-A4B">gemma-4-26b-a4b</a> in <a href="https://lmstudio.ai">LM Studio</a> on a Mac:</p>
<div class="highlight highlight-source-shell"><pre>uvx --prerelease=allow \
--with datasette-agent --with llm-lmstudio \
datasette --internal internal.db --root \
-s plugins.datasette-llm.default_model lmstudio/google/gemma-4-26b-a4b \
data.db</pre></div>
<p>Datasette Agent needs reliable tool calls and the ability for a model to produce SQL queries that run against SQLite. The open weight models released in the past six months are increasingly able to handle that.</p>
<h4 id="what-s-next">What's next</h4>
<p>Datasette Agent opens up <em>so many</em> opportunities for the LLM and Datasette ecosystem in general.</p>
<p>It's already informed <a href="https://simonwillison.net/2026/Apr/29/llm/">the major LLM 0.32a0 refactor</a> which I'm nearly ready to roll into a stable release, maybe with some additional "LLM agent" abstractions extracte from Datasette Agent itself.</p>
<p>I've been exploring my own take on the Claude Artifacts, which is shaping up nicely as a plugin.</p>
<p>I'm excited to use Datasette Agent to build my own <a href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.013.jpeg">Claw</a> - a personal AI assistant built around data imported from different parts of my digital life, which is a neat excuse to revisit my older <a href="https://dogsheep.github.io">Dogsheep</a> family of tools.</p>
<p>We'll also be rolling out Datasette Agent for users of <a href="https://datasette.cloud/">Datasette Cloud</a>.</p>
<p>Join our <a href="https://discord.gg/hdxyusUFv">#datasette-agent Discord channel</a> if you'd like to talk about the project.</p> |
| quotation |
2201 |
2026-05-20 22:26:36+00:00 |
We have the ability to use compute resources to support our proprietary AI applications (such as Grok 5, which is currently being trained at COLOSSUS II), while also providing access to select compute capacity to third-party customers. For example, in May 2026, we entered into **Cloud Services Agreements with Anthropic PBC** (“Anthropic”), an AI research and development public benefit corporation, with respect to access to **compute capacity across COLOSSUS and COLOSSUS II**. Pursuant to these agreements, the customer **has agreed to pay us $1.25 billion per month** through May 2029, with capacity ramping in May and June 2026 at a reduced fee. The agreements may be terminated by either party upon 90 days’ notice. - SpaceX S-1 |
|
| blogmark |
9466 |
2026-05-20 17:57:45+00:00 |
How fast is 10 tokens per second really? - Hacker News |
Neat little HTML app by Mike Veerman ([source code here](https://github.com/MikeVeerman/tokenspeed/blob/master/index.html)) which simulates LLM token output speeds from 5/second to 800/second.
Useful if you see a model advertised as "30 tokens/second" and want to get a feel for what that actually looks like. |
| entry |
9305 |
2026-05-19 22:40:25+00:00 |
Gemini 3.5 Flash: more expensive, but Google plan to use it for everything |
<p>Today at Google I/O, Google <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">released Gemini 3.5 Flash</a>. This one skipped the <code>-preview</code> modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products:</p>
<blockquote>
<p>3.5 Flash is available today to billions of people globally:</p>
<ul>
<li>For everyone via the Gemini app and AI Mode in <a href="https://blog.google/products-and-platforms/products/search/search-io-2026">Google Search</a>
</li>
<li>For developers in our agent-first development platform Google Antigravity and Gemini API in Google AI Studio and Android Studio</li>
<li>For enterprises in Gemini Enterprise Agent Platform and Gemini Enterprise.</li>
</ul>
</blockquote>
<p>As usual with Gemini, the most interesting details are tucked away in the <a href="https://ai.google.dev/gemini-api/docs/whats-new-gemini-3.5">What's new in Gemini 3.5 Flash</a> developer documentation. It mostly has the same set of platform features as the previous Gemini 3.x series, albeit with no <a href="https://ai.google.dev/gemini-api/docs/computer-use">computer use</a>. The model ID is <code>gemini-3.5-flash</code>. The knowledge cut-off is January 2025, and it supports 1,048,576 input tokens and 65,536 maximum output tokens.</p>
<p>Google are also pushing a new <a href="https://ai.google.dev/gemini-api/docs/interactions">Interactions API</a>, currently in beta, which looks to me like their version of the patterns introduced by <a href="https://developers.openai.com/api/reference/responses/overview">OpenAI Responses</a> - in particular server-side history management.</p>
<h4 id="the-price-has-gone-up">The price has gone up</h4>
<p>Gemini 3.5 Flash is accompanied by a notable price bump. The previous models in the "Flash" family were <a href="https://ai.google.dev/gemini-api/docs/models/gemini-3-flash-preview">Gemini 3 Flash Preview</a> and <a href="https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-lite">Gemini 3.1 Flash-Lite</a>. The new 3.5 Flash is 3x the price of 3 Flash Preview and 6x the price of 3.1 Flash-Lite (see <a href="https://www.llm-prices.com/#sel=gemini-3-flash-preview%2Cgemini-3.5-flash%2Cgemini-3.1-flash-lite-preview">price comparison here</a>).</p>
<p>At $1.50/million input and $9/million output it's getting close in price to Google's Gemini 3.1 Pro, which is $2 and $12.</p>
<p>The Gemini team promise that 3.5 Pro will roll out "next month" - presumably at an even higher price.</p>
<p>This fits a trend: OpenAI's GPT-5.5 was 2x the price of GPT-5.4, and Claude Opus 4.7 is around 1.46x the price of 4.6 when you take the <a href="https://simonwillison.net/2026/Apr/20/claude-token-counts/">new tokenizer into account</a>.</p>
<p>Given the price increase it's interesting to see Google roll it out for so many of their own free-to-consumer products. It feels like all three of the major AI labs are starting to probe the price tolerance of their API customers.</p>
<p>Artificial Analysis publish the cost to run their proprietary benchmark against models, which is a useful way to take things like tokenization and increased volume of reasoning tokens into account. Some numbers worth comparing:</p>
<ul>
<li>
<a href="https://artificialanalysis.ai/models/gemini-3-5-flash">Gemini 3.5 Flash (high)</a>: $1,551.60</li>
<li>
<a href="https://artificialanalysis.ai/models/gemini-3-1-pro-preview">Gemini 3.1 Pro Preview</a>: $892.28</li>
<li>
<a href="https://artificialanalysis.ai/models/gemini-3-flash-reasoning">Gemini 3 Flash Preview (Reasoning)</a>: $278.26</li>
<li>
<a href="https://artificialanalysis.ai/models/gemini-3-1-flash-lite-preview">Gemini 3.1 Flash-Lite Preview</a>: $93.60</li>
</ul>
<p>Running the benchmark for 3.5 Flash (high) cost significantly more than 3.1 Pro Preview!</p>
<p>Here are some numbers from other vendors:</p>
<ul>
<li>
<a href="https://artificialanalysis.ai/models/claude-opus-4-7">Claude Opus 4.7 (Adaptive Reasoning, Max Effort)</a>: $5,117.14</li>
<li>
<a href="https://artificialanalysis.ai/models/claude-opus-4-7-non-reasoning">Claude Opus 4.7 (Non-reasoning, High Effort)</a>: $1,217.23</li>
<li>
<a href="https://artificialanalysis.ai/models/gpt-5-5">GPT-5.5 (xhigh)</a>: $3,357.00</li>
<li>
<a href="https://artificialanalysis.ai/models/gpt-5-5-medium">GPT-5.5 (medium)</a>: $1,199.14</li>
</ul>
<h4 id="a-pelican-on-a-bicycle">A pelican on a bicycle</h4>
<p>I ran "Generate an SVG of a pelican riding a bicycle" <a href="https://gist.github.com/simonw/09cc5a5545d7e75b33b75ffa92a34601">against the Gemini API</a> and got back this pelican, which is a <em>lot</em>:</p>
<p><img src="https://static.simonwillison.net/static/2026/gemini-3.5-flash.png" alt="Black background, bats in the sky against a stylized moon. Pelican is funky looking. Very good beak. Bicycle frame is a bit twisted, and the bar from pedals to back wheel is missing. Bike lamp illuminates the road in front. Quite stylish." style="max-width: 100%;" /></p>
<p>From the code comments: <code><!-- Pelican Eye / Sunglasses (Cool Retro Aviators) --></code></p>
<p><a href="https://news.ycombinator.com/item?id=48196570#48198275">hedgehog on Hacker News</a>:</p>
<blockquote>
<p>That pelican looks like it's in Miami for a crypto conference.</p>
</blockquote>
<p>That one cost me 11 input tokens and 14,403 output tokens, for a total cost of <a href="https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-flash">just under 13 cents</a>.</p> |
| entry |
9304 |
2026-05-19 01:09:44+00:00 |
The last six months in LLMs in five minutes |
<p>I put together these annotated slides from my five minute lightning talk at PyCon US 2026, using the <a href="https://tools.simonwillison.net/annotated-presentations">latest iteration</a> of my <a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/">annotated presentation tool</a>.</p>
<div class="slide" id="5-minutes-llms.001.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.001.jpeg" alt="The last six months in LLMs in
five minutes
Simon Willison - simonwillison.net
PyCon US 2026 Lightning Talk
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.001.jpeg">#</a>
<p>I presented this lightning talk at PyCon US 2026, attempting to summarize the last six months of developments in LLMs in five minutes.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.002.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.002.jpeg" alt="The November inflection point
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.002.jpeg">#</a>
<p>Six months is a pretty convenient time period to cover, because it captures what I've been calling the <a href="https://simonwillison.net/tags/november-2025-inflection/">November 2025 inflection point</a>. November was a critical month in LLMs, especially for coding.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.003.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.003.jpeg" alt="The “best” model changed hands 5 times
between Anthropic, OpenAl and Google
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.003.jpeg">#</a>
<p>For one thing, the supposedly "best" model (depending mostly on vibes) changed hands five times between the three big providers.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.004.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.004.jpeg" alt="Generate an SVG of a
pelican riding a bicycle
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.004.jpeg">#</a>
<p>As always, I'm using my <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/">Generate an SVG of a pelican riding a bicycle</a> test to help illustrate the differences between the models.</p>
<p>Why this test? Because pelicans are hard to draw, bicycles are hard to draw, pelicans <em>can't ride bicycles</em>... and there's zero chance any AI lab would train a model for such a ridiculous task.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.005.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.005.jpeg" alt="Five pelicans, one for each of the following models. Varying qualities!" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.005.jpeg">#</a>
<p>At the start of November the widely acknowledged "best" model was Claude Sonnet 4.5, released on <a href="https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/">29th September</a>. It drew me this pelican.</p>
<p>In November it was overtaken by <a href="https://simonwillison.net/2025/Nov/13/gpt-51/">GPT-5.1</a>, then <a href="https://simonwillison.net/2025/Nov/18/gemini-3/">Gemini 3</a>, then <a href="https://simonwillison.net/2025/Nov/19/gpt-51-codex-max/">GPT-5.1 Codex Max</a>, and then Anthropic took the crown back again with <a href="https://simonwillison.net/2025/Nov/24/claude-opus/">Claude Opus 4.5</a>.</p>
<p>I think Gemini 3 drew the best pelican out of this lot, but pelicans aren't everything. Most practitioners will agree that Opus 4.5 held the crown for the next couple of months.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.006.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.006.jpeg" alt="The coding agents got good
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.006.jpeg">#</a>
<p>It took a little while for this to become clear, but the real news from November was that the coding agents got <em>good</em>.</p>
<p>OpenAI and Anthropic had spent most of 2025 running <a href="https://simonwillison.net/2025/Dec/19/andrej-karpathy/">Reinforcement Learning from Verifiable Rewards</a> to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses.</p>
<p>In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.007.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.007.jpeg" alt="Screenshot of "Initial commit" on GitHub to steipete/Warelay, commit f6dd362, steipete authored on Nov 24, 2025
It's a copy of the MIT license" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.007.jpeg">#</a>
<p>Also in November, this happened - the first commit to an obscure (back then) repo called "Warelay" by some guy called Pete.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.008.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.008.jpeg" alt="December/January
(A little bit of LLM psychosis)
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.008.jpeg">#</a>
<p>Over the holiday period, from December to January, a whole lot of us took advantage of the break to have a poke at these new models and coding agents and see what they could do.</p>
<p>They could do a lot! Some of us got a little bit over-excited. I had my own short-lived bout of a form of LLM psychosis as I started spinning up wildly ambitious projects to see how far I could push them.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.009.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.009.jpeg" alt="micro-javascript playground
Execute JavaScript code in a sandboxed micro-javascript environment powered by Pyodide
var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var doubled = numbers.map(n => n * 2);
console.log('Doubled: "', doubled);
var evens = numbers.filter(n => n % 2 === 0);
console.log('Evens: ', evens);
var sum = numbers.reduce((a, b) => a + b, @);
console.log('Sum:", sum);
Output 27
Doubled: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Evens: [2, 4, 6, 8, 10]
Sum: 55
Execution time: 8.00ms
About: micro-javascript is a pure Python JavaScript interpreter with configurable memory and time limits. This playground runs entirely in your browser using
Pyodide (Python compiled to WebAssembly). View on GitHub" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.009.jpeg">#</a>
<p>One of my projects was a vibe-coded implementation of JavaScript in Python - a loose port of <a href="https://github.com/bellard/mquickjs">MicroQuickJS</a> - which I called <a href="https://github.com/simonw/micro-javascript">micro-javascript</a>. You can try it out in your browser in <a href="https://simonw.github.io/micro-javascript/playground.html">this playground</a>.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.010.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.010.jpeg" alt="JavaScript running in Python running in Pyodide running in WebAssembly running in JavaScript" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.010.jpeg">#</a>
<p>That playground demo shows JavaScript code run using my micro-javascript library, in Python, running inside Pyodide, running in WebAssembly, running in JavaScript, running in a browser!</p>
<p>It's pretty cool! But did anyone out there <em>need</em> a buggy, slow, insecure half-baked implementation of JavaScript in Python?</p>
<p>They did not. I have quite a few other projects from that holiday period that I have since quietly retired!</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.011.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.011.jpeg" alt="February 2026
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.011.jpeg">#</a>
<p>On to February. Remember that Warelay project that had its first commit at the end of November?</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.012.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.012.jpeg" alt="Warelay → CLAWDIS → CLAWDBOT →
Clawdbot → Moltbot →🦞 OpenClaw" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.012.jpeg">#</a>
<p>In December and January it had gone through <a href="https://simonwillison.net/2026/May/16/openclaw-names/">quite a few name changes</a>... and by February it was taking the world by storm under its final name, <a href="https://openclaw.ai/">OpenClaw</a>.</p>
<p>The amount of attention it got is pretty astonishing for a project that was less than three months old.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.013.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.013.jpeg" alt="Generic term: Claw
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.013.jpeg">#</a>
<p>OpenClaw is a "personal AI assistant", and we actually got a generic term for these, based on NanoClaw and ZeroClaw and suchlike... they're called <strong>Claws</strong>.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.014.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.014.jpeg" alt="An aquarium for your Claw
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.014.jpeg">#</a>
<p>Mac Minis started to sell out around Silicon Valley, because people were buying them to run their Claws.</p>
<p><a href="https://www.dbreunig.com/">Drew Breunig</a> joked to me that this is because they're the new digital pets, and a Mac Mini is the perfect aquarium for your Claw.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.015.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.015.jpeg" alt="Alfred Molina's Doc Ock in Spider-Man 2, tearing apart a New York subway train with his four claws." style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.015.jpeg">#</a>
<p>My favourite metaphor for Claws is Alfred Molina's Doc Ock in the 2004 movie Spider-Man 2. His claws were powered by AI, and were perfectly safe provided nothing damaged his inhibitor chip... after which they turned evil and took over.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.016.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.016.jpeg" alt="Gemini 3.1 Pro
A really good illustration of a pelican riding a bicycle.
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.016.jpeg">#</a>
<p>Also in February: Gemini 3.1 Pro came out, and drew me a <em>really good pelican riding a bicycle</em>. Look at this! It's even got a fish in its basket.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.017.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.017.jpeg" alt="Gemini 3 Pro pelican contrasted with Gemini 3.1 Pro, as animated SVGs" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.017.jpeg">#</a>
<p>And then Google's Jeff Dean <a href="https://simonwillison.net/2026/Feb/19/gemini-31-pro/#jeff-dean">tweeted this video</a> of an animated pelican riding a bicycle, plus a frog on a penny-farthing and a giraffe driving a tiny car and an ostrich on roller skates and a turtle kickflipping a skateboard and a dachshund driving a stretch limousine.</p>
<p>So maybe the AI labs have been paying attention after all!</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.018.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.018.jpeg" alt="April 2026
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.018.jpeg">#</a>
<p>A lot of stuff happened just in the past month.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.019.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.019.jpeg" alt="Gemma 4 26B-A4B (17.99GB)
A pretty decent pelican riding a bicycle, though the bike is a bit mis-shapen." style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.019.jpeg">#</a>
<p>Google released the <a href="https://simonwillison.net/2026/Apr/2/gemma-4/">Gemma 4</a> series of models, which are the most capable open weight models I've seen from a US company.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.020.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.020.jpeg" alt="GLM-5.1
MIT, 754B parameter, 1.51TB!
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.020.jpeg">#</a>
<p>Also last month, Chinese AI lab GLM came out with <a href="https://simonwillison.net/2026/Apr/7/glm-51/">GLM-5.1</a> - an open weight 1.5TB monster! This is a very effective model... if you can afford the hardware to run it.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.021.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.021.jpeg" alt="" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.021.jpeg">#</a>
<p>GLM-5.1 drew me this very competent pelican on a bicycle.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.022.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.022.jpeg" alt="The bike is wonky, the pelican is floating." style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.022.jpeg">#</a>
<p>... though when it <a href="https://gisthost.github.io/?73bb6808b18c2482f66e5f082c75f36e">tried to animate it</a> the bicycle bounced off into the top and the bicycle got warped.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.023.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.023.jpeg" alt="Screenshot of Bluesky
Charles
@charles.capps.me
I think you should pester it with another animal using another method of locomotion.
Something tells me it was trained for this. I can't quite put my finger on it. /s
NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER!!" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.023.jpeg">#</a>
<p>Charles <a href="https://bsky.app/profile/charles.capps.me/post/3miwrn42mjc2t">on Bluesky</a> suggested I try it with a North Virginia Opossum on an E-scooter</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.024.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.024.jpeg" alt="NORTH VIRGINIA OPOSSUM
CRUISING THE COMMONWEALTH SINCE DUSK
And a really cool illustration of a possum." style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.024.jpeg">#</a>
<p>And it did this! I've tried this on other models and they don't even come close. "Cruising the commonwealth since dusk" is perfect. It's <a href="https://static.simonwillison.net/static/2026/glm-possum-escooter.html">animated too</a>.</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.025.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.025.jpeg" alt="Qwen3.6-35B-A3B is a 20.9GB file that runs on my laptop
It drew a better pelican on a bicycle than Opus 4.7, which messed up the bicycle frame." style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.025.jpeg">#</a>
<p>The other neat Chinese open weight models in April came from Qwen. <a href="https://simonwillison.net/2026/Apr/16/qwen-beats-opus/">Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7</a>. That's a 20.9GB open weights model that runs on my laptop!</p>
<p>(I think this mainly demonstrates that the pelican on the bicycle has firmly exceeded its limits as a useful benchmark.)</p>
</div>
</div>
<div class="slide" id="5-minutes-llms.026.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.026.jpeg" alt="Claude Sonnet 4.5 pelican for comparison." style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.026.jpeg">#</a>
<p>Here's that Claude Sonnet 4.5 pelican from September for comparison. </p>
</div>
</div>
<div class="slide" id="5-minutes-llms.027.jpeg">
<img loading="lazy" src="https://static.simonwillison.net/static/2026/5-minutes-llms/5-minutes-llms.027.jpeg" alt="The themes of the past 6 months:
Coding agents got really good
Local models wildly outperform expectations
" style="max-width: 100%" />
<div><a style="float: right; text-decoration: none; border-bottom: none; padding-left: 1em;" href="https://simonwillison.net/2026/May/19/5-minute-llms/#5-minutes-llms.027.jpeg">#</a>
<p>So those were the two main themes of the past six months. The coding agents got really good... and the laptop-available models, while a lot weaker than the frontier, have started wildly outperforming expectations.</p>
</div>
</div> |
| blogmark |
9465 |
2026-05-17 15:59:41+00:00 |
GDS weighs in on the NHS's decision to retreat from Open Source - |
Terence Eden continues his coverage of the NHS' [poorly considered decision](https://shkspr.mobi/blog/2026/05/nhs-goes-to-war-against-open-source/) to close down access to their open source repositories in response to vulnerabilities reported to them as part of [Project Glasswing](https://simonwillison.net/2026/Apr/7/project-glasswing/).
Now the Government Digital Service have joined the conversation with [AI, open code and vulnerability risk in the public sector](https://www.gov.uk/guidance/ai-open-code-and-vulnerability-risk-in-the-public-sector), published May 14th. Their key recommendation:
> Keep open by default. Making everything private adds additional delivery and policy costs, and can reduce reuse and scrutiny. Openness should remain the default posture, with closure used sparingly and deliberately.
While they don't mention the NHS by name, Terence speaks the language of the civil service and interprets this as a major escalation:
> Within the UK's Civil Service you occasionally hear the expression "being invited to a meeting *without biscuits*". It implies a rather frosty discussion without any of the polite niceties of a normal meeting. In general though, even when people have severe disagreements, it is rare for tempers to fray. It is even rarer for those internal disagreements to spill over into public. |
| quotation |
2200 |
2026-05-16 16:45:37+00:00 |
[...] in the last 10 years I’ve learned to really love and respect CSS as a technology.
So I decided years ago that I wanted to react to “CSS is hard” by getting better at CSS and taking it seriously as a technology, instead of devaluing it. Doing that changed everything for me: I learned that so many of my frustrations (“centering is impossible”) had been addressed in CSS a long time ago, and that also what “centering” means is not always straightforward and it makes sense that there are many ways to do it. CSS is hard because it’s solving a hard problem! - Julia Evans |
|
| quotation |
2199 |
2026-05-14 22:31:20+00:00 |
[...] On the interesting side is how fungible programming languages are nowadays. Programming languages used to be LOCK IN, and they're increasingly not so. You think the Bun rewrite in Rust is good for Rust? Bun has shown they can be in probably any language they want in roughly a week or two. Rust is expendable. Its useful until its not then it can be thrown out. That's interesting! - Mitchell Hashimoto |
|
| blogmark |
9464 |
2026-05-13 23:59:39+00:00 |
Welcome to the Datasette blog - |
We have a bunch of neat Datasette announcements in the pipeline so we decided it was time the project grew an official blog.
I built this using OpenAI Codex desktop, which turns out to have the Markdown session transcript export feature I've always wanted. Here's [the session that built the blog](https://gist.github.com/simonw/885b11eee46822622b8031a1f4e5f3a3). See also [issue 179](https://github.com/simonw/datasette.io/issues/179). |
| quotation |
2198 |
2026-05-13 16:15:50+00:00 |
“11 AI agents” is meaningless as a phrase.
If I said “I have 11 spreadsheets” or “I have 11 browser tabs” to do my work, it means about the same thing. - Boris Mann |
|
| quotation |
2197 |
2026-05-12 22:59:58+00:00 |
Now, if your CEO has never heard the phrase Ralph Loop, oh man, you are less than 30 days away from your next promotion. I'm not even exaggerating. Walk into his office, close the door, and say, hey chief, been experimenting with something. It's called Ralph Loops. And I think it could change literally everything. And he's gonna say, what's a Ralph loop? And you will say, give me $18,000 worth of API credits and I'll show you. Now you won't actually do anything, because you can't do anything. Because nobody can, because nobody knows what they're doing. But by the time he figures that out, you'll have a new title, and equity bump. [...]
Talk about automation constantly. Nothing arouses the slumbering capitalists than the mention of automation. Drop names too, bro. Like talk about specific team members you can automate out of existence. Be like, yo, I automated Gary, bro. Tag Gary in the message. Tag him in Slack in a very public channel. Be like, yo, I just automated @Gary. His function has been Ralph Looped. And tag your CEO in the same message. You think you're getting laid off after that? - Mo Bitar |
|
| quotation |
2196 |
2026-05-12 22:21:51+00:00 |
The thing about 90% of TDMs [Technical Decision Makers] is that they're motivated primarily by NOT GETTING FIRED. These aren't people who browser Lobsters or push to GH on the weekend. These are people that work 9 to 5, get paid, go home, and NEVER THINK ABOUT WORK AGAIN. So to achieve all that, they follow secular trends supported by analysts and broad public sentiment. Oh, Gartner said that "AI strategy" is most important? McKinsey said "context" needs to be managed? Well, "Context Engine for AI Apps" is going to be defensible. Buy it. - Mitchell Hashimoto |
|
| blogmark |
9463 |
2026-05-11 23:58:55+00:00 |
GitLab Act 2 - Hacker News |
There's a lot going on in this announcement from GitLab about the "workforce reduction" and "structural and strategic decisions" they are making with respect to the agentic era.
- They're "planning to reduce the number of countries by up to 30% where we have small teams". One of the most interesting things about GitLab is that they have employees spread across a large number of countries - 18 are listed [in their public employee handbook](https://gitlab.com/gitlab-com/content-sites/handbook/-/blob/7ce61c4be88b04061f9ad9ab5eb64db91ce89d2a/content/handbook/people-group/employment-solutions.md) but this post says they are "operating in nearly 60 countries". That handbook used to document their payroll workflows for those countries too - they stopped publishing that in 2023 but [the last public version](https://gitlab.com/gitlab-com/content-sites/handbook/-/blob/82ad50d380b11751645eedc733f7d663cf908d1f/content/handbook/finance/payroll.md) (hooray for version control) remains a fascinating read. Since we don't know which of those 60 countries have small teams, we can't calculate how many countries that 30% applies to.
- "We're planning to flatten the organization, removing up to three layers of management in some functions so leaders are closer to the work." - this isn't the first announcement of this type I've seen that's trimming management. Coinbase [recently announced](https://twitter.com/brian_armstrong/status/2051616759145185723) a much more aggressive version of this: they were "flattening our org structure to 5 layers max below" and "No pure managers: Every leader at Coinbase must also be a strong and active individual contributor. Managers should be like player-coaches".
- In terms of team structure: "We're re-organizing R&D to create roughly 60 smaller, more empowered teams with end-to-end ownership, nearly doubling the number of independent teams." I've always loved the idea of individual teams that can ship features unblocked by other teams, and it makes sense to me that agentic engineering can increase the capability of such teams. The 37signals public employee handbook used to have a section on working [In self-sufficient, independent teams](https://github.com/basecamp/handbook/blob/9504494a6daa555837ee2cc2d9134ca43ab36301/how-we-work.md#in-self-sufficient-independent-teams) which perfectly captured this for me, I'm sad to see they [removed that detail](https://github.com/basecamp/handbook/commit/1db14f83913163f4e2e72130524269ae6ba3d757) in January 2024!
- Tucked away towards the bottom: "*We will be retiring CREDIT as our values framework*" - that's the values framework [described on this page](https://gitlab.com/gitlab-com/content-sites/handbook/-/blob/7ce61c4be88b04061f9ad9ab5eb64db91ce89d2a/content/handbook/values/_index.md): "Collaboration, Results for Customers, Efficiency, Diversity, Inclusion & Belonging, Iteration, and Transparency". The new values are "Speed with Quality, Ownership Mindset, Customer Outcomes". The fact that "Diversity" is no longer in there is likely to attract a whole lot of attention, so it's worth noting that a sub-bullet under Customer Outcomes reads "Interpersonal excellence: individuals who are good humans, embrace diversity, inclusion and belonging, assume good intent and treat everyone with respect".
Here's the part of their new strategy that most resonated with me:
> **The agentic era multiplies demand for software**. Software has been the force multiplier behind nearly every business transformation of the last two decades. The constraint was the cost and time of producing and managing it. That constraint is collapsing. As the cost of producing software collapses, demand for it will expand. Last year, the developer platform market used to be measured in tens of dollars per user per month, this year it is hundreds/user/month and headed to thousands. *Not only is the value of software for builders increasing, but we believe there will be more software and builders than ever, and we will serve an increasing volume of both*.
That very much encapsulates my own optimistic, [Jevons-paradox](https://simonwillison.net/tags/jevons-paradox/)-inspired hope for how this will all work out.
Their opinion on this does need to be taken with a big grain of salt though. GitLab's stock price was ~$52 a year ago and is ~$26 today, and it's plausible that the drop corresponds to uncertainty about GitLab's continued growth as agentic engineering eats its way through their core market.
If your entire business depends on software engineering growing as a field and producing larger volumes of more lucrative seats, you have a strong incentive to believe that agents will have that effect! |
| quotation |
2195 |
2026-05-11 19:48:32+00:00 |
Your AI coding agent, the one you use to write code, needs to reduce your maintenance costs. Not by a little bit, either. You write code twice as quick now? Better hope you’ve halved your maintenance costs. Three times as productive? One third the maintenance costs. Otherwise, you’re screwed. You’re trading a temporary speed boost for permanent indenture. [...]
The math only works if the LLM *decreases* your maintenance costs, and by exactly the inverse of the rate it adds code. If you double your output and your cost of maintaining that output, two times two means you’ve quadrupled your maintenance costs. If you double your output and hold your maintenance costs steady, two times one means you’ve *still* doubled your maintenance costs. - James Shore |
|
| blogmark |
9462 |
2026-05-11 19:21:27+00:00 |
Your AI Use Is Breaking My Brain - @jasonkoebler.bsky.social |
Excellent, angry piece by Jason Koebler on how AI writing online is becoming impossible to avoid, filtering it is mentally exhausting and it's even starting to distort regular human writing styles.
I particularly liked his use of the term "Zombie Internet" to define a different, more insidious alternative to the "Dead Internet" (which is just bots talking to each other):
> I called it the Zombie Internet because the truth is that large parts of the internet are not just bots talking to bots or bots talking to people. It’s people talking to bots, people talking to people, people creating “AI agents” and then instructing them to interact with people. It’s people using AI talking to people who are not using AI, and it’s people using AI talking to other people who are using AI. It’s influencer hustlebros who are teaching each other how to make AI influencers and have spun up automated YouTube channels and blogs and social media accounts that are spamming the internet for the sole purpose of making money. It is whatever the fuck “Moltbook” is and whatever the fuck X and LinkedIn have become. It’s AI summaries of real books being sold as the book itself and inspirational Reddit posts and comment threads in which people give heartfelt advice to some account that’s actually being run by a marketing firm. [...] |
| blogmark |
9461 |
2026-05-11 15:46:36+00:00 |
Learning on the Shop floor - |
Tobias Lütke describes Shopify's internal coding agent tool, River, which operates entirely in public on their Slack:
> River does not respond to direct messages. She politely declines and suggests to create a public channel for you and her to start working in. I myself work with river in `#tobi_river` channel and many followed this pattern. Every conversation is therefore searchable. Anyone at Shopify can jump in. In my own channel, there are over 100 people who, react to threads, add color and add context, pick up the torch, help with the reviews, remind me how rusty I am, and importantly, learn from watching. [...]
>
> As so often with German, there is a word for the kind of environment: *Lehrwerkstatt*. Literally: **A teaching workshop**. The whole shop floor is the classroom. You learn by being near the work. Being a constant learner is one of the core values of the firm.
>
> Shopify wants to be a Lehrwerkstatt at scale and River has now gotten us closer to this ideal than ever. It’s *osmosis learning*, because it does not require a curriculum, a training plan, or a manager. It just requires everyone's work to be visible to the maximum extent possible. Everyone learns from each other.
I'm reminded of how Midjourney spent its first few years with the primary interface being public Discord channels, forcing users to share their prompts and learn from each other's experiments. I continue to believe that the early success of Midjourney was tied to this mechanism, helping to compensate for how weird and finicky text-to-image prompting is. |
| quotation |
2194 |
2026-05-10 23:58:49+00:00 |
*This article was updated after The Times learned that a remark attributed to Pierre Poilievre, the Conservative leader, was in fact an A.I.-generated summary of his views about Canadian politics that A.I. rendered as a quotation. The reporter should have checked the accuracy of what the A.I. tool returned. The article now accurately quotes from a speech delivered by Mr. Poilievre in April. [...] He did not refer to politicians who changed allegiances as turncoats in that speech.* - New York Times Editors’ Note |
|
| blogmark |
9460 |
2026-05-10 15:36:19+00:00 |
Mythical Man Month - |
Martin Fowler highlights this key idea from The Mythical Man-Month (Fred Brooks, 1975, still impressively relevant 50 years later):
> I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.
**Conceptual integrity** is exactly the missing piece I've been trying to nail down in understanding why being able to spit out new features so quickly offers new challenges when working with coding agents. |
| blogmark |
9459 |
2026-05-10 15:31:32+00:00 |
Mythical Man Month - |
Martin Fowler highlights this key idea from The Mythical Man-Month (Fred Brooks, 1975, still impressively relevant 50 years later):
> I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.
**Conceptual integrity** is exactly the missing piece I've been trying to nail down in understanding why being able to spit out new features so quickly offers new challenges when working with coding agents. |
| quotation |
2193 |
2026-05-10 14:59:17+00:00 |
One could say in the first quarter-century of my life, that while I was always fascinated by programming, I could never overcome the guilt of not really knowing whether the tool I am building right now isn’t already superceded by some much better implementation someone else has already written 30 or 40 years ago; I could write a TSV-aware search and replace, or I could find out about `awk` and solve that entire class of problems in one fell swoop, for example. My central conceit is that *this is a trap*. You *need* to reinvent a couple of wheels to get to the edge of what we know about wheel-making, not a thousand wheels, and not zero; probably four or five is sufficient in most domains, maybe closer to twenty or thirty in the most epistemically rigorous and developed fields like mathematics or computer science. Each wheel you reinvent, and every directed question you ask along the way, will propel you faster to the true frontier than that same amount of time spend in idle study, or even five times that amount. - Andrew Quinn |
|
| quotation |
2192 |
2026-05-09 01:03:58+00:00 |
WebRTC is designed to **degrade and drop my prompt** during poor network conditions.
wtf my dude
WebRTC aggressively drops audio packets to keep latency low. If you’ve ever heard distorted audio on a conference call, that’s WebRTC baybee. The idea is that conference calls depend on rapid back-and-forth, so pausing to wait for audio is unacceptable.
…but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate. After all, I’m paying good money to boil the ocean, and a garbage prompt means a garbage response. It’s not like LLMs are particularly responsive anyway.
**But I’m not allowed to wait**. It’s *impossible* to even retransmit a WebRTC audio packet within a browser; we tried at Discord. The *implementation* is hard-coded for real-time latency **or else**. - Luke Curley |
|