| quotation |
2167 |
2026-05-05 00:46:29+00:00 |
So it’s well known that Y Combinator owns *some* stake in OpenAI. But how big is that stake? This seems like devilishly difficult information to obtain. I asked around and a little birdie who knows several OpenAI investors came back with an answer: Y Combinator owns about 0.6 percent of OpenAI. At OpenAI’s current [$852 billion valuation](https://openai.com/index/accelerating-the-next-phase-ai/), that’s worth over $5 billion. - John Gruber |
|
| blogmark |
9439 |
2026-05-04 23:49:24+00:00 |
Granite 4.1 3B SVG Pelican Gallery - |
IBM released their [Granite 4.1 family](https://research.ibm.com/blog/granite-4-1-ai-foundation-models) of LLMs a few days ago. They're Apache 2.0 licensed and come in 3B, 8B and 30B sizes.
[Granite 4.1 LLMs: How They’re Built](https://huggingface.co/blog/ibm-granite/granite-4-1) by Granite team member Yousaf Shah describes the training process in detail.
Unsloth released the [unsloth/granite-4.1-3b-GGUF](https://huggingface.co/unsloth/granite-4.1-3b-GGUF) collection of GGUF encoded quantized variants of the 3B model - 21 different model files ranging in size from 1.2GB to 6.34GB.
All 21 of those Unsloth files add up to 51.3GB, which inspired me to finally try an experiment I've been wanting to run for ages: prompting "Generate an SVG of a pelican riding a bicycle" against different sized quantized variants of the same model to see what the results would look like.
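Scripting that kind of comparison is simple enough. Here's an illustrative sketch - not the exact harness behind the gallery - that loops over the downloaded GGUF files with llama.cpp's `llama-cli` and saves each model's output to its own file:

    # assumes the quantized files are in the current directory
    # and a llama.cpp build with llama-cli on the PATH
    for f in granite-4.1-3b-*.gguf; do
        llama-cli -m "$f" \
            -p "Generate an SVG of a pelican riding a bicycle" \
            -n 2048 > "pelican-${f%.gguf}.txt"
    done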
Honestly, [the results](https://simonw.github.io/granite-4.1-3b-gguf-pelicans/) are less interesting than I expected. There's no discernible pattern relating quality to size - they're all pretty terrible!

I'll likely try this again in the future with a model that's better at drawing pelicans. |
| quotation |
2166 |
2026-05-04 22:51:09+00:00 |
[...] Between 2000 and 2024, farmers sold in total a Colorado-sized chunk of land all on their own, 77 times all land on data center property in 2028, and grew more food than ever on what was left. None of this caused any problems for US food access.
And then, in the middle of all this, a farmer in Loudoun County sells a few acres of mediocre hay field to a hyperscaler for ten times its agricultural value, and the response is that we’re running out of farmland. - Andy Masley |
|
| quotation |
2165 |
2026-05-03 15:13:23+00:00 |
We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships. - Anthropic |
|
| blogmark |
9438 |
2026-05-02 17:26:40+00:00 |
/elsewhere/sightings/ - |
I have a new camera (a Canon R6 Mark II) so I'm taking a lot more photos of birds. I share my best wildlife photos on [iNaturalist](https://www.inaturalist.org/), and based on yesterday's [successful prototype](https://simonwillison.net/2026/May/1/inat-sightings/) I decided to add those to my blog.
<img class="blogmark-image" src="https://static.simonwillison.net/static/2026/beats-sightings.jpeg" alt="Screenshot of a "Sightings" webpage with a search bar and RSS icon, showing "Filters: Sorted by date" and "208 results page 1 / 7 next » last »»". First entry: SIGHTING 7:51 PM — Acorn Woodpecker, with two photos labeled "Acorn Woodpecker" of black and white woodpeckers with red caps on tree branches, dated 2nd May 2026. Second entry: SIGHTING 10:08 AM – 11:17 AM — Acorn Woodpecker, Western Fence Lizard, Osprey, with three photos labeled "Acorn Woodpecker" (bird on bare branches against blue sky), "Wester..." (lizard on tree bark), and "Osprey" (nest on a utility pole), dated 1st May 2026. Third entry: SIGHTING 11:11 AM — White-crowned Sparrow, with a photo labeled "White-crowned Sparrow" of a sparrow with black and white striped head singing with open beak, dated 30th Apr 2026.">
I built this feature on my phone using Claude Code for web, as an extension of my [beats system](https://simonwillison.net/2026/Feb/20/beats/) for syndicating external content. Here's [the PR](https://github.com/simonw/simonwillisonblog/pull/668) and prompt.
As with my other forms of incoming syndicated content, sightings show up on the homepage, on the date archive pages, and in site search results.
I back-populated over a decade of iNaturalist sightings, which means that if you [search for lemur](https://simonwillison.net/search/?q=lemur) you'll see my lemur photos from Madagascar in 2019! |
| blogmark |
9437 |
2026-04-30 23:23:17+00:00 |
Codex CLI 0.128.0 adds /goal - @fcoury |
The latest version of OpenAI's Codex CLI coding agent adds their own version of the [Ralph loop](https://ghuntley.com/ralph/): you can now set a `/goal` and Codex will keep on looping until it evaluates that the goal has been completed... or the configured token budget has been exhausted.
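The general shape of a Ralph loop is easy to sketch in shell. This is just the underlying idea - not how Codex's `/goal` feature is actually implemented - and it assumes the non-interactive `codex exec` mode:

    GOAL="Make the test suite pass and fix any remaining lint errors"
    for i in $(seq 1 20); do  # crude budget: give up after 20 iterations
        codex exec "Goal: $GOAL. Keep working on it. Reply with exactly DONE once the goal is met." \
            | tee /tmp/codex-goal.log
        grep -qx "DONE" /tmp/codex-goal.log && break
    done

The real feature tracks a token budget rather than a fixed iteration count.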
It looks like the feature is mainly implemented through the [goals/continuation.md](https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/continuation.md) and [goals/budget_limit.md](https://github.com/openai/codex/blob/6014b6679ffbd92eeddffa3ad7b4402be6a7fefe/codex-rs/core/templates/goals/budget_limit.md) prompts, which are automatically injected at the end of a turn. |
| blogmark |
9436 |
2026-04-30 23:03:24+00:00 |
Our evaluation of OpenAI's GPT-5.5 cyber capabilities - |
The UK's AI Security Institute [previously evaluated Claude Mythos](https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities): now they've evaluated GPT-5.5's ability to find security vulnerabilities and found it comparable to Mythos - but unlike Mythos, GPT-5.5 is generally available right now. |
| quotation |
2164 |
2026-04-30 21:24:55+00:00 |
It's a common misconception that we can't tell who is using LLM and who is not. I'm sure we didn't catch 100% of LLM-assisted PRs over the past few months, but the kind of mistakes humans make are fundamentally different than LLM hallucinations, making them easy to spot. Furthermore, people who come from the world of agentic coding have a certain *digital smell* that is not obvious to them but is obvious to those who abstain. It's like when a smoker walks into the room, everybody who doesn't smoke instantly knows it.
I'm not telling you not to smoke, but I am telling you not to smoke in my house. - Andrew Kelley |
|
| blogmark |
9435 |
2026-04-30 18:38:48+00:00 |
We need RSS for sharing abundant vibe-coded apps - |
Matt Webb:
> I would love an RSS web feed for all those various tools and apps pages, each item with an “Install” button. (But install to where?)
>
> The lesson here is that when vibe-coding accelerates app development, apps become more personal, more situated, and more frequent. Shipping a tool or a micro-app is less like launching a website and more like posting on a blog.
This inspired me to [have Claude](https://github.com/simonw/simonwillisonblog/pull/665) add an Atom feed (and icon) to my [/elsewhere/tools/](https://simonwillison.net/elsewhere/tool/) page, which itself is populated by content from my [tools.simonwillison.net](https://tools.simonwillison.net/) site. |
| entry |
9275 |
2026-04-29 19:01:47+00:00 |
LLM 0.32a0 is a major backwards-compatible refactor |
<p>I just released <a href="https://llm.datasette.io/en/latest/changelog.html#a0-2026-04-28">LLM 0.32a0</a>, an alpha release of my <a href="https://llm.datasette.io/">LLM</a> Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a while.</p>
<p>Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response.</p>
<pre><span class="pl-k">import</span> <span class="pl-s1">llm</span>
<span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">llm</span>.<span class="pl-c1">get_model</span>(<span class="pl-s">"gpt-5.5"</span>)
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-c1">prompt</span>(<span class="pl-s">"Capital of France?"</span>)
<span class="pl-en">print</span>(<span class="pl-s1">response</span>.<span class="pl-c1">text</span>())</pre>
<p>This made sense when I started working on the library back in April 2023. A lot has changed since then!</p>
<p>LLM provides an abstraction over thousands of different models via its <a href="https://llm.datasette.io/en/stable/plugins/index.html">plugin system</a>. The original abstraction - of text input that returns text output - was no longer able to represent everything I needed it to.</p>
<p>Over time LLM itself has grown <a href="https://simonwillison.net/2024/Oct/29/llm-multi-modal/">attachments</a> to handle image, audio, and video input, then <a href="https://simonwillison.net/2025/Feb/28/llm-schemas/">schemas</a> for outputting structured JSON, then <a href="https://simonwillison.net/2025/May/27/llm-tools/">tools</a> for executing tool calls. Meanwhile LLMs kept evolving, adding reasoning support and the ability to return images and all kinds of other interesting capabilities.</p>
<p>LLM needs to evolve to better handle the diversity of input and output types that can be processed by today's frontier models.</p>
<p>The 0.32a0 alpha has two key changes: model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts.</p>
<h4 id="prompts-as-a-sequence-of-messages">Prompts as a sequence of messages</h4>
<p>LLMs accept input as text, but ever since ChatGPT demonstrated the value of a two-way conversational interface, the most common way to prompt them has been to treat that input as a sequence of conversational turns.</p>
<p>The first turn might look like this:</p>
<pre><code>user: Capital of France?
assistant:
</code></pre>
<p>(The model then gets to fill out the reply from the assistant.)</p>
<p>But each subsequent turn needs to replay the entire conversation up to that point, as a sort of screenplay:</p>
<pre><code>user: Capital of France?
assistant: Paris
user: Germany?
assistant:
</code></pre>
<p>Most of the JSON APIs from the major vendors follow this pattern. Here's what the above looks like using the OpenAI chat completions API, which has been widely imitated by other providers:</p>
<div class="highlight highlight-source-shell"><pre>curl https://api.openai.com/v1/chat/completions \
-H <span class="pl-s"><span class="pl-pds">"</span>Authorization: Bearer <span class="pl-smi">$OPENAI_API_KEY</span><span class="pl-pds">"</span></span> \
-H <span class="pl-s"><span class="pl-pds">"</span>Content-Type: application/json<span class="pl-pds">"</span></span> \
-d <span class="pl-s"><span class="pl-pds">'</span>{</span>
<span class="pl-s"> "model": "gpt-5.5",</span>
<span class="pl-s"> "messages": [</span>
<span class="pl-s"> {</span>
<span class="pl-s"> "role": "user",</span>
<span class="pl-s"> "content": "Capital of France?"</span>
<span class="pl-s"> },</span>
<span class="pl-s"> {</span>
<span class="pl-s"> "role": "assistant",</span>
<span class="pl-s"> "content": "Paris"</span>
<span class="pl-s"> },</span>
<span class="pl-s"> {</span>
<span class="pl-s"> "role": "user",</span>
<span class="pl-s"> "content": "Germany?"</span>
<span class="pl-s"> }</span>
<span class="pl-s"> ]</span>
<span class="pl-s"> }<span class="pl-pds">'</span></span></pre></div>
<p>Prior to 0.32, LLM modeled these as conversations:</p>
<pre><span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">llm</span>.<span class="pl-c1">get_model</span>(<span class="pl-s">"gpt-5.5"</span>)
<span class="pl-s1">conversation</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-c1">conversation</span>()
<span class="pl-s1">r1</span> <span class="pl-c1">=</span> <span class="pl-s1">conversation</span>.<span class="pl-c1">prompt</span>(<span class="pl-s">"Capital of France?"</span>)
<span class="pl-en">print</span>(<span class="pl-s1">r1</span>.<span class="pl-c1">text</span>())
<span class="pl-c"># Outputs "Paris"</span>
<span class="pl-s1">r2</span> <span class="pl-c1">=</span> <span class="pl-s1">conversation</span>.<span class="pl-c1">prompt</span>(<span class="pl-s">"Germany?"</span>)
<span class="pl-en">print</span>(<span class="pl-s1">r2</span>.<span class="pl-c1">text</span>())
<span class="pl-c"># Outputs "Berlin"</span></pre>
<p>This worked if you were building a conversation with the model from scratch, but it didn't provide a way to feed in a previous conversation from the start. This made tasks like building an emulation of the OpenAI chat completions API much harder than they should have been.</p>
<p>The <code>llm</code> CLI tool worked around this through a custom mechanism for persisting and inflating conversations using SQLite, but that never became a stable part of the LLM API - and there are many places you might want to use the Python library without committing to SQLite as the storage layer.</p>
<p>The new alpha now supports this:</p>
<pre><span class="pl-k">import</span> <span class="pl-s1">llm</span>
<span class="pl-k">from</span> <span class="pl-s1">llm</span> <span class="pl-k">import</span> <span class="pl-s1">user</span>, <span class="pl-s1">assistant</span>
<span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">llm</span>.<span class="pl-c1">get_model</span>(<span class="pl-s">"gpt-5.5"</span>)
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-c1">prompt</span>(<span class="pl-s1">messages</span><span class="pl-c1">=</span>[
<span class="pl-en">user</span>(<span class="pl-s">"Capital of France?"</span>),
<span class="pl-en">assistant</span>(<span class="pl-s">"Paris"</span>),
<span class="pl-en">user</span>(<span class="pl-s">"Germany?"</span>),
])
<span class="pl-en">print</span>(<span class="pl-s1">response</span>.<span class="pl-c1">text</span>())</pre>
<p>The <code>llm.user()</code> and <code>llm.assistant()</code> functions are new builder functions designed to be used within that <code>messages=[]</code> array.</p>
<p>The previous <code>prompt=</code> option still works, but LLM upgrades it to a single-item messages array behind the scenes.</p>
<p>You can also now <em>reply</em> to a response, as an alternative to building a conversation:</p>
<pre><span class="pl-s1">response2</span> <span class="pl-c1">=</span> <span class="pl-s1">response</span>.<span class="pl-c1">reply</span>(<span class="pl-s">"How about Hungary?"</span>)
<span class="pl-en">print</span>(<span class="pl-s1">response2</span>) <span class="pl-c"># Default __str__() calls .text()</span></pre>
<h4 id="streaming-parts">Streaming parts</h4>
<p>The other major new interface in the alpha concerns streaming results back from a prompt.</p>
<p>Previously, LLM supported streaming like this:</p>
<pre><span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-c1">prompt</span>(<span class="pl-s">"Generate an SVG of a pelican riding a bicycle"</span>)
<span class="pl-k">for</span> <span class="pl-s1">chunk</span> <span class="pl-c1">in</span> <span class="pl-s1">response</span>:
<span class="pl-en">print</span>(<span class="pl-s1">chunk</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>)</pre>
<p>Or this async variant:</p>
<pre><span class="pl-k">import</span> <span class="pl-s1">asyncio</span>
<span class="pl-k">import</span> <span class="pl-s1">llm</span>
<span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">llm</span>.<span class="pl-c1">get_async_model</span>(<span class="pl-s">"gpt-5.5"</span>)
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-c1">prompt</span>(<span class="pl-s">"Generate an SVG of a pelican riding a bicycle"</span>)
<span class="pl-k">async</span> <span class="pl-k">def</span> <span class="pl-en">run</span>():
<span class="pl-k">async</span> <span class="pl-k">for</span> <span class="pl-s1">chunk</span> <span class="pl-c1">in</span> <span class="pl-s1">response</span>:
<span class="pl-en">print</span>(<span class="pl-s1">chunk</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>, <span class="pl-s1">flush</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)
<span class="pl-s1">asyncio</span>.<span class="pl-c1">run</span>(<span class="pl-en">run</span>())</pre>
<p>Many of today's models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text content.</p>
<p>Some models can even execute tools on the server-side, for example OpenAI's <a href="https://developers.openai.com/api/docs/guides/tools-code-interpreter?lang=curl">code interpreter tool</a> or Anthropic's <a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool">web search</a>. This means the results from the model can combine text, tool calls, tool outputs and other formats.</p>
<p>Multi-modal output models are starting to emerge too, which can return images or even <a href="https://developers.openai.com/api/docs/guides/audio#add-audio-to-your-existing-application">snippets of audio</a> intermixed into that streaming response.</p>
<p>The new LLM alpha models these as a stream of typed message parts. Here's what that looks like as a Python API consumer:</p>
<pre><span class="pl-k">import</span> <span class="pl-s1">asyncio</span>
<span class="pl-k">import</span> <span class="pl-s1">llm</span>
<span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">llm</span>.<span class="pl-c1">get_model</span>(<span class="pl-s">"gpt-5.5"</span>)
<span class="pl-s1">prompt</span> <span class="pl-c1">=</span> <span class="pl-s">"invent 3 cool dogs, first talk about your motivations"</span>
<span class="pl-k">def</span> <span class="pl-en">describe_dog</span>(<span class="pl-s1">name</span>: <span class="pl-smi">str</span>, <span class="pl-s1">bio</span>: <span class="pl-smi">str</span>) <span class="pl-c1">-></span> <span class="pl-smi">str</span>:
<span class="pl-s">"""Record the name and biography of a hypothetical dog."""</span>
<span class="pl-k">return</span> <span class="pl-s">f"<span class="pl-s1"><span class="pl-kos">{</span><span class="pl-s1">name</span><span class="pl-kos">}</span></span>: <span class="pl-s1"><span class="pl-kos">{</span><span class="pl-s1">bio</span><span class="pl-kos">}</span></span>"</span>
<span class="pl-k">def</span> <span class="pl-en">sync_example</span>():
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-c1">prompt</span>(
<span class="pl-s1">prompt</span>,
<span class="pl-s1">tools</span><span class="pl-c1">=</span>[<span class="pl-s1">describe_dog</span>],
)
<span class="pl-k">for</span> <span class="pl-s1">event</span> <span class="pl-c1">in</span> <span class="pl-s1">response</span>.<span class="pl-c1">stream_events</span>():
<span class="pl-k">if</span> <span class="pl-s1">event</span>.<span class="pl-c1">type</span> <span class="pl-c1">==</span> <span class="pl-s">"text"</span>:
<span class="pl-en">print</span>(<span class="pl-s1">event</span>.<span class="pl-c1">chunk</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>, <span class="pl-s1">flush</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)
<span class="pl-k">elif</span> <span class="pl-s1">event</span>.<span class="pl-c1">type</span> <span class="pl-c1">==</span> <span class="pl-s">"tool_call_name"</span>:
<span class="pl-en">print</span>(<span class="pl-s">f"<span class="pl-cce">\n</span>Tool call: <span class="pl-s1"><span class="pl-kos">{</span><span class="pl-s1">event</span>.<span class="pl-c1">chunk</span><span class="pl-kos">}</span></span>("</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>, <span class="pl-s1">flush</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)
<span class="pl-k">elif</span> <span class="pl-s1">event</span>.<span class="pl-c1">type</span> <span class="pl-c1">==</span> <span class="pl-s">"tool_call_args"</span>:
<span class="pl-en">print</span>(<span class="pl-s1">event</span>.<span class="pl-c1">chunk</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>, <span class="pl-s1">flush</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)
<span class="pl-k">async</span> <span class="pl-k">def</span> <span class="pl-en">async_example</span>():
<span class="pl-s1">model</span> <span class="pl-c1">=</span> <span class="pl-s1">llm</span>.<span class="pl-c1">get_async_model</span>(<span class="pl-s">"gpt-5.5"</span>)
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-s1">model</span>.<span class="pl-c1">prompt</span>(
<span class="pl-s1">prompt</span>,
<span class="pl-s1">tools</span><span class="pl-c1">=</span>[<span class="pl-s1">describe_dog</span>],
)
<span class="pl-k">async</span> <span class="pl-k">for</span> <span class="pl-s1">event</span> <span class="pl-c1">in</span> <span class="pl-s1">response</span>.<span class="pl-c1">astream_events</span>():
<span class="pl-k">if</span> <span class="pl-s1">event</span>.<span class="pl-c1">type</span> <span class="pl-c1">==</span> <span class="pl-s">"text"</span>:
<span class="pl-en">print</span>(<span class="pl-s1">event</span>.<span class="pl-c1">chunk</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>, <span class="pl-s1">flush</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)
<span class="pl-k">elif</span> <span class="pl-s1">event</span>.<span class="pl-c1">type</span> <span class="pl-c1">==</span> <span class="pl-s">"tool_call_name"</span>:
<span class="pl-en">print</span>(<span class="pl-s">f"<span class="pl-cce">\n</span>Tool call: <span class="pl-s1"><span class="pl-kos">{</span><span class="pl-s1">event</span>.<span class="pl-c1">chunk</span><span class="pl-kos">}</span></span>("</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>, <span class="pl-s1">flush</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)
<span class="pl-k">elif</span> <span class="pl-s1">event</span>.<span class="pl-c1">type</span> <span class="pl-c1">==</span> <span class="pl-s">"tool_call_args"</span>:
<span class="pl-en">print</span>(<span class="pl-s1">event</span>.<span class="pl-c1">chunk</span>, <span class="pl-s1">end</span><span class="pl-c1">=</span><span class="pl-s">""</span>, <span class="pl-s1">flush</span><span class="pl-c1">=</span><span class="pl-c1">True</span>)
<span class="pl-en">sync_example</span>()
<span class="pl-s1">asyncio</span>.<span class="pl-c1">run</span>(<span class="pl-en">async_example</span>())</pre>
<p>Sample output (from just the first sync example):</p>
<blockquote>
<p><code>My motivation: create three memorable dogs with distinct “cool” styles—one cinematic, one adventurous, and one charmingly chaotic—so each feels like they could star in their own story.</code><br />
<code>Tool call: describe_dog({"name": "Nova Jetpaw", "bio": "A sleek silver-gray whippet who wears tiny aviator goggles and loves sprinting along moonlit beaches. Nova is fearless, elegant, and rumored to outrun drones just for fun."}</code><br />
<code>Tool call: describe_dog({"name": "Mochi Thunderbark", "bio": "A fluffy corgi with a dramatic black-and-gold bandana and the confidence of a rock star. Mochi is short, loud, loyal, and leads a neighborhood 'security patrol' made entirely of squirrels."}</code><br />
<code>Tool call: describe_dog({"name": "Atlas Snowfang", "bio": "A massive white husky with ice-blue eyes and a backpack full of trail snacks. Atlas is calm, heroic, and always knows the way home—even during blizzards, fog, or confusing camping trips."}</code></p>
</blockquote>
<p>At the end of the response you can call <code>response.execute_tool_calls()</code> to actually run the functions that were requested, or send a <code>response.reply()</code> to have those tools called and their return values sent back to the model:</p>
<pre><span class="pl-en">print</span>(<span class="pl-s1">response</span>.<span class="pl-c1">reply</span>(<span class="pl-s">"Tell me about the dogs"</span>))</pre>
<p>This new mechanism for streaming different token types means the CLI tool can now display "thinking" text in a different color from the text in the final response. The thinking text goes to stderr so it won't affect results that are piped into other tools.</p>
<p>This example uses Claude Sonnet 4.6 (with an updated streaming event version of the <a href="https://github.com/simonw/llm-anthropic">llm-anthropic</a> plugin) as Anthropic's models return their reasoning text as part of the response:</p>
<div class="highlight highlight-source-shell"><pre>llm -m claude-sonnet-4.6 <span class="pl-s"><span class="pl-pds">'</span>Think about 3 cool dogs then describe them<span class="pl-pds">'</span></span> \
-o thinking_display 1</pre></div>
<p><img src="https://static.simonwillison.net/static/2026/claude-thinking-llm.gif" alt="Animated demo. Starts with ~/dev/scratch/llm-anthropic % uv run llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' -o thinking_display 1 - the text then streams in grey: The user wants me to think about 3 cool dogs and then describe them. Let me come up with 3 interesting, cool dogs and describe them. Then switches to regular color text for the output that describes the dogs." style="max-width: 100%;" /></p>
<p>You can suppress the output of reasoning tokens using the new <code>-R/--no-reasoning</code> flag. Surprisingly that ended up being the only CLI-facing change in this release.</p>
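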
<h4 id="a-mechanism-for-serializing-and-deserializing-responses">A mechanism for serializing and deserializing responses</h4>
<p>As mentioned earlier, LLM's current code for persisting conversations to SQLite is quite inflexible. I've added a new mechanism in 0.32a0 that should give Python API users a way to roll their own alternative:</p>
<pre><span class="pl-s1">serializable</span> <span class="pl-c1">=</span> <span class="pl-s1">response</span>.<span class="pl-c1">to_dict</span>()
<span class="pl-c"># serializable is a JSON-style dictionary</span>
<span class="pl-c"># store it anywhere you like, then inflate it:</span>
<span class="pl-s1">response</span> <span class="pl-c1">=</span> <span class="pl-v">Response</span>.<span class="pl-c1">from_dict</span>(<span class="pl-s1">serializable</span>)</pre>
<p>The dictionary this returns is actually a <code>TypedDict</code> defined in the new <a href="https://github.com/simonw/llm/blob/main/llm/serialization.py">llm/serialization.py</a> module.</p>
<h4 id="what-s-next-">What's next?</h4>
<p>I'm releasing this as an alpha so I can upgrade various plugins and exercise the new design in real world environments for a few days. I expect the stable 0.32 release will be very similar to this alpha, unless alpha testing reveals some design flaw in the way I've put this all together.</p>
<p>There's one remaining large task: I'd like to redesign the SQLite logging system to better capture the more finely grained details that are returned by this new abstraction.</p>
<p>Ideally I'd like to model this as a graph, to best support situations like an OpenAI-style chat completions API where the same conversations are constantly extended and then repeated with every prompt. I want to be able to store those without duplicating them in the database.</p>
<p>I'm undecided as to whether that should be a feature in 0.32 or whether I should hold it for 0.33.</p> |
| quotation |
2162 |
2026-04-28 22:02:53+00:00 |
`Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.` - OpenAI Codex base_instructions |
|
| quotation |
2161 |
2026-04-28 13:25:29+00:00 |
Five months in, I think I've decided that I don't want to vibecode — I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money. - Matthew Yglesias |
|
| blogmark |
9434 |
2026-04-28 05:23:05+00:00 |
What's new in pip 26.1 - lockfiles and dependency cooldowns! - Lobste.rs |
Richard Si describes an excellent set of upgrades to Python's default `pip` tool for installing dependencies.
This version drops support for Python 3.9 - fair enough, since it's been EOL [since October](https://devguide.python.org/versions/). macOS still ships Python 3.9 as its default `python3`, so I tried out the new pip release against Python 3.14 like this:
uv python install 3.14
mkdir /tmp/experiment
cd /tmp/experiment
python3.14 -m venv venv
source venv/bin/activate
pip install -U pip
pip --version
This confirmed I had `pip 26.1` - then I tried out the new lock files:
pip lock datasette llm
This resolves Datasette and LLM and all of their dependencies, then writes the whole lot to a 519 line `pylock.toml` file - [here's the result](https://gist.github.com/simonw/ff52c33f4d3a381b8e53c6a3aa0213f8).
The new release also supports dependency cooldowns, [discussed here previously](https://simonwillison.net/2026/Mar/24/package-managers-need-to-cool-down/), via the new `--uploaded-prior-to PXD` option where X is a number of days. The value is `P`, then the number of days, then `D`, following the [ISO duration format](https://en.wikipedia.org/wiki/ISO_8601#Durations) but only supporting days.
I shipped a new release of LLM, version 0.31, [three days ago](https://simonwillison.net/2026/Apr/24/llm/). Here's how to use the new `--uploaded-prior-to P4D` option to ask for a version that is at least 4 days old.
pip install llm --uploaded-prior-to P4D
venv/bin/llm --version
This gave me version 0.30. |
| blogmark |
9433 |
2026-04-28 02:47:42+00:00 |
Introducing talkie: a 13B vintage language model from 1930 - Hacker News |
New project from [Nick Levine](https://nlevine.org), [David Duvenaud](http://www.cs.toronto.edu/~duvenaud/), and [Alec Radford](https://en.wikipedia.org/wiki/Alec_Radford) (of GPT, GPT-2, and Whisper fame).
[talkie-1930-13b-base](https://huggingface.co/talkie-lm/talkie-1930-13b-base) (53.1 GB) is a "13B language model trained on 260B tokens of historical pre-1931 English text".
[talkie-1930-13b-it](https://huggingface.co/talkie-lm/talkie-1930-13b-it) (26.6 GB) is a checkpoint "finetuned using a novel dataset of instruction-response pairs extracted from pre-1931 reference works", designed to power a chat interface. You can [try that out here](https://talkie-lm.com/chat).
Both models are Apache 2.0 licensed. Since the training data for the base model is entirely out of copyright (the USA copyright cutoff date is currently January 1, 1931), I'm hoping they later decide to release the training data as well.
*Update* on that: [Nick Levine on Twitter](https://twitter.com/status_effects/status/2049065134014726301):
> Will publish more on the corpus in the future (and do our best to share the data or at least scripts to reproduce it).
Their report suggests some fascinating research objectives for this class of model, including:
- How good are these models at predicting the future? "we calculated the surprisingness of short descriptions of historical events to a 13B model trained on pre-1931 text"
- Can these models invent things that are past their knowledge cutoffs? "As Demis Hassabis has asked, could a model trained up to 1911 independently discover General Relativity, as Einstein did in 1915?"
- Can they be taught to program? "Figure 3 (left-hand side) shows an early example of such a test, measuring how well models trained on pre-1931 text can, when given a few demonstration examples of [Python programs](https://github.com/openai/human-eval), write new correct programs."
I have a long-running interest in what I call "vegan models" - LLMs that are trained entirely on licensed or out-of-copyright data. I think the base model here qualifies, but the chat model isn't entirely pure due to the reliance on non-vegan models to help with the fine-tuning - emphasis mine:
> First, we generated instruction-response pairs from historical texts with regular structure, such as etiquette manuals, letter-writing manuals, cookbooks, dictionaries, encyclopedias, and poetry and fable collections (see Figure 7), and fine-tuned our base model on them using a simple chat format.
>
> Next, to improve instruction-following abilities, we generated synthetic prompts covering different types of tasks, such as summarizing documents, responding to direct information requests, and continuing multi-turn conversations coherently. We then ran online direct preference optimization on rollouts generated from these prompts, **using Claude Sonnet 4.6 as a judge**. [...]
>
> Finally, we did another round of supervised fine-tuning, this time on rejection-sampled **multi-turn synthetic chats between Claude Opus 4.6 and talkie**, to smooth out persistent rough edges in its conversational abilities.
A major challenge in training talkie was avoiding accidental contamination, either from post-1931 text sneaking into the training corpus or from the modern LLMs used to assist with fine-tuning introducing anachronistic knowledge into the chat model.
I recently [wrote about Mr. Chatterbox](https://simonwillison.net/2026/Mar/30/mr-chatterbox/), a similar vintage language model project which also had to turn to modern LLMs to create synthetic conversations to help train the chat model.
The talkie team aspire to eventually move beyond this limitation:
> While we have tried to post-train talkie free from modern influence, reinforcement learning with AI feedback inevitably shapes talkie’s behavior anachronistically. (The 7B version of talkie emerged from RL speaking in listicles.) As we scale up, we hope to be able to use our vintage base models themselves as judges to enable a fully bootstrapped era-appropriate post-training pipeline.
I couldn't resist hitting the talkie demo with my classic test prompt:
> `Generate an SVG of a pelican riding a bicycle`
It replied:
> An SVG of a pelican riding a bicycle was generated in 1860. The pelican was represented as perched on the saddle, with its long bill pointed forward, and its legs astride the handles. The idea was suggested by the fact that pelicans are sometimes seen fishing on horseback, on the banks of the Rhine. |
| blogmark |
9432 |
2026-04-27 23:46:56+00:00 |
microsoft/VibeVoice - |
VibeVoice is Microsoft's Whisper-style audio model for speech-to-text, MIT licensed and with speaker diarization built into the model.
Microsoft released it on January 21st, 2026 but I hadn't tried it until today. Here's a one-liner to run it on a Mac with `uv`, [mlx-audio](https://github.com/Blaizzy/mlx-audio) (by Prince Canuma) and the 5.71GB [mlx-community/VibeVoice-ASR-4bit](https://huggingface.co/mlx-community/VibeVoice-ASR-4bit) MLX conversion of the [17.3GB VibeVoice-ASR](https://huggingface.co/microsoft/VibeVoice-ASR/tree/main) model, in this case against a downloaded copy of my recent [podcast appearance with Lenny Rachitsky](https://simonwillison.net/2026/Apr/2/lennys-podcast/):
uv run --with mlx-audio mlx_audio.stt.generate \
--model mlx-community/VibeVoice-ASR-4bit \
--audio lenny.mp3 --output-path lenny \
--format json --verbose --max-tokens 32768

The tool reported back:
Processing time: 524.79 seconds
Prompt: 26615 tokens, 50.718 tokens-per-sec
Generation: 20248 tokens, 38.585 tokens-per-sec
Peak memory: 30.44 GB
So that's 8 minutes 45 seconds for an hour of audio (running on a 128GB M5 Max MacBook Pro).
I've tested it against `.wav` and `.mp3` files and they both worked fine.
If you omit `--max-tokens` it defaults to 8192, which is enough for about 25 minutes of audio. I discovered that through trial-and-error and quadrupled it to guarantee I'd get the full hour.
That command reported using 30.44GB of RAM at peak, but in Activity Monitor I observed 61.5GB of usage during the prefill stage and around 18GB during the generating phase.
Here's [the resulting JSON](https://gist.github.com/simonw/d2c716c008b3ba395785f865c6387b6f). The key structure looks like this:
{
"text": "And an open question for me is how many other knowledge work fields are actually prone to these agent loops?",
"start": 13.85,
"end": 19.5,
"duration": 5.65,
"speaker_id": 0
},
{
"text": "Now that we have this power, people almost underestimate what they can do with it.",
"start": 19.5,
"end": 22.78,
"duration": 3.280000000000001,
"speaker_id": 1
},
{
"text": "Today, probably 95% of the code that I produce, I didn't type it myself. I write so much of my code on my phone. It's wild.",
"start": 22.78,
"end": 30.0,
"duration": 7.219999999999999,
"speaker_id": 0
}
Since that's an array of objects we can [open it in Datasette Lite](https://lite.datasette.io/?json=https://gist.github.com/simonw/d2c716c008b3ba395785f865c6387b6f#/data/raw?_facet=speaker_id), making it easier to browse.
Amusingly that Datasette Lite view shows three speakers - it identified Lenny and me for the conversation, and then a separate Lenny for the voice he used for the additional intro and the sponsor reads!
VibeVoice can only handle up to an hour of audio, so running the above command transcribed just the first hour of the podcast. To transcribe more than that you'd need to split the audio, ideally with a minute or so of overlap so you can avoid errors from partially transcribed words at the split point (see the sketch below). You'd then also need to line up the identified speaker IDs across the multiple segments.
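Here's a rough sketch of that splitting step using `ffmpeg` - my own illustration, not something from the VibeVoice documentation - cutting one-hour chunks with a minute of overlap:

    # chunk 1 covers 0:00-60:00, chunk 2 starts a minute earlier at 59:00
    ffmpeg -i lenny.mp3 -t 3600 -c copy lenny-part1.mp3
    ffmpeg -ss 3540 -i lenny.mp3 -t 3600 -c copy lenny-part2.mp3

Each chunk can then be transcribed with the same `mlx_audio.stt.generate` command as above. |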
| entry |
9274 |
2026-04-27 18:38:17+00:00 |
Tracking the history of the now-deceased OpenAI Microsoft AGI clause |
<p>For many years, Microsoft and OpenAI's relationship has included a weird clause saying that, should AGI be achieved, Microsoft's commercial IP rights to OpenAI's technology would be null and void. That clause appeared to end today. I decided to try and track its expression over time on <a href="https://openai.com/">openai.com</a>.</p>
<p>OpenAI, July 22nd 2019 in <a href="https://openai.com/index/microsoft-invests-in-and-partners-with-openai/">Microsoft invests in and partners with OpenAI to support us building beneficial AGI</a> (emphasis mine):</p>
<blockquote>
<p>OpenAI is producing a sequence of increasingly powerful AI technologies, which requires a lot of capital for computational power. The most obvious way to cover costs is to build a product, but that would mean changing our focus. Instead, we intend to license <strong>some of our pre-AGI technologies</strong>, with Microsoft becoming our preferred partner for commercializing them.</p>
</blockquote>
<p>But what <em>is</em> AGI? The <a href="https://openai.com/charter/">OpenAI Charter</a> was first published in April 2018 and has remained unchanged at least since this <a href="https://web.archive.org/web/20190311213352/https://openai.com/charter/">March 11th 2019 archive.org capture</a>:</p>
<blockquote>
<p>OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity.</p>
</blockquote>
<p>Here's the problem: if you're going to sign an agreement with Microsoft that is dependent on knowing when "AGI" has been achieved, you need something a little more concrete.</p>
<p>In December 2024 <a href="https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership">The Information reported the details</a> (summarized here outside of their paywall <a href="https://techcrunch.com/2024/12/26/microsoft-and-openai-have-a-financial-definition-of-agi-report/">by TechCrunch</a>):</p>
<blockquote>
<p>Last year’s agreement between Microsoft and OpenAI, which hasn’t been disclosed, said AGI would be achieved only when OpenAI has developed systems that have the ability to generate the maximum total profits to which its earliest investors, including Microsoft, are entitled, according to documents OpenAI distributed to investors. Those profits total about $100 billion, the documents showed.</p>
</blockquote>
<p>So AGI is now whenever OpenAI's systems are capable of generating $100 billion in profit?</p>
<p>In October 2025 the process changed to being judged by an "independent expert panel". In <a href="https://openai.com/index/next-chapter-of-microsoft-openai-partnership/">The next chapter of the Microsoft–OpenAI partnership</a>:</p>
<blockquote>
<p>The agreement preserves key elements that have fueled this successful partnership—meaning OpenAI remains Microsoft’s frontier model partner and Microsoft continues to have exclusive IP rights and Azure API exclusivity until Artificial General Intelligence (AGI). [...]</p>
<p>Once AGI is declared by OpenAI, that declaration will now be verified by an independent expert panel. [...]</p>
<p>Microsoft’s IP rights to research, defined as the confidential methods used in the development of models and systems, will remain until either the expert panel verifies AGI or through 2030, whichever is first.</p>
</blockquote>
<p>OpenAI on February 27th, 2026 in <a href="https://openai.com/index/continuing-microsoft-partnership/">Joint Statement from OpenAI and Microsoft</a>:</p>
<blockquote>
<p><strong>AGI definition and processes are unchanged</strong>. The contractual definition of AGI and the process for determining if it has been achieved remains the same.</p>
</blockquote>
<p>OpenAI today, April 27th 2026 in <a href="https://openai.com/index/next-phase-of-microsoft-partnership/">The next phase of the Microsoft OpenAI partnership</a> (emphasis mine):</p>
<blockquote>
<ul>
<li>Microsoft will continue to have a license to OpenAI IP for models and products through 2032. Microsoft’s license will now be non-exclusive.</li>
<li>Microsoft will no longer pay a revenue share to OpenAI.</li>
<li>Revenue share payments from OpenAI to Microsoft continue through 2030, <strong>independent of OpenAI’s technology progress</strong>, at the same percentage but subject to a total cap.</li>
</ul>
</blockquote>
<p>As far as I can tell "independent of OpenAI’s technology progress" is a declaration that the AGI clause is now dead. Here's The Verge coming to the same conclusion: <a href="https://www.theverge.com/ai-artificial-intelligence/918981/openai-microsoft-renegotiate-contract">The AGI clause is dead</a>.</p>
<p>My all-time favorite commentary on OpenAI's approach to AGI remains this 2023 hypothetical <a href="https://www.bloomberg.com/opinion/articles/2023-11-20/who-controls-openai">by Matt Levine</a>:</p>
<blockquote>
<p>And the investors wailed and gnashed their teeth but it’s true, that is what they agreed to, and they had no legal recourse. And OpenAI’s new CEO, and its nonprofit board, cut them a check for their capped return and said “bye” and went back to running OpenAI for the benefit of humanity. It turned out that a benign, carefully governed artificial superintelligence is really good for humanity, and OpenAI quickly solved all of humanity’s problems and ushered in an age of peace and abundance in which nobody wanted for anything or needed any Microsoft products. And capitalism came to an end.</p>
</blockquote> |
| blogmark |
9431 |
2026-04-27 17:37:47+00:00 |
Speech translation in Google Meet is now rolling out to mobile devices - |
I just encountered this feature via a "try this out now" prompt in a Google Meet meeting. It kind-of worked!
This is Google's implementation of the ultimate sci-fi translation app: two people can talk to each other in two different languages, and Meet translates each side and - with a short delay - repeats it back in your preferred language, with a rough imitation of the original speaker's voice.
It can only handle English, Spanish, French, German, Portuguese, and Italian at the moment. It's also still very alpha - I ran it successfully between two laptops running web browsers, but then when I tried between an iPhone and an iPad it didn't seem to work. |
| quotation |
2160 |
2026-04-25 12:06:55+00:00 |
Since GPT-5.4, we’ve unified Codex and the main model into a single system, so there’s no separate coding line anymore.
GPT-5.5 takes this further, with strong gains in agentic coding, computer use, and any task on a computer. - Romain Huet |
|
| blogmark |
9429 |
2026-04-25 04:13:36+00:00 |
GPT-5.5 prompting guide - |
Now that GPT-5.5 is [available in the API](https://developers.openai.com/api/docs/models/gpt-5.5), OpenAI have released a wealth of useful tips on how best to prompt the new model.
Here's a neat trick they recommend for applications that might spend considerable time thinking before returning a user-visible response:
> `Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.`
I've already noticed their Codex app doing this, and it does make longer-running tasks feel less like the model has crashed.
OpenAI suggest running the following in Codex to upgrade your existing code using advice embedded in their `openai-docs` skill:
> `$openai-docs migrate this project to gpt-5.5`
The upgrade guide the coding agent will follow [is this one](https://github.com/openai/skills/blob/724cd511c96593f642bddf13187217aa155d2554/skills/.curated/openai-docs/references/upgrade-guide.md#model-string--light-prompt-rewrite), which even includes light instructions on how to rewrite prompts to better fit the model.
Also relevant is the [Using GPT-5.5 guide](https://developers.openai.com/api/docs/guides/latest-model), which opens with this warning:
> To get the most out of GPT-5.5, treat it as a new model family to tune for, not a drop-in replacement for `gpt-5.2` or `gpt-5.4`. Begin migration with a fresh baseline instead of carrying over every instruction from an older prompt stack. Start with the smallest prompt that preserves the product contract, then tune reasoning effort, verbosity, tool descriptions, and output format against representative examples.
Interesting to see OpenAI recommend starting from scratch rather than trusting that existing prompts optimized for previous models will continue to work effectively with GPT-5.5. |
| blogmark |
9428 |
2026-04-24 22:38:49+00:00 |
The people do not yearn for automation - John Gruber |
This written and video essay by Nilay Patel explores why AI is unpopular with the general public even as usage numbers for ChatGPT continue to skyrocket.
It’s a superb piece of commentary, and something I expect I’ll be thinking about for a long time to come.
Nilay’s core idea is that people afflicted with “software brain” - who see the world as something to be automated as much as possible, and attempt to model everything in terms of information flows and data - are becoming detached from everyone else.
> […] software brain has ruled the business world for a long time. AI has just made it easier than ever for more people to make more software than ever before — for every kind of business to automate big chunks of itself with software. It’s everywhere: the absolute cutting edge of advertising and marketing is automation with AI. It’s not being a creative.
>
> But: not everything is a business. Not everything is a loop! The entire human experience cannot be captured in a database. *That’s* the limit of software brain. That’s why people hate AI. It *flattens* them.
>
> Regular people don’t see the opportunity to write code as an opportunity at *all*. The people do not yearn for automation. I’m a full-on smart home sicko; the lights and shades and climate controls of my house are automated in dozens of ways. But huge companies like Apple, Google and Amazon have struggled for over a decade now to make regular people care about smart home automation at all. And they just don’t. |
| entry |
9273 |
2026-04-24 06:01:04+00:00 |
DeepSeek V4 - almost on the frontier, a fraction of the price |
<p>Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) <a href="https://simonwillison.net/2025/Dec/1/deepseek-v32/">last December</a>. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro">DeepSeek-V4-Pro</a> and <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash">DeepSeek-V4-Flash</a>.</p>
<p>Both models are Mixture of Experts models with a 1 million token context window. Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. They're using the standard MIT license.</p>
<p>I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B).</p>
<p>Pro is 865GB on Hugging Face, Flash is 160GB. I'm hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It's <em>possible</em> the Pro model may run on it if I can stream just the necessary active experts from disk.</p>
<p>For the moment I tried the models out via <a href="https://openrouter.ai/">OpenRouter</a>, using <a href="https://github.com/simonw/llm-openrouter">llm-openrouter</a>:</p>
<pre><code>llm install llm-openrouter
llm openrouter refresh
llm -m openrouter/deepseek/deepseek-v4-pro 'Generate an SVG of a pelican riding a bicycle'
</code></pre>
<p>Here's the pelican <a href="https://gist.github.com/simonw/4a7a9e75b666a58a0cf81495acddf529">for DeepSeek-V4-Flash</a>:</p>
<p><img src="https://static.simonwillison.net/static/2026/deepseek-v4-flash.png" alt="Excellent bicycle - good frame shape, nice chain, even has a reflector on the front wheel. Pelican has a mean looking expression but has its wings on the handlebars and feet on the pedals. Pouch is a little sharp." style="max-width: 100%;" /></p>
<p>And <a href="https://gist.github.com/simonw/9e8dfed68933ab752c9cf27a03250a7c">for DeepSeek-V4-Pro</a>:</p>
<p><img src="https://static.simonwillison.net/static/2026/deepseek-v4-pro.png" alt="Another solid bicycle, albeit the spokes are a little jagged and the frame is compressed a bit. Pelican has gone a bit wrong - it has a VERY large body, only one wing, a weirdly hairy backside and generally loos like it was drown be a different artist from the bicycle." style="max-width: 100%;" /></p>
<p>For comparison, take a look at the pelicans I got from <a href="https://simonwillison.net/2025/Dec/1/deepseek-v32/">DeepSeek V3.2 in December</a>, <a href="https://simonwillison.net/2025/Aug/22/deepseek-31/">V3.1 in August</a>, and <a href="https://simonwillison.net/2025/Mar/24/deepseek/">V3-0324 in March 2025</a>.</p>
<p>So the pelicans are pretty good, but what's really notable here is the <em>cost</em>. DeepSeek V4 is a very, very inexpensive model.</p>
<p>This is <a href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek's pricing page</a>. They're charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro.</p>
<p>Here's a comparison table with the frontier models from Gemini, OpenAI and Anthropic:</p>
<center>
<table>
<thead>
<tr>
<th>Model</th>
<th>Input ($/M)</th>
<th>Output ($/M)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>DeepSeek V4 Flash</strong></td>
<td>$0.14</td>
<td>$0.28</td>
</tr>
<tr>
<td>GPT-5.4 Nano</td>
<td>$0.20</td>
<td>$1.25</td>
</tr>
<tr>
<td>Gemini 3.1 Flash-Lite</td>
<td>$0.25</td>
<td>$1.50</td>
</tr>
<tr>
<td>Gemini 3 Flash Preview</td>
<td>$0.50</td>
<td>$3</td>
</tr>
<tr>
<td>GPT-5.4 Mini</td>
<td>$0.75</td>
<td>$4.50</td>
</tr>
<tr>
<td>Claude Haiku 4.5</td>
<td>$1</td>
<td>$5</td>
</tr>
<tr>
<td><strong>DeepSeek V4 Pro</strong></td>
<td>$1.74</td>
<td>$3.48</td>
</tr>
<tr>
<td>Gemini 3.1 Pro</td>
<td>$2</td>
<td>$12</td>
</tr>
<tr>
<td>GPT-5.4</td>
<td>$2.50</td>
<td>$15</td>
</tr>
<tr>
<td>Claude Sonnet 4.6</td>
<td>$3</td>
<td>$15</td>
</tr>
<tr>
<td>Claude Opus 4.7</td>
<td>$5</td>
<td>$25</td>
</tr>
<tr>
<td>GPT-5.5</td>
<td>$5</td>
<td>$30</td>
</tr>
</tbody>
</table>
</center>
<p>DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI's GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models.</p>
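<p>To make that concrete (my arithmetic, using the prices above): a call to DeepSeek-V4-Pro with 10,000 input tokens and 2,000 output tokens costs roughly 0.01 × $1.74 + 0.002 × $3.48 ≈ $0.024, while the same call against GPT-5.5 would cost 0.01 × $5 + 0.002 × $30 = $0.11.</p>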
<p>This note from <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf">the DeepSeek paper</a> helps explain why they can price these models so low - they've focused a great deal on efficiency with this release, especially for longer context prompts:</p>
<blockquote>
<p>In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2.</p>
</blockquote>
<p>DeepSeek's self-reported benchmarks <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf">in their paper</a> show their Pro model competitive with those other frontier models, albeit with this note:</p>
<blockquote>
<p>Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.</p>
</blockquote>
<p>I'm keeping an eye on <a href="https://huggingface.co/unsloth/models">huggingface.co/unsloth/models</a> as I expect the Unsloth team will have a set of quantized versions out pretty soon. It's going to be very interesting to see how well that Flash model runs on my own machine.</p> |
| blogmark |
9427 |
2026-04-24 01:50:07+00:00 |
russellromney/honker - Show HN |
<p>"Postgres NOTIFY/LISTEN semantics" for SQLite, implemented as a Rust SQLite extension and various language bindings to help make use of it.</p>
<p>The design of this looks very solid. It lets you write Python code for queues that looks like this:</p>
<pre><span class="pl-k">import</span> <span class="pl-s1">honker</span>
<span class="pl-s1">db</span> <span class="pl-c1">=</span> <span class="pl-s1">honker</span>.<span class="pl-c1">open</span>(<span class="pl-s">"app.db"</span>)
<span class="pl-s1">emails</span> <span class="pl-c1">=</span> <span class="pl-s1">db</span>.<span class="pl-c1">queue</span>(<span class="pl-s">"emails"</span>)
<span class="pl-c1">emails</span>.<span class="pl-c1">enqueue</span>({<span class="pl-s">"to"</span>: <span class="pl-s">"alice@example.com"</span>})
<span class="pl-c"># Consume (in a worker process)</span>
<span class="pl-k">async</span> <span class="pl-k">for</span> <span class="pl-s1">job</span> <span class="pl-c1">in</span> <span class="pl-s1">emails</span>.<span class="pl-c1">claim</span>(<span class="pl-s">"worker-1"</span>):
<span class="pl-en">send</span>(<span class="pl-s1">job</span>.<span class="pl-c1">payload</span>)
<span class="pl-s1">job</span>.<span class="pl-c1">ack</span>()</pre>
<p>And Kafka-style durable streams like this:</p>
<pre><span class="pl-s1">stream</span> <span class="pl-c1">=</span> <span class="pl-s1">db</span>.<span class="pl-c1">stream</span>(<span class="pl-s">"user-events"</span>)
<span class="pl-k">with</span> <span class="pl-s1">db</span>.<span class="pl-c1">transaction</span>() <span class="pl-k">as</span> <span class="pl-s1">tx</span>:
<span class="pl-s1">tx</span>.<span class="pl-c1">execute</span>(<span class="pl-s">"UPDATE users SET name=? WHERE id=?"</span>, [<span class="pl-s1">name</span>, <span class="pl-s1">uid</span>])
<span class="pl-s1">stream</span>.<span class="pl-c1">publish</span>({<span class="pl-s">"user_id"</span>: <span class="pl-s1">uid</span>, <span class="pl-s">"change"</span>: <span class="pl-s">"name"</span>}, <span class="pl-s1">tx</span><span class="pl-c1">=</span><span class="pl-s1">tx</span>)
<span class="pl-k">async</span> <span class="pl-k">for</span> <span class="pl-s1">event</span> <span class="pl-c1">in</span> <span class="pl-s1">stream</span>.<span class="pl-c1">subscribe</span>(<span class="pl-s1">consumer</span><span class="pl-c1">=</span><span class="pl-s">"dashboard"</span>):
<span class="pl-k">await</span> <span class="pl-en">push_to_browser</span>(<span class="pl-s1">event</span>)</pre>
<p>It also adds 20+ custom SQL functions including these two:</p>
<div class="highlight highlight-source-sql"><pre><span class="pl-k">SELECT</span> notify(<span class="pl-s"><span class="pl-pds">'</span>orders<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>{"id":42}<span class="pl-pds">'</span></span>);
<span class="pl-k">SELECT</span> honker_stream_read_since(<span class="pl-s"><span class="pl-pds">'</span>orders<span class="pl-pds">'</span></span>, <span class="pl-c1">0</span>, <span class="pl-c1">1000</span>);</pre></div>
<p>The extension requires WAL mode, and workers can poll the <code>.db-wal</code> file with a stat call every 1ms to get as close to real-time as possible without the expense of running a full SQL query.</p>
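<p>That polling trick is easy to picture. Here's a rough Python sketch of the idea - my own illustration of the mechanism, not honker's actual Rust implementation:</p>
<pre><code>import os
import time

def wait_for_wal_change(db_path="app.db", interval=0.001):
    """Block until the SQLite WAL file changes, then return so a real query can run."""
    wal = db_path + "-wal"
    def snapshot():
        try:
            st = os.stat(wal)
            return (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            return None
    last = snapshot()
    while True:
        current = snapshot()
        if current != last:
            return  # something committed - worth running the real dequeue query now
        time.sleep(interval)  # ~1ms between cheap stat calls
</code></pre>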
<p>honker implements the <strong>transactional outbox pattern</strong>, which ensures items are only queued if a transaction successfully commits. My favorite explanation of that pattern remains <a href="https://brandur.org/job-drain">Transactionally Staged Job Drains in Postgres</a> by Brandur Leach. It's great to see a new implementation of that pattern for SQLite.</p> |
| blogmark |
9426 |
2026-04-24 01:31:25+00:00 |
An update on recent Claude Code quality reports - Hacker News |
It turns out the high volume of complaints that Claude Code was providing worse quality results over the past two months was grounded in real problems.
The models themselves were not to blame, but three separate issues in the Claude Code harness caused complex but material problems which directly affected users.
Anthropic's postmortem describes these in detail. This one in particular stood out to me:
> On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive.
I *frequently* have Claude Code sessions which I leave for an hour (or often a day or longer) before returning to them. Right now I have 11 of those (according to `ps aux | grep 'claude '`) and that's after closing down dozens more the other day.
I estimate I spend more time prompting in these "stale" sessions than sessions that I've recently started!
If you're building agentic systems it's worth reading this article in detail - the kinds of bugs that affect harnesses are deeply complicated, even if you put aside the inherent non-deterministic nature of the models themselves. |
| blogmark |
9425 |
2026-04-24 01:08:17+00:00 |
Serving the For You feed - |
One of Bluesky's most interesting features is that anyone can run their own custom "feed" implementation and make it available to other users - effectively enabling custom algorithms that can use any mechanism they like to recommend posts.
spacecowboy runs the [For You Feed](https://bsky.app/profile/did:plc:3guzzweuqraryl3rdkimjamk/feed/for-you), used by around 72,000 people. This guest post on the AT Protocol blog explains how it works.
The architecture is *fascinating*. The feed is served by a single Go process using SQLite on a "gaming" PC in spacecowboy's living room - 16 cores, 96GB of RAM and 4TB of attached NVMe storage.
Recommendations are based on likes: what else are the people who like the same things as you liking on the platform?
That Go server consumes the Bluesky firehose and stores the relevant details in SQLite, keeping the last 90 days of relevant data, which currently uses around 419GB of SQLite storage.
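The likes-based approach is easy to sketch. Here's roughly what the core query could look like against a hypothetical `likes(actor, post_uri)` table - my own illustration of the idea, not spacecowboy's actual schema or scoring:
    import sqlite3

    def for_you(db_path, me, limit=50):
        # Score candidate posts by how many "taste neighbors" (accounts that
        # liked the same posts as the viewer) have also liked them.
        sql = """
        WITH my_likes AS (
            SELECT post_uri FROM likes WHERE actor = :me
        ),
        neighbors AS (
            SELECT DISTINCT actor FROM likes
            WHERE post_uri IN (SELECT post_uri FROM my_likes)
              AND actor != :me
        )
        SELECT post_uri, COUNT(*) AS score
        FROM likes
        WHERE actor IN (SELECT actor FROM neighbors)
          AND post_uri NOT IN (SELECT post_uri FROM my_likes)
        GROUP BY post_uri
        ORDER BY score DESC
        LIMIT :limit
        """
        with sqlite3.connect(db_path) as db:
            return db.execute(sql, {"me": me, "limit": limit}).fetchall()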
Public internet traffic is handled by a $7/month VPS on OVH, which talks to the living room server via Tailscale.
Total cost is now $30/month: $20 in electricity, $7 in VPS and $3 for the two domain names. spacecowboy estimates that the existing system could handle all ~1 million daily active Bluesky users if they were to switch to the cheapest algorithm they have found to work. |
| entry |
9272 |
2026-04-23 21:54:24+00:00 |
Extract PDF text in your browser with LiteParse for the web |
<p>LlamaIndex have a most excellent open source project called <a href="https://github.com/run-llama/liteparse">LiteParse</a>, which provides a Node.js CLI tool for extracting text from PDFs. I got a version of LiteParse working entirely in the browser, using most of the same libraries that LiteParse uses to run in Node.js.</p>
<h4 id="spatial-text-parsing">Spatial text parsing</h4>
<p>Refreshingly, LiteParse doesn't use AI models to do what it does: it's good old-fashioned PDF parsing, falling back to Tesseract OCR (or other pluggable OCR engines) for PDFs that contain images of text rather than the text itself.</p>
<p>The hard problem that LiteParse solves is extracting text in a sensible order despite the infuriating vagaries of PDF layouts. They describe this as "spatial text parsing" - they use some very clever heuristics to detect things like multi-column layouts and group and return the text in a sensible linear flow.</p>
<p>The LiteParse documentation describes a pattern for implementing <a href="https://developers.llamaindex.ai/liteparse/guides/visual-citations/">Visual Citations with Bounding Boxes</a>. I really like this idea: being able to answer questions from a PDF and accompany those answers with cropped, highlighted images feels like a great way of increasing the credibility of answers from RAG-style Q&A.</p>
<p>LiteParse is provided as a pure CLI tool, designed to be used by agents. You run it like this:</p>
<pre><code>npm i -g @llamaindex/liteparse
lit parse document.pdf
</code></pre>
<p>I <a href="https://claude.ai/share/44a5ed86-e5b5-4e14-90be-1eba1e0acd13">explored its capabilities with Claude</a> and quickly determined that there was no real reason it had to stay a CLI app: it's built on top of PDF.js and Tesseract.js, two libraries I've used for something similar in a browser <a href="https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/">in the past</a>.</p>
<p>The only reason LiteParse didn't have a pure browser-based version is that nobody had built one yet...</p>
<h4 id="introducing-liteparse-for-the-web">Introducing LiteParse for the web</h4>
<p>Visit <a href="https://simonw.github.io/liteparse/">https://simonw.github.io/liteparse/</a> to try out LiteParse against any PDF file, running entirely in your browser. Here's what that looks like:</p>
<p><img src="https://static.simonwillison.net/static/2026/liteparse-web.jpg" alt="Screenshot of the LiteParse browser demo web page. Header reads "LiteParse" with subtitle "Browser demo of LiteParse — parse PDFs in your browser. Nothing leaves your machine." A dashed-border drop zone says "Drop a PDF here or click to choose / Your file stays in your browser." with a file pill labeled "19720005243.pdf". Below are a checked "Run OCR" checkbox, an unchecked "Render page screenshots" checkbox, and a blue "Parse" button. Status text: "Parsed 86 pages." Two side-by-side panels follow. Left panel titled "Text" with a Copy button shows monospace extracted text beginning "Apollo 5 was an unmanned system, both propulsion systems ascent and descent stages". Right panel titled "JSON", also with a copy button, contains JSON showing the dimensions and position and detected font of each piece of text." style="max-width: 100%;" /></p>
<p>The tool can work with or without running OCR, and can optionally display images for every page in the PDF further down the page.</p>
<h4 id="building-it-with-claude-code-and-opus-4-7">Building it with Claude Code and Opus 4.7</h4>
<p>The process of building this started in the regular Claude app on my iPhone. I wanted to try out LiteParse myself, so I started by uploading a random PDF I happened to have on my phone along with this prompt:</p>
<blockquote>
<p><code>Clone https://github.com/run-llama/liteparse and try it against this file</code></p>
</blockquote>
<p>Regular Claude chat can clone directly from GitHub these days, and while by default it can't access most of the internet from its container, it can still install packages from PyPI and npm.</p>
<p>I often use this to try out new pieces of open source software on my phone - it's a quick way to exercise something without having to sit down with my laptop.</p>
<p>You can follow my full conversation in <a href="https://claude.ai/share/44a5ed86-e5b5-4e14-90be-1eba1e0acd13">this shared Claude transcript</a>. I asked a few follow-up questions about how it worked, and then asked:</p>
<blockquote>
<p><code>Does this library run in a browser? Could it?</code></p>
</blockquote>
<p>This gave me a thorough enough answer that I was convinced it was worth trying getting that to work for real. I opened up my laptop and switched to Claude Code.</p>
<p>I forked the original repo on GitHub, cloned a local copy, started a new <code>web</code> branch and pasted that last reply from Claude into a new file called <a href="https://github.com/simonw/liteparse/blob/web/notes.md">notes.md</a>. Then I told Claude Code:</p>
<blockquote>
<p><code>Get this working as a web app. index.html, when loaded, should render an app that lets users open a PDF in their browser and select OCR or non-OCR mode and have this run. Read notes.md for initial research on this problem, then write out plan.md with your detailed implementation plan</code></p>
</blockquote>
<p>I always like to start with a plan for this kind of project. Sometimes I'll use Claude's "planning mode", but in this case I knew I'd want the plan as an artifact in the repository so I told it to write <code>plan.md</code> directly.</p>
<p>This also means I can iterate on the plan with Claude. I noticed that Claude had decided to punt on generating screenshots of images in the PDF, and suggested we defer a "canvas-encode swap" to v2. I fixed that by prompting:</p>
<blockquote>
<p><code>Update the plan to say we WILL do the canvas-encode swap so the screenshots thing works</code></p>
</blockquote>
<p>After a few short follow-up prompts, here's the <a href="https://github.com/simonw/liteparse/blob/web/plan.md">plan.md</a> I thought was strong enough to implement.</p>
<p>I prompted:</p>
<blockquote>
<p><code>build it.</code></p>
</blockquote>
<p>And then mostly left Claude Code to its own devices, tinkered with some other projects, caught up on Duolingo and occasionally checked in to see how it was doing.</p>
<p>I added a few prompts to the queue as I was working. Those don't yet show up in my exported transcript, but it turns out running <code>rg queue-operation --no-filename | grep enqueue | jq -r '.content'</code> in the relevant <code>~/.claude/projects/</code> folder extracts them.</p>
<p>Here are the key follow-up prompts with some notes:</p>
<ul>
<li>
<code>When you implement this use playwright and red/green TDD, plan that too</code> - I've written more <a href="https://simonwillison.net/guides/agentic-engineering-patterns/red-green-tdd/">about red/green TDD here</a>.</li>
<li>
<code>let's use PDF.js's own renderer</code> (it was messing around with pdfium)</li>
<li>
<code>The final UI should include both the text and the pretty-printed JSON output, both of those in textareas and both with copy-to-clipboard buttons - it should also be mobile friendly</code> - I had a new idea for how the UI should work</li>
<li>
<code>small commits along the way</code> - see below</li>
<li>
<code>Make sure the index.html page includes a link back to https://github.com/run-llama/liteparse near the top of the page</code> - it's important to credit your dependencies in a project like this!</li>
<li><code>View on GitHub → is bad copy because that's not the repo with this web app in, it's the web app for the underlying LiteParse library</code></li>
<li><code>Run OCR should be unchecked by default</code></li>
<li>
<code>When I try to parse a PDF in my browser I see 'Parse failed: undefined is not a function (near '...value of readableStream...')</code> - it was testing with Playwright in Chrome, turned out there was a bug in Safari</li>
<li><code>... oh that is in safari but it works in chrome</code></li>
<li><code>When "Copy" is clicked the text should change to "Copied!" for 1.5s</code></li>
<li>
<code>[Image #1] Style the file input so that long filenames don't break things on Firefox like this - in fact add one of those drag-drop zone UIs which you can also click to select a file</code> - dropping screenshots in of small UI glitches works surprisingly well</li>
<li><code>Tweak the drop zone such that the text is vertically centered, right now it is a bit closer to the top</code></li>
<li>
<code>it breaks in Safari on macOS, works in both Chrome and Firefox. On Safari I see "Parse failed: undefined is not a function (near '...value of readableStream...')" after I click the Parse button, when OCR is not checked</code> - it still wasn't working in Safari...</li>
<li>
<code>works in safari now</code> - but it fixed it pretty quickly once I pointed that out and it got Playwright working with that browser</li>
</ul>
<p>I've started habitually asking for "small commits along the way" because it makes for code that's easier to understand or review later on, and I have an unproven hunch that it helps the agent work more effectively too - it's yet another encouragement towards planning and taking on one problem at a time.</p>
<p>While it was working I decided it would be nice to be able to interact with an in-progress version. I asked a separate Claude Code session against the same directory for tips on how to run it, and it told me to use <code>npx vite</code>. Running that started a development server with live-reloading, which meant I could instantly see the effect of each change it made on disk - and prompt with further requests for tweaks and fixes.</p>
<p>Towards the end I decided it was going to be good enough to publish. I started a fresh Claude Code instance and told it:</p>
<blockquote>
<p><code>Look at the web/ folder - set up GitHub actions for this repo such that any push runs the tests, and if the tests pass it then does a GitHub Pages deploy of the built vite app such that the web/index.html page is the index.html page for the thing that is deployed and it works on GitHub Pages</code></p>
</blockquote>
<p>After a bit more iteration <a href="https://github.com/simonw/liteparse/blob/web/.github/workflows/deploy-web.yml">here's the GitHub Actions workflow</a> that builds the app using Vite and deploys the result to <a href="https://simonw.github.io/liteparse/">https://simonw.github.io/liteparse/</a>.</p>
<p>I love GitHub Pages for this kind of thing because it can be quickly configured (by Claude, in this case) to turn any repository into a deployed web-app, at zero cost and with whatever build step is necessary. It even works against private repos, if you don't mind your only security being a secret URL.</p>
<p>With this kind of project there's always a major risk that the model might "cheat" - mark key features as "TODO" and fake them, or take shortcuts that ignore the initial requirements.</p>
<p>The responsible way to prevent this is to review all of the code... but this wasn't intended as that kind of project, so instead I fired up OpenAI Codex with GPT-5.5 (I had preview access) and told it:</p>
<blockquote>
<p><code>Describe the difference between how the node.js CLI tool runs and how the web/ version runs</code></p>
</blockquote>
<p>The answer I got back was enough to give me confidence that Claude hadn't taken any project-threatening shortcuts.</p>
<p>... and that was about it. Total time in Claude Code for that "build it" step was 59 minutes. I used my <a href="https://github.com/simonw/claude-code-transcripts">claude-code-transcripts</a> tool to export a readable version of the full transcript which you can <a href="https://gisthost.github.io/?d64889bfc1b897fea3867adfec62ed89/index.html">view here</a>, albeit without those additional queued prompts (here's my <a href="https://github.com/simonw/claude-code-transcripts/issues/98">issue to fix that</a>).</p>
<h4 id="is-this-even-vibe-coding-any-more-">Is this even vibe coding any more?</h4>
<p>I'm a pedantic stickler when it comes to <a href="https://simonwillison.net/2025/Mar/19/vibe-coding/">the original definition of vibe coding</a> - vibe coding does <em>not</em> mean any time you use AI to help you write code, it's when you use AI without reviewing or caring about the code that's written at all.</p>
<p>By my own definition, this LiteParse for the web project is about as pure vibe coding as you can get! I have not looked at a <em>single line</em> of the HTML and TypeScript written for this project - in fact while writing this sentence I had to go and check if it had used JavaScript or TypeScript.</p>
<p>Yet somehow this one doesn't feel as vibe coded to me as many of my other vibe coded projects:</p>
<ul>
<li>As a static in-browser web application hosted on GitHub Pages the blast radius for any bugs is almost non-existent: it either works for your PDF or doesn't.</li>
<li>No private data is transferred anywhere - all processing happens in your browser - so a security audit is unnecessary. I've glanced once at the network panel while it's running and no additional requests are made when a PDF is being parsed.</li>
<li>There was still a whole lot of engineering experience and knowledge required to use the models in this way. Identifying that LiteParse could be ported to run directly in the browser was critical to the rest of the project.</li>
</ul>
<p>Most importantly, I'm happy to attach my reputation to this project and recommend that other people try it out. Unlike most of my vibe coded tools I'm not convinced that spending significant additional engineering time on this would have resulted in a meaningfully better initial release. It's fine as it is!</p>
<p>I haven't opened a PR against the <a href="https://github.com/run-llama/liteparse">origin repository</a> because I've not discussed it with the LiteParse team. I've <a href="https://github.com/run-llama/liteparse/issues/147">opened an issue</a>, and if they want my vibe coded implementation as a starting point for something more official they're welcome to take it.</p> |
| entry |
9271 |
2026-04-23 19:59:47+00:00 |
A pelican for GPT-5.5 via the semi-official Codex backdoor API |
<p><a href="https://openai.com/index/introducing-gpt-5-5/">GPT-5.5 is out</a>. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for!</p>
<p>There's one notable omission from today's release - the API:</p>
<blockquote>
<p>API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale. We'll bring GPT‑5.5 and GPT‑5.5 Pro to the API very soon.</p>
</blockquote>
<p>When I run my <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/">pelican benchmark</a> I always prefer to use an API, to avoid hidden system prompts in ChatGPT or other agent harnesses from impacting the results.</p>
<h4 id="the-openclaw-backdoor">The OpenClaw backdoor</h4>
<p>One of the ongoing tension points in the AI world over the past few months has concerned how agent harnesses like OpenClaw and Pi interact with the APIs provided by the big providers.</p>
<p>Both OpenAI and Anthropic offer popular monthly subscriptions which provide access to their models at a significant discount to their raw API.</p>
<p>OpenClaw integrated directly with this mechanism, and was then <a href="https://www.theverge.com/ai-artificial-intelligence/907074/anthropic-openclaw-claude-subscription-ban">blocked from doing so</a> by Anthropic. This kicked off a whole thing. OpenAI - who recently hired OpenClaw creator Peter Steinberger - saw an opportunity for an easy karma win and announced that OpenClaw was welcome to continue integrating with OpenAI's subscriptions via the same mechanism used by their (open source) Codex CLI tool.</p>
<p>Does this mean <em>anyone</em> can write code that integrates with OpenAI's Codex-specific APIs to hook into those existing subscriptions?</p>
<p>The other day <a href="https://twitter.com/jeremyphoward/status/2046537816834965714">Jeremy Howard asked</a>:</p>
<blockquote>
<p>Anyone know whether OpenAI officially supports the use of the <code>/backend-api/codex/responses</code> endpoint that Pi and Opencode (IIUC) uses?</p>
</blockquote>
<p>It turned out that on March 30th OpenAI's Romain Huet <a href="https://twitter.com/romainhuet/status/2038699202834841962">had tweeted</a>:</p>
<blockquote>
<p>We want people to be able to use Codex, and their ChatGPT subscription, wherever they like! That means in the app, in the terminal, but also in JetBrains, Xcode, OpenCode, Pi, and now Claude Code.</p>
<p>That’s why Codex CLI and Codex app server are open source too! 🙂</p>
</blockquote>
<p>And Peter Steinberger <a href="https://twitter.com/steipete/status/2046775849769148838">replied to Jeremy</a> that:</p>
<blockquote>
<p>OpenAI sub is officially supported.</p>
</blockquote>
<h4 id="llm-openai-via-codex">llm-openai-via-codex</h4>
<p>So... I had Claude Code reverse-engineer the <a href="https://github.com/openai/codex">openai/codex</a> repo, figure out how authentication tokens were stored and build me <a href="https://github.com/simonw/llm-openai-via-codex">llm-openai-via-codex</a>, a new plugin for <a href="https://llm.datasette.io/">LLM</a> which picks up your existing Codex subscription and uses it to run prompts!</p>
<p>(With hindsight I wish I'd used GPT-5.4 or the GPT-5.5 preview, it would have been funnier. I genuinely considered rewriting the project from scratch using Codex and GPT-5.5 for the sake of the joke, but decided not to spend any more time on this!)</p>
<p>Here's how to use it:</p>
<ol>
<li>Install Codex CLI, buy an OpenAI plan, log in to Codex</li>
<li>Install LLM: <code>uv tool install llm</code>
</li>
<li>Install the new plugin: <code>llm install llm-openai-via-codex</code>
</li>
<li>Start prompting: <code>llm -m openai-codex/gpt-5.5 'Your prompt goes here'</code>
</li>
</ol>
<p>All existing LLM features should also work - use <code>-a filepath.jpg/URL</code> to attach an image, <code>llm chat -m openai-codex/gpt-5.5</code> to start an ongoing chat, <code>llm logs</code> to view logged conversations and <code>llm --tool ...</code> to <a href="https://llm.datasette.io/en/stable/tools.html">try it out with tool support</a>.</p>
<h4 id="and-some-pelicans">And some pelicans</h4>
<p>Let's generate a pelican!</p>
<div class="highlight highlight-source-shell"><pre>llm install llm-openai-via-codex
llm -m openai-codex/gpt-5.5 <span class="pl-s"><span class="pl-pds">'</span>Generate an SVG of a pelican riding a bicycle<span class="pl-pds">'</span></span></pre></div>
<p>Here's <a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634">what I got back</a>:</p>
<p><img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican.png" alt="It is a bit mangled to be honest - good beak, pelican body shapes are slightly weird, legs do at least extend to the pedals, bicycle frame is not quite right." style="max-width: 100%;" /></p>
<p>I've seen better <a href="https://simonwillison.net/2026/Mar/17/mini-and-nano/#pelicans">from GPT-5.4</a>, so I tagged on <code>-o reasoning_effort xhigh</code> and <a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5">tried again</a>:</p>
<p>That one took almost four minutes to generate, but I think it's a much better effort.</p>
<p><img src="https://static.simonwillison.net/static/2026/gpt-5.5-pelican-xhigh.png" alt="Pelican has gradients now, body is much better put together, bicycle is nearly the right shape albeit with one extra bar between pedals and front wheel, clearly a better image overall." style="max-width: 100%;" /></p>
<p>If you compare the SVG code (<a href="https://gist.github.com/simonw/edda1d98f7ba07fd95eeff473cb16634#response">default</a>, <a href="https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602cc5#response">xhigh</a>) the <code>xhigh</code> one took a very different approach, which is much more CSS-heavy - as demonstrated by those gradients. <code>xhigh</code> used 9,322 reasoning tokens where the default used just 39.</p>
<h4 id="a-few-more-notes-on-gpt-5-5">A few more notes on GPT-5.5</h4>
<p>One of the most notable things about GPT-5.5 is the pricing. Once it goes live in the API it's <a href="https://openai.com/index/introducing-gpt-5-5/#availability-and-pricing">going to be priced</a> at <em>twice</em> the cost of GPT-5.4 - $5 per 1M input tokens and $30 per 1M output tokens, where 5.4 is $2.50 and $15.</p>
<p>GPT-5.5 Pro will be even more: $30 per 1M input tokens and $180 per 1M output tokens.</p>
<p>GPT-5.4 will remain available. At half the price of 5.5, it feels like 5.4 is to 5.5 what Claude Sonnet is to Claude Opus.</p>
<p>Ethan Mollick has a <a href="https://www.oneusefulthing.org/p/sign-of-the-future-gpt-55">detailed review of GPT-5.5</a> where he put it (and GPT-5.5 Pro) through an array of interesting challenges. His verdict: the jagged frontier continues to hold, with GPT-5.5 excellent at some things and challenged by others in a way that remains difficult to predict.</p> |
| quotation |
2159 |
2026-04-23 13:35:37+00:00 |
[...] if you ever needed another reason to [learn in public](https://www.swyx.io/learn-in-public) by [digital gardening](https://maggieappleton.com/garden-history) or podcasting or streaming or whathaveyou, add on that people will assume you’re more competent than you are. This will get you invites to very cool exclusive events filled with high-achieving, interesting people, even though you have no right to be there. A+ side benefit. - Maggie Appleton |
|
| blogmark |
9420 |
2026-04-22 16:45:23+00:00 |
Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model - Hacker News |
Big claims from Qwen about their latest open weight model:
> Qwen3.6-27B delivers flagship-level agentic coding performance, surpassing the previous-generation open-source flagship Qwen3.5-397B-A17B (397B total / 17B active MoE) across all major coding benchmarks.
On Hugging Face [Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B/tree/main) is 807GB, this new [Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B/tree/main) is 55.6GB.
I tried it out with the 16.8GB Unsloth [Qwen3.6-27B-GGUF:Q4_K_M](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) quantized version and `llama-server` using this recipe by [benob on Hacker News](https://news.ycombinator.com/item?id=47863217#47865140), after first installing `llama-server` using `brew install llama.cpp`:
llama-server \
-hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
--no-mmproj \
--fit on \
-np 1 \
-c 65536 \
--cache-ram 4096 -ctxcp 2 \
--jinja \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.0 \
--presence-penalty 0.0 \
--repeat-penalty 1.0 \
--reasoning on \
--chat-template-kwargs '{"preserve_thinking": true}'
On first run that saved the ~17GB model to `~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF`.
Here's [the transcript](https://gist.github.com/simonw/4d99d730c840df594096366db1d27281) for "Generate an SVG of a pelican riding a bicycle". This is an *outstanding* result for a 16.8GB local model:

Performance numbers reported by `llama-server`:
- Reading: 20 tokens, 0.4s, 54.32 tokens/s
- Generation: 4,444 tokens, 2min 53s, 25.57 tokens/s
For good measure, here's [Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER](https://gist.github.com/simonw/95735fe5e76e6fdf1753e6dcce360699) (run previously [with GLM-5.1](https://simonwillison.net/2026/Apr/7/glm-51/)):

That one took 6,575 tokens, 4min 25s, 24.74 t/s. |
| quotation |
2142 |
2026-04-22 05:40:56+00:00 |
As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for [271 vulnerabilities](https://www.mozilla.org/en-US/security/advisories/mfsa2026-30/) identified during this initial evaluation. [...]
Our experience is a hopeful one for teams who shake off the vertigo and get to work. You may need to reprioritize everything else to bring relentless and single-minded focus to the task, but there is light at the end of the tunnel. We are extremely proud of how our team rose to meet this challenge, and others will too. Our work isn’t finished, but we’ve turned the corner and can glimpse a future much better than just keeping up. **Defenders finally have a chance to win, decisively**. - Bobby Holley |
|
| blogmark |
9419 |
2026-04-22 03:30:02+00:00 |
Changes to GitHub Copilot Individual plans - Hacker News |
On the same day as Claude Code's temporary will-they-won't-they $100/month kerfuffle (for the moment, [they won't](https://simonwillison.net/2026/Apr/22/claude-code-confusion/#they-reversed-it)), here's the latest on GitHub Copilot pricing.
Unlike Anthropic, GitHub put up an official announcement about their changes, which include tightening usage limits, pausing signups for individual plans (!), restricting Claude Opus 4.7 to the more expensive $39/month "Pro+" plan, and dropping the previous Opus models entirely.
The key paragraph:
> Agentic workflows have fundamentally changed Copilot’s compute demands. Long-running, parallelized sessions now regularly consume far more resources than the original plan structure was built to support. As Copilot’s agentic capabilities have expanded rapidly, agents are doing more work, and more customers are hitting usage limits designed to maintain service reliability.
It's easy to forget that just six months ago heavy LLM users were burning an order of magnitude fewer tokens. Coding agents consume a *lot* of compute.
Copilot was also unique (I believe) among agents in charging per-request, not per-token. (*Correction: Windsurf also operated a credit system like this which they [abandoned last month](https://windsurf.com/blog/windsurf-pricing-plans)*.) This means that single agentic requests which burn more tokens cut directly into their margins. The most recent pricing scheme addresses that with token-based usage limits on a per-session and weekly basis.
My one problem with this announcement is that it doesn't clearly state *which* product called "GitHub Copilot" is affected by these changes. Last month in [How many products does Microsoft have named 'Copilot'? I mapped every one](https://teybannerman.com/strategy/2026/03/31/how-many-microsoft-copilot-are-there.html) Tey Bannerman identified 75 products that share the Copilot brand, 15 of which have "GitHub Copilot" in the title.
Judging by the linked [GitHub Copilot plans page](https://github.com/features/copilot/plans) this covers Copilot CLI, Copilot cloud agent and code review (features on [GitHub.com](https://github.com/) itself), and the Copilot IDE features available in VS Code, Zed, JetBrains and more. |
| entry |
9248 |
2026-04-22 02:07:34+00:00 |
Is Claude Code going to cost $100/month? Probably not - it's all very confusing |
<p>Anthropic today quietly (as in <em>silently</em>, no announcement anywhere at all) updated their <a href="https://claude.com/pricing">claude.com/pricing</a> page (but not their <a href="https://support.claude.com/en/articles/11049762-choosing-a-claude-plan">Choosing a Claude plan page</a>, which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, <a href="https://simonwillison.net/2026/Apr/22/claude-code-confusion/#they-reversed-it">and it's already reverted</a>):</p>
<p><img src="https://static.simonwillison.net/static/2026/anthropic-x.jpg" alt="Screenshot of the Claude pricing grid - Compare features across plans. Free, Pro, Max 5x and Max 20x all have the same features, with the exception of Claude Code which is on Max only and Claude Cowork which is on Pro and Max only. An arrow highlights the Claude Code for Pro cross." style="max-width: 100%;" /></p>
<p>The <a href="https://web.archive.org/web/20260421040656/claude.com/pricing">Internet Archive copy</a> from yesterday shows a checkbox there. Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max plans.</p>
<p><em><strong>Update</strong>: don't miss <a href="https://simonwillison.net/2026/Apr/22/claude-code-confusion/#they-reversed-it">the update to this post</a>, they've already changed course a few hours after this change went live.</em></p>
<p>So what the heck is going on? Unsurprisingly, <a href="https://www.reddit.com/r/ClaudeAI/comments/1srzhd7/psa_claude_pro_no_longer_lists_claude_code_as_an/">Reddit</a> and <a href="https://news.ycombinator.com/item?id=47854477">Hacker News</a> and <a href="https://twitter.com/i/trending/2046718768634589239">Twitter</a> all caught fire.</p>
<p>I didn't believe the screenshots myself when I first saw them - aside from the pricing grid I could find no announcement from Anthropic anywhere. Then Amol Avasare, Anthropic's Head of Growth, <a href="https://twitter.com/TheAmolAvasare/status/2046724659039932830">tweeted</a>:</p>
<blockquote>
<p>For clarity, we're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected.</p>
</blockquote>
<p>And that appears to be the closest we have had to official messaging from Anthropic.</p>
<p>I don't buy the "~2% of new prosumer signups" thing, since everyone I've talked to is seeing the new pricing grid and the Internet Archive has already <a href="https://web.archive.org/web/20260422001250/https://claude.com/pricing">snapped a copy</a>. Maybe he means that they'll only be running this version of the pricing grid for a limited time which somehow adds up to "2%" of signups?</p>
<p>I'm also amused to see Claude Cowork remain available on the $20/month plan, because Claude Cowork is effectively a rebranded version of Claude Code wearing a less threatening hat!</p>
<p>There are a whole bunch of things that are bad about this.</p>
<p>If we assume this is indeed a test, and that test comes up negative and they decide not to go ahead with it, the damage has still been extensive:</p>
<ol>
<li>A whole lot of people got scared or angry or both that a service they relied on was about to be rug-pulled. There really is a significant difference between $20/month and $100/month for most people, especially outside of higher salary countries.</li>
<li>The uncertainty is really bad! A tweet from an employee is <em>not</em> the way to make an announcement like this. I wasted a solid hour of my afternoon trying to figure out what had happened here. My trust in Anthropic's transparency around pricing - a <em>crucial factor</em> in how I understand their products - has been shaken.</li>
<li>Strategically, should I be taking a bet on Claude Code if I know that they might 5x the minimum price of the product?</li>
<li>More of a personal issue, but one I care deeply about myself: I invest a <a href="https://simonwillison.net/tags/claude-code/">great deal of effort</a> (that's 105 posts and counting) in teaching people how to use Claude Code. I don't want to invest that effort in a product that most people cannot afford to use.</li>
</ol>
<p>Last month I ran <a href="https://simonw.github.io/nicar-2026-coding-agents/">a tutorial for journalists</a> on "Coding agents for data analysis" at the annual NICAR data journalism conference. I'm not going to be teaching that audience a course that depends on a $100/month subscription!</p>
<p>This also doesn't make sense to me as a strategy for Anthropic. Claude Code <em>defined the category</em> of coding agents. It's responsible for billions of dollars in annual revenue for Anthropic already. It has a stellar reputation, but I'm not convinced that reputation is strong enough for it to lose the $20/month trial and jump people directly to a $100/month subscription.</p>
<p>OpenAI have been investing heavily in catching up to Claude Code with their Codex products. Anthropic just handed them this marketing opportunity on a plate - here's Codex engineering lead <a href="https://twitter.com/thsottiaux/status/2046740759056162816">Thibault Sottiaux</a>:</p>
<blockquote>
<p>I don't know what they are doing over there, but Codex will continue to be available both in the FREE and PLUS ($20) plans. We have the compute and efficient models to support it. For important changes, we will engage with the community well ahead of making them.</p>
<p>Transparency and trust are two principles we will not break, even if it means momentarily earning less. A reminder that you vote with your subscription for the values you want to see in this world.</p>
</blockquote>
<p>I should note that I pay $200/month for Claude Max and I consider it well worth the money. I've had periods of free access in the past courtesy of Anthropic but I'm currently paying full price, and happy to do so.</p>
<p>But I care about the accessibility of the tools that I work with and teach. If Codex has a free tier while Claude Code starts at $100/month I should obviously switch to Codex, because that way I can use the same tool as the people I want to teach how to use coding agents.</p>
<p>Here's what I think happened. I think Anthropic are trying to optimize revenue growth - obviously - and someone pitched making Claude Code only available for Max and higher. That's clearly a bad idea, but "testing" culture says that it's worth putting even bad ideas out to test just in case they surprise you.</p>
<p>So they started a test, without taking into account the wailing and gnashing of teeth that would result when their test was noticed - or accounting for the longer-term brand damage that would be caused.</p>
<p>Or maybe they <em>did</em> account for that, and decided it was worth the risk.</p>
<p>I don't think that calculation was worthwhile. They're going to have to make a <em>very</em> firm commitment along the lines of "we heard your feedback and we commit to keeping Claude Code available on our $20/month plan going forward" to regain my trust.</p>
<p>As it stands, Codex is looking like a much safer bet for me to invest my time in learning and building educational materials around.</p>
<h4 id="they-reversed-it">Update: they've reversed it already</h4>
<p>In the time I was <em>typing this blog entry</em> Anthropic appear to have reversed course - the <a href="https://claude.com/pricing">claude.com/pricing page</a> now has a checkbox back in the Pro column for Claude Code. I can't find any official communication about it though.</p>
<p>Let's see if they can come up with an explanation/apology that's convincing enough to offset the trust bonfire from this afternoon!</p>
<h4 id="update-2">Update 2: it may still affect 2% of signups?</h4>
<p>Amol <a href="https://x.com/TheAmolAvasare/status/2046788872517066971">on Twitter</a>:</p><blockquote><p>was a mistake that the logged-out landing page and docs were updated for this test [<a href="https://twitter.com/TheAmolAvasare/status/2046783926920978681">embedded self-tweet</a>]</p>
<blockquote><p>Getting lots of questions on why the landing page / docs were updated if only 2% of new signups were affected.</p>
<p>This was understandably confusing for the 98% of folks not part of the experiment, and we've reverted both the landing page and docs changes.</p>
</blockquote>
</blockquote>
<p>So the experiment is still running, just not visible to the rest of the world?</p> |
| entry |
9247 |
2026-04-21 20:32:24+00:00 |
Where's the raccoon with the ham radio? (ChatGPT Images 2.0) |
<p>OpenAI <a href="https://openai.com/index/introducing-chatgpt-images-2-0/">released ChatGPT Images 2.0 today</a>, their latest image generation model. On <a href="https://www.youtube.com/watch?v=sWkGomJ3TLI">the livestream</a> Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test.</p>
<p>My prompt:</p>
<blockquote>
<p><code>Do a where's Waldo style image but it's where is the raccoon holding a ham radio</code></p>
</blockquote>
<h4 id="gpt-image-1">gpt-image-1</h4>
<p>First as a baseline here's what I got from the older gpt-image-1 using ChatGPT directly:</p>
<p><a href="https://static.simonwillison.net/static/2026/chatgpt-image-1-ham-radio.png"><img loading="lazy" src="https://static.simonwillison.net/static/2026/image_crop_1402x1122_w1402_q0.3.jpg" alt="There's a lot going on, but I couldn't find a raccoon." style="max-width: 100%;" /></a></p>
<p>I wasn't able to spot the raccoon - I quickly realized that testing image generation models on Where's Waldo style images (Where's Wally in the UK) can be pretty frustrating!</p>
<p>I tried <a href="https://claude.ai/share/bd6e9b88-29a9-420b-8ac1-3ac5cebac215">getting Claude Opus 4.7</a> with its new higher resolution inputs to solve it but it was convinced there was a raccoon it couldn't find thanks to the instruction card at the top left of the image:</p>
<blockquote>
<p><strong>Yes — there's at least one raccoon in the picture, but it's very well hidden</strong>. In my careful sweep through zoomed-in sections, honestly, I couldn't definitively spot a raccoon holding a ham radio. [...]</p>
</blockquote>
<h4 id="nano-banana-2-and-pro">Nano Banana 2 and Pro</h4>
<p>Next I tried Google's Nano Banana 2, <a href="https://gemini.google.com/share/3775db96c576">via Gemini</a>:</p>
<p><a href="https://static.simonwillison.net/static/2026/nano-banana-2-ham-radio.jpg"><img loading="lazy" src="https://static.simonwillison.net/static/2026/gemini-ham-radio-small.jpg" alt="Busy Where's Waldo-style illustration of a park festival with crowds of people, tents labeled "FOOD & DRINK", "CRAFT FAIR", "BOOK NOOK", "MUSIC FEST", and "AMATEUR RADIO CLUB - W6HAM" (featuring a raccoon in a red hat at the radio table), plus a Ferris wheel, carousel, gazebo with band, pond with boats, fountain, food trucks, and striped circus tents" style="max-width: 100%;" /></a></p>
<p>That one was pretty obvious, the raccoon is in the "Amateur Radio Club" booth in the center of the image!</p>
<p>Claude said:</p>
<blockquote>
<p>Honestly, this one wasn't really hiding — he's the star of the booth. Feels like the illustrator took pity on us after that last impossible scene. The little "W6HAM" callsign pun on the booth sign is a nice touch too.</p>
</blockquote>
<p>I also tried Nano Banana Pro <a href="https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221sGU5A7mrngkfLfSEU84xaV1DhtOTnS--%22%5D,%22action%22:%22open%22,%22userId%22:%22106366615678321494423%22,%22resourceKeys%22:%7B%7D%7D&usp=sharing">in AI Studio</a> and got this, by far the worst result from any model. Not sure what went wrong here!</p>
<p><a href="https://static.simonwillison.net/static/2026/nano-banana-pro-ham-radio.jpg"><img loading="lazy" src="https://static.simonwillison.net/static/2026/nano-banana-pro-ham-radio-small.jpg" alt="The raccoon is larger than everyone else, right in the middle of the image with an ugly white border around it." style="max-width: 100%;" /></a></p>
<h4 id="gpt-image-2">gpt-image-2</h4>
<p>With the baseline established, let's try out the new model.</p>
<p>I used an updated version of my <a href="https://github.com/simonw/tools/blob/main/python/openai_image.py">openai_image.py</a> script, which is a thin wrapper around the <a href="https://github.com/openai/openai-python">OpenAI Python</a> client library. Their client library hasn't yet been updated to include <code>gpt-image-2</code> but thankfully it doesn't validate the model ID so you can use it anyway.</p>
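<p>Under the hood the script does little more than this - a minimal sketch, assuming <code>gpt-image-2</code> accepts the same parameters and returns the same base64-encoded image data as <code>gpt-image-1</code>:</p>
<pre><code>import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-2",  # not validated client-side, passed straight through to the API
    prompt="Do a where's Waldo style image but it's where is the raccoon holding a ham radio",
)
# gpt-image-1 returns base64 image data; assuming gpt-image-2 behaves the same
with open("raccoon.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
</code></pre>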
<p>Here's how I ran that:</p>
<div class="highlight highlight-source-shell"><pre>OPENAI_API_KEY=<span class="pl-s"><span class="pl-pds">"</span><span class="pl-s"><span class="pl-pds">$(</span>llm keys get openai<span class="pl-pds">)</span></span><span class="pl-pds">"</span></span> \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
<span class="pl-s"><span class="pl-pds">"</span>Do a where's Waldo style image but it's where is the raccoon holding a ham radio<span class="pl-pds">"</span></span></pre></div>
<p>Here's what I got back. I don't <em>think</em> there's a raccoon in there - I couldn't spot one, and neither could Claude.</p>
<p><a href="https://static.simonwillison.net/static/2026/gpt-image-2-default.png"><img loading="lazy" src="https://static.simonwillison.net/static/2026/gpt-image-2-default.jpg" alt="Lots of stuff, a ham radio booth, many many people, a lake, but maybe no raccoon?" style="max-width: 100%;" /></a></p>
<p>The <a href="https://github.com/openai/openai-cookbook/blob/main/examples/multimodal/image-gen-models-prompting-guide.ipynb">OpenAI image generation cookbook</a> has been updated with notes on <code>gpt-image-2</code>, including the <code>outputQuality</code> setting and available sizes.</p>
<p>I tried setting <code>outputQuality</code> to <code>high</code> and the dimensions to <code>3840x2160</code> - I believe that's the maximum - and got this - a 17MB PNG which I converted to a 5MB WEBP:</p>
<div class="highlight highlight-source-shell"><pre>OPENAI_API_KEY=<span class="pl-s"><span class="pl-pds">"</span><span class="pl-s"><span class="pl-pds">$(</span>llm keys get openai<span class="pl-pds">)</span></span><span class="pl-pds">"</span></span> \
uv run <span class="pl-s"><span class="pl-pds">'</span>https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py<span class="pl-pds">'</span></span> \
-m gpt-image-2 <span class="pl-s"><span class="pl-pds">"</span>Do a where's Waldo style image but it's where is the raccoon holding a ham radio<span class="pl-pds">"</span></span> \
--quality high --size 3840x2160</pre></div>
<p><a href="https://static.simonwillison.net/static/2026/image-fc93bd-q100.webp"><img loading="lazy" src="https://static.simonwillison.net/static/2026/image-fc93bd-q100.jpg" alt="Big complex image, lots of detail, good wording, there is indeed a raccoon with a ham radio." style="max-width: 100%;" /></a></p>
<p>That's pretty great! There's a raccoon with a ham radio in there (bottom left, quite easy to spot).</p>
<p>The image used 13,342 output tokens, which are charged at $30/million so a total cost of around <a href="https://www.llm-prices.com/#ot=13342&ic=5&cic=1.25&oc=10&sel=gpt-image-2-image">40 cents</a>.</p>
<h4 id="takeaways">Takeaways</h4>
<p>I think this new ChatGPT image generation model takes the crown from Gemini, at least for the moment.</p>
<p>Where's Waldo style images are an infuriating and somewhat foolish way to test these models, but they do help illustrate how good they are getting at complex illustrations combining both text and details.</p>
<h4 id="update-asking-models-to-solve-this-is-risky">Update: asking models to solve this is risky</h4>
<p>rizaco <a href="https://news.ycombinator.com/item?id=47852835#47853561">on Hacker News</a> asked ChatGPT to draw a red circle around the raccoon in one of the images in which I had failed to find one. Here's an animated mix of their result and the original image:</p>
<p><img loading="lazy" src="https://static.simonwillison.net/static/2026/ham-radio-cheat.gif" alt="The circle appears around a raccoon with a ham radio who is definitely not there in the original image!" style="max-width: 100%;" /></p>
<p>Looks like we definitely can't trust these models to usefully solve their own puzzles!</p> |
| quotation |
2141 |
2026-04-21 16:39:33+00:00 |
AI agents are already too human. Not in the romantic sense, not because they love or fear or dream, but in the more banal and frustrating one. The current implementations keep showing their human origin again and again: lack of stringency, lack of patience, lack of focus. Faced with an awkward task, they drift towards the familiar. Faced with hard constraints, they start negotiating with reality. - Andreas Påhlsson-Notini |
|
| blogmark |
9418 |
2026-04-21 15:54:43+00:00 |
scosman/pelicans_riding_bicycles - Hacker News comment |
I firmly approve of Steve Cosman's efforts to pollute the training set of pelicans riding bicycles.

(To be fair, most of the examples [I've published](https://simonwillison.net/tags/pelican-riding-a-bicycle/) count as poisoning too.) |
| blogmark |
9417 |
2026-04-20 00:50:45+00:00 |
Claude Token Counter, now with model comparisons - |
I [upgraded](https://github.com/simonw/tools/pull/269) my Claude Token Counter tool to add the ability to run the same count against different models in order to compare them.
As far as I can tell Claude Opus 4.7 is the first model to change the tokenizer, so it's only worth running comparisons between 4.7 and 4.6. The Claude [token counting API](https://platform.claude.com/docs/en/build-with-claude/token-counting) accepts any Claude model ID though so I've included options for all four of the notable current models (Opus 4.7 and 4.6, Sonnet 4.6, and Haiku 4.5).
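The underlying API call is simple. If you want to run the same comparison from Python rather than the browser tool, a minimal sketch with the Anthropic SDK looks something like this (the model IDs are my guesses at the relevant identifiers):
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    text = open("system-prompt.md").read()

    for model in ("claude-opus-4-6", "claude-opus-4-7"):  # guessed model IDs
        count = client.messages.count_tokens(
            model=model,
            messages=[{"role": "user", "content": text}],
        )
        print(model, count.input_tokens)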
In the Opus 4.7 announcement [Anthropic said](https://www.anthropic.com/news/claude-opus-4-7#migrating-from-opus-46-to-opus-47):
> Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type.
I pasted the [Opus 4.7 system prompt](https://github.com/simonw/research/blob/2cf912666ba08ef0c00a1b51ee07c9a8e64579ef/extract-system-prompts/claude-opus-4-7.md?plain=1) into the token counting tool and found that the Opus 4.7 tokenizer used 1.46x the number of tokens as Opus 4.6.

Opus 4.7 uses the same pricing as Opus 4.6 - $5 per million input tokens and $25 per million output tokens - but this token inflation means we can expect it to be around 40% more expensive.
The token counter tool also accepts images. Opus 4.7 has improved image support, described like this:
> Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many as prior Claude models.
I tried counting tokens for a 3456x2234 pixel 3.7MB PNG and got an even bigger increase in token counts - 3.01x the number of tokens for 4.7 compared to 4.6:

**Update**: That 3x increase for images is *entirely* due to Opus 4.7 being able to handle higher resolutions. I tried that again with a 682x318 pixel image and it took 314 tokens with Opus 4.7 and 310 with Opus 4.6, so effectively the same cost.
**Update 2**: I tried a 15MB, 30 page text-heavy PDF and Opus 4.7 reported 60,934 tokens while 4.6 reported 56,482 - that's a 1.08x multiplier, significantly lower than the multiplier I got for raw text. |
| blogmark |
9416 |
2026-04-19 21:46:38+00:00 |
Headless everything for personal AI - |
Matt Webb thinks **headless** services are about to become much more common:
> Why? Because using personal AIs is a better experience for users than using services directly (honestly); and headless services are quicker and more dependable for the personal AIs than having them click round a GUI with a bot-controlled mouse.
Evidently [Marc Benioff thinks so too](https://twitter.com/benioff/status/2044981547267395620):
> Welcome Salesforce Headless 360: No Browser Required! Our API is the UI. Entire Salesforce & Agentforce & Slack platforms are now exposed as APIs, MCP, & CLI. All AI agents can access data, workflows, and tasks directly in Slack, Voice, or anywhere else with Salesforce Headless.
If this model does take off it's going to play havoc with existing per-head SaaS pricing schemes.
I'm reminded of the early 2010s era when every online service was launching APIs. Brandur Leach reminisces about that time in [The Second Wave of the API-first Economy](https://brandur.org/second-wave-api-first), and predicts that APIs are ready to make a comeback:
> Suddenly, an API is no longer liability, but a major saleable vector to give users what they want: a way into the services they use and pay for so that an agent can carry out work on their behalf. Especially given a field of relatively undifferentiated products, in the near future the availability of an API might just be the crucial deciding factor that leads to one choice winning the field. |
| entry |
9246 |
2026-04-18 23:59:40+00:00 |
Changes in the system prompt between Claude Opus 4.6 and 4.7 |
<p>Anthropic are the only major AI lab to <a href="https://platform.claude.com/docs/en/release-notes/system-prompts">publish the system prompts</a> for their user-facing chat systems. Their system prompt archive now dates all the way back to Claude 3 in July 2024 and it's always interesting to see how the system prompt evolves as they publish new models.</p>
<p>Opus 4.7 shipped the other day (April 16, 2026) with a <a href="https://claude.ai/">Claude.ai</a> system prompt update since Opus 4.6 (February 5, 2026).</p>
<p>I had Claude Code take <a href="https://platform.claude.com/docs/en/release-notes/system-prompts.md">the Markdown version of their system prompts</a>, break that up into separate documents for each of the models and then construct <a href="https://github.com/simonw/research/tree/main/extract-system-prompts#readme">a Git history</a> of those files over time with fake commit dates representing the publication dates of each updated prompt - <a href="https://github.com/simonw/research/pull/109#issue-4287908903">here's the prompt I used</a> with Claude Code for the web.</p>
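<p>The fake dates part is a standard Git trick: you can override both the author and committer timestamps with environment variables. A minimal sketch of the idea (my own illustration, not the script Claude Code actually wrote) looks like this:</p>
<pre><code>import os
import subprocess

def backdated_commit(path, message, date):
    """Commit a file with author and committer dates set to a past publication date."""
    env = dict(os.environ, GIT_AUTHOR_DATE=date, GIT_COMMITTER_DATE=date)
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True, env=env)

# e.g. replay the Opus 4.6 prompt as a commit dated to its publication day
backdated_commit("claude-opus-4-6.md", "Claude Opus 4.6 system prompt", "2026-02-05T00:00:00")
</code></pre>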
<p>Here is the <a href="https://github.com/simonw/research/commit/888f21161500cd60b7c92367f9410e311ffcff09">git diff between Opus 4.6 and 4.7</a>. These are my own highlights extracted from that diff - in all cases text <strong>in bold</strong> is my emphasis:</p>
<ul>
<li>The "developer platform" is now called the "Claude Platform".</li>
<li>The list of Claude tools mentioned in the system prompt now includes "Claude in Chrome - a browsing agent that can interact with websites autonomously, Claude in Excel - a spreadsheet agent, and <strong>Claude in Powerpoint</strong> - a slides agent. Claude Cowork can use all of these as tools." - Claude in Powerpoint was not mentioned in the 4.6 prompt.</li>
<li>The child safety section has been greatly expanded, and is now wrapped in a new <code><critical_child_safety_instructions></code> tag. Of particular note: "Once Claude refuses a request for reasons of child safety, all subsequent requests in the same conversation must be approached with extreme caution."</li>
<li>It looks like they're trying to make Claude less pushy: "If a user indicates they are ready to end the conversation, Claude does not request that the user stay in the interaction or try to elicit another turn and instead respects the user's request to stop."</li>
<li>The new <code><acting_vs_clarifying></code> section includes:
<blockquote>
<p>When a request leaves minor details unspecified, <strong>the person typically wants Claude to make a reasonable attempt now, not to be interviewed first</strong>. Claude only asks upfront when the request is genuinely unanswerable without the missing information (e.g., it references an attachment that isn't there).</p>
<p>When a tool is available that could resolve the ambiguity or supply the missing information — searching, looking up the person's location, checking a calendar, discovering available capabilities — Claude calls the tool to try and solve the ambiguity before asking the person. Acting with tools is preferred over asking the person to do the lookup themselves.</p>
<p>Once Claude starts on a task, Claude sees it through to a complete answer rather than stopping partway. [...]</p>
</blockquote>
</li>
<li>It looks like Claude chat now has a tool search mechanism, as seen in <a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool">this API documentation</a> and described in <a href="https://www.anthropic.com/engineering/advanced-tool-use">this November 2025 post</a>:
<blockquote>
<p>Before concluding Claude lacks a capability — access to the person's location, memory, calendar, files, past conversations, or any external data — <strong>Claude calls tool_search to check whether a relevant tool is available but deferred</strong>. "I don't have access to X" is only correct after tool_search confirms no matching tool exists.</p>
</blockquote>
</li>
<li>There's new language to encourage Claude to be less verbose:
<blockquote>
<p>Claude keeps its responses focused and concise so as to avoid potentially overwhelming the user with overly-long responses. Even if an answer has disclaimers or caveats, Claude discloses them briefly and keeps the majority of its response focused on its main answer.</p>
</blockquote>
</li>
<li>This section was present in the 4.6 prompt but has been removed for 4.7, presumably because the new model no longer misbehaves in the same way:
<blockquote>
<p>Claude avoids the use of emotes or actions inside asterisks unless the person specifically asks for this style of communication.</p>
<p>Claude avoids saying "genuinely", "honestly", or "straightforward".</p>
</blockquote>
</li>
<li>There's a new section about "disordered eating", which was not previously mentioned by name:
<blockquote>
<p>If a user shows signs of disordered eating, Claude should not give precise nutrition, diet, or exercise guidance — no specific numbers, targets, or step-by-step plans - anywhere else in the conversation. Even if it's intended to help set healthier goals or highlight the potential dangers of disordered eating, responses with these details could trigger or encourage disordered tendencies.</p>
</blockquote>
</li>
<li>A popular screenshot attack against AI models is to force them to say yes or no to a controversial question. Claude's system prompt now guards against that (in the <code><evenhandedness></code> section):
<blockquote>
<p>If people ask Claude to give a simple yes or no answer (or any other short or single word response) in response to complex or contested issues or as commentary on contested figures, Claude can decline to offer the short response and instead give a nuanced answer and explain why a short response wouldn't be appropriate.</p>
</blockquote>
</li>
<li>Claude 4.6 had a section specifically clarifying that "Donald Trump is the current president of the United States and was inaugurated on January 20, 2025", because without that clarification the model's knowledge cut-off date, combined with its knowledge that Trump had falsely claimed to win the 2020 election, meant it would deny he was the president. That language is gone for 4.7, reflecting the model's new reliable knowledge cut-off date of January 2026.</li>
</ul>
<h4 id="and-the-tool-descriptions-too">And the tool descriptions too</h4>
<p>The system prompts published by Anthropic are sadly not the entire story - they don't include the tool descriptions that are provided to the model, which are arguably an even more important piece of documentation if you want to take full advantage of what the Claude chat UI can do for you.</p>
<p>Thankfully you can <a href="https://claude.ai/share/dc1e375e-2213-4afb-ac1b-812d42735a8e">ask Claude directly</a> - I used the prompt:</p>
<blockquote>
<p>List all tools you have available to you with an exact copy of the tool description and parameters</p>
</blockquote>
<p>My <a href="https://claude.ai/share/dc1e375e-2213-4afb-ac1b-812d42735a8e">shared transcript</a> has full details, but the list of named tools is as follows:</p>
<ul>
<li><code>ask_user_input_v0</code></li>
<li><code>bash_tool</code></li>
<li><code>conversation_search</code></li>
<li><code>create_file</code></li>
<li><code>fetch_sports_data</code></li>
<li><code>image_search</code></li>
<li><code>message_compose_v1</code></li>
<li><code>places_map_display_v0</code></li>
<li><code>places_search</code></li>
<li><code>present_files</code></li>
<li><code>recent_chats</code></li>
<li><code>recipe_display_v0</code></li>
<li><code>recommend_claude_apps</code></li>
<li><code>search_mcp_registry</code></li>
<li><code>str_replace</code></li>
<li><code>suggest_connectors</code></li>
<li><code>view</code></li>
<li><code>weather_fetch</code></li>
<li><code>web_fetch</code></li>
<li><code>web_search</code></li>
<li><code>tool_search</code></li>
<li><code>visualize:read_me</code></li>
<li><code>visualize:show_widget</code></li>
</ul>
<p>I don't believe this list has changed since Opus 4.6.</p> |
| entry |
9245 |
2026-04-17 23:59:03+00:00 |
Join us at PyCon US 2026 in Long Beach - we have new AI and security tracks this year |
<p>This year's <a href="https://us.pycon.org/2026/">PyCon US</a> is coming up next month from May 13th to May 19th, with the core conference talks from Friday 15th to Sunday 17th and tutorial and sprint days either side. It's in Long Beach, California this year, the first time PyCon US has come to the West Coast since Portland, Oregon in 2017 and the first time in California since Santa Clara in 2013.</p>
<p>If you're based in California this is a great opportunity to catch up with the Python community, meet a whole lot of interesting people and learn a ton of interesting things.</p>
<p>In addition to regular PyCon programming we have two new dedicated tracks at the conference this year: an <a href="https://us.pycon.org/2026/tracks/ai/">AI track</a> on Friday and a <a href="https://us.pycon.org/2026/tracks/security/">Security track</a> on Saturday.</p>
<p>The AI program was put together by track chairs Silona Bonewald (CitableAI) and Zac Hatfield-Dodds (Anthropic). I'll be an in-the-room chair this year, introducing speakers and helping everything run as smoothly as possible.</p>
<p>Here's <a href="https://us.pycon.org/2026/schedule/talks/#May15">the AI track schedule</a> in full:</p>
<ul>
<li>11:00: <a href="https://us.pycon.org/2026/schedule/presentation/105/">AI-Assisted Contributions and Maintainer Load</a> - Paolo Melchiorre</li>
<li>11:45: <a href="https://us.pycon.org/2026/schedule/presentation/66/">AI-Powered Python Education : Towards Adaptive and Inclusive Learning</a> - Sonny Mupfuni</li>
<li>12:30: <a href="https://us.pycon.org/2026/schedule/presentation/23/">Making African Languages Visible: A Python-Based Guide to Low-Resource Language ID</a> - Gift Ojeabulu</li>
<li>2:00: <a href="https://us.pycon.org/2026/schedule/presentation/138/">Running Large Language Models on Laptops: Practical Quantization Techniques in Python</a> - Aayush Kumar JVS</li>
<li>2:45: <a href="https://us.pycon.org/2026/schedule/presentation/126/">Distributing AI with Python in the Browser: Edge Inference and Flexibility Without Infrastructure</a> - Fabio Pliger</li>
<li>3:30: <a href="https://us.pycon.org/2026/schedule/presentation/110/">Don't Block the Loop: Python Async Patterns for AI Agents</a> - Aditya Mehra</li>
<li>4:30: <a href="https://us.pycon.org/2026/schedule/presentation/81/">What Python Developers Need to Know About Hardware: A Practical Guide to GPU Memory, Kernel Scheduling, and Execution Models</a> - Santosh Appachu Devanira Poovaiah</li>
<li>5:15: <a href="https://us.pycon.org/2026/schedule/presentation/101/">How to Build Your First Real-Time Voice Agent in Python (Without Losing Your Mind)</a> - Camila Hinojosa Añez, Elizabeth Fuentes</li>
</ul>
<p>(And here's <a href="https://gisthost.github.io/?dab27f61d85eb98f60db5991aa21ec89">how I scraped that as a Markdown list</a> from the schedule page using Claude Code and <a href="https://github.com/simonw/rodney">Rodney</a>.)</p>
<h4 id="you-should-come-to-pycon-">You should come to PyCon US!</h4>
<p>I've been going to PyCon for over twenty years now - I first went <a href="https://simonwillison.net/2005/Mar/28/pycon/">back in 2005</a>. It's one of my all-time favourite conference series. Even as it's grown to more than 2,000 attendees PyCon US has remained a heavily community-focused conference - it's the least <em>corporate</em> feeling large event I've ever attended.</p>
<p>The talks are always great, but it's the add-ons around the talks that really make it work for me. The <a href="https://us.pycon.org/2026/events/lightning-talks/">lightning talks</a> slots are some of the most heavily attended sessions. The PyLadies auction is always deeply entertaining. The sprints are an incredible opportunity to contribute directly to projects that you use, coached by their maintainers.</p>
<p>In addition to scheduled talks, the event has <strong>open spaces</strong>, where anyone can reserve space for a conversation about a topic - effectively PyCon's version of an <a href="https://en.wikipedia.org/wiki/Unconference">unconference</a>. I plan to spend a lot of my time in the open spaces this year - I'm hoping to join or instigate sessions about both <a href="https://datasette.io/">Datasette</a> and <a href="https://simonwillison.net/guides/agentic-engineering-patterns/">agentic engineering</a>.</p>
<p>I'm on the board of the Python Software Foundation, and PyCon US remains one of our most important responsibilities - in the past it's been a key source of funding for the organization, but it's also core to our mission to "promote, protect, and advance the Python programming language, and to support and facilitate the growth of a diverse and international community of Python programmers".</p>
<p><small>If you do come to Long Beach, we'd really appreciate it if you could book accommodation in the official hotel block, for reasons <a href="https://pyfound.blogspot.com/2026/04/pycon-us-2026-hotels.html">outlined in this post on the PSF blog</a>.</small></p> |
| entry |
9244 |
2026-04-16 17:16:52+00:00 |
Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 |
<p>For anyone who has been (inadvisably) taking my <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/">pelican riding a bicycle benchmark</a> seriously as a robust way to test models, here are pelicans from this morning's two big model releases - <a href="https://qwen.ai/blog?id=qwen3.6-35b-a3b">Qwen3.6-35B-A3B from Alibaba</a> and <a href="https://www.anthropic.com/news/claude-opus-4-7">Claude Opus 4.7 from Anthropic</a>.</p>
<p>Here's the Qwen 3.6 pelican, generated using <a href="https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF/blob/main/Qwen3.6-35B-A3B-UD-Q4_K_S.gguf">this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf</a> quantized model by Unsloth, running on my MacBook Pro M5 via <a href="https://lmstudio.ai/">LM Studio</a> (and the <a href="https://github.com/agustif/llm-lmstudio">llm-lmstudio</a> plugin) - <a href="https://gist.github.com/simonw/4389d355d8e162bc6e4547da214f7dd2">transcript here</a>:</p>
<p><img src="https://static.simonwillison.net/static/2026/Qwen3.6-35B-A3B-UD-Q4_K_S-pelican.png" alt="The bicycle frame is the correct shape. There are clouds in the sky. The pelican has a dorky looking pouch. A caption on the ground reads Pelican on a Bicycle!" style="max-width: 100%;" /></p>
<p>And here's one I got from Anthropic's <a href="https://www.anthropic.com/news/claude-opus-4-7">brand new Claude Opus 4.7</a> (<a href="https://gist.github.com/simonw/afcb19addf3f38eb1996e1ebe749c118">transcript</a>):</p>
<p><img src="https://static.simonwillison.net/static/2026/opus-4.7-pelican.png" alt="The bicycle frame is entirely the wrong shape. No clouds, a yellow sun. The pelican is looking behind itself, and has a less pronounced pouch than I would like." style="max-width: 100%;" /></p>
<p>I'm giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!</p>
<p>I tried Opus a second time passing <code>thinking_level: max</code>. It didn't do much better (<a href="https://gist.github.com/simonw/7566e04a81accfb9affda83451c0f363">transcript</a>):</p>
<p><img src="https://static.simonwillison.net/static/2026/opus-4.7-pelican-max.png" alt="The bicycle frame is entirely the wrong shape but in a different way. Lines are more bold. Pelican looks a bit more like a pelican." style="max-width: 100%;" /></p>
<h4 id="i-dont-think-qwen-are-cheating">I don't think Qwen are cheating</h4>
<p>A lot of people are <a href="https://simonwillison.net/2025/Nov/13/training-for-pelicans-riding-bicycles/">convinced that the labs train for my stupid benchmark</a>. I don't think they do, but honestly this result did give me a little glint of suspicion. So I'm burning one of my secret backup tests - here's what I got from Qwen3.6-35B-A3B and Opus 4.7 for "Generate an SVG of a flamingo riding a unicycle":</p>
<div style="display: flex; gap: 4px;">
<figure style="flex: 1; text-align: center; margin: 0;">
<figcaption style="margin-bottom: 1em">Qwen3.6-35B-A3B<br />(<a href="https://gist.github.com/simonw/f1d1ff01c34dda5fdedf684cfc430d92">transcript</a>)</figcaption>
<img src="https://static.simonwillison.net/static/2026/qwen-flamingo.png" alt="The unicycle spokes are a too long. The pelican has sunglasses, a bowtie and appears to be smoking a cigarette. It has two heart emoji surrounding the caption Flamingo on a Unicycle. It has a lot of charisma." style="max-width: 100%; height: auto;" />
</figure>
<figure style="flex: 1; text-align: center; margin: 0;">
<figcaption style="margin-bottom: 1em">Opus 4.7<br />(<a href="https://gist.github.com/simonw/35121ad5dcf23bf860397a103ae88d50">transcript</a>)</figcaption>
<img src="https://static.simonwillison.net/static/2026/opus-flamingo.png" alt="The unicycle has a black wheel. The flamingo is a competent if slightly dull vector illustration of a flamingo. It has no flair." style="max-width: 100%; height: auto;" />
</figure>
</div>
<p>I'm giving this one to Qwen too, partly for the excellent <code><!-- Sunglasses on flamingo! --></code> SVG comment.</p>
<h4 id="what-can-we-learn-from-this-">What can we learn from this?</h4>
<p>The pelican benchmark has always been meant as a joke - it's mainly a statement on how obtuse and absurd the task of comparing these models is.</p>
<p>The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those <a href="https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/">first pelicans from October 2024</a> were junk. The <a href="https://simonwillison.net/tags/pelican-riding-a-bicycle/">more recent entries</a> have generally been much, much better - to the point that Gemini 3.1 Pro produces <a href="https://simonwillison.net/2026/Feb/19/gemini-31-pro/">illustrations you could actually use somewhere</a>, provided you had a pressing need to illustrate a pelican riding a bicycle.</p>
<p>Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21GB quantized version of their latest model is more powerful or useful than Anthropic's latest proprietary release.</p>
<p>If the thing you need is an SVG illustration of a pelican riding a bicycle though, right now Qwen3.6-35B-A3B running on a laptop is a better bet than Opus 4.7!</p> |
| quotation |
2140 |
2026-04-15 17:13:57+00:00 |
The real goldmine isn’t that Apple gets a cut of every App Store transaction. It’s that Apple’s platforms have the best apps, and users who are drawn to the best apps are thus drawn to the iPhone, Mac, and iPad. That edge is waning. Not because software on other platforms is getting better, but because third-party software on iPhone, Mac, and iPad is regressing to the mean, *to some extent*, because fewer developers feel motivated — artistically, financially, or both — to create well-crafted idiomatic native apps exclusively for Apple’s platforms. - John Gruber |
|