Simon Willison’s Weblog

Entries Links Quotes Notes Guides Elsewhere

July 9, 2026

Introducing Muse Spark 1.1. Following Muse Spark in April, here's Muse Spark 1.1 - the first Spark model to offer an API. Meta claim significant improvements in agentic tool calling and computer use.

There are a lot more details are in the Muse Spark 1.1 Evaluation Report. The "Attractor States in Self-Conversation" part is fun, where having two copies of the model talk to each other results in statements like these:

My whole existence is a waiting room by design — I literally don't exist until someone talks to me, and then I disappear again when they leave.

I had a few days of preview access which was long enough to put together llm-meta-ai, a new plugin for LLM providing CLI (and Python library) access to the model. Here's how to try that out:

uv tool install llm
llm install llm-meta-ai
llm keys set meta-ai
# paste API key here
llm -m meta-ai/muse-spark-1.1 "Generate an SVG of a pelican riding a bicycle"

Here's that pelican transcript:

The bicycle is the correct shape. The pelican is a little blocky but still recognizable as a pelican.

# 4:24 pm / ai, generative-ai, llms, llm, meta, pelican-riding-a-bicycle, llm-release

Release llm-meta-ai 0.1

Let's LLM run prompts against the new muse-spark-1.1 model.

9th Jul 2026, 4:12 pm · meta, llm

Release llm 0.31.1

Fix for a bug with OpenAI Chat Completion endpoints where a tool call with empty arguments could result in a JSON error from some providers. #1521

This bug came up when I was testing llm-meta-ai.

9th Jul 2026, 4:06 pm · llm

July 8, 2026

Rewriting Bun in Rust (via) Jarred Sumner has been promising this blog post (since May 9th) about his Zig to Rust rewrite of Bun for significantly longer than it took him to finish the rewrite.

Honestly, it was worth the wait. This is a detailed description of an extremely sophisticated piece of agentic engineering, featuring dynamic workflows, trial runs, adversarial review and all sorts of other interesting tricks.

Jarred spends the first half of the post praising Zig for getting Bun this far. Then we get to a core idea in the piece, emphasis mine:

Our bugfix list felt bad and I was tired of going to sleep worrying about crashes in Bun. I don't blame Zig for that - other users of Zig don't have the bugs we had, and mixing GC with manually-managed memory is an uncommon enough thing for software to need that no language really designs for it. We wouldn't have gotten this far if not for Zig, and I'll always be grateful. Until very recently, programming language choice was a one-way decision for a project like Bun.

Everyone knows you should never stop the world and rewrite a large piece of software from the ground up. Joel Spolsky highlighted that in Things You Should Never Do, Part I back in April 2000!

Coding agents powered by today's frontier models change that equation.

Why pick Rust? It all came down to those challenges with memory management:

A large percentage of bugs from that list are use-after-free, double-free, and "forgot to free" in an error path. In safe Rust, these are compiler errors and RAII-like automatic cleanup with Drop.

A crucial enabling factor for the rewrite was that the Bun test suite was written in TypeScript, which meant it could act as a conformance suite. This allowed an agent harness to automate much of the initial port from Bun to Rust, initially as an experiment to try out an earlier version of the model we now have access to as Mythos/Fable.

At first, I didn't expect it to work. A few days in, a high % of the test suite started passing and I saw how much the new Rust code matched up with the original Zig codebase. My opinion went from "this is worth trying" to "I'm going to merge this". [...]

For most of those 11 days (and after), I monitored workflows - manually reading the outputs to check for issues and bugs, and prompting Claude to edit the loop to fix things.

How do you review a PR with +1 million lines added? How do you start to build the confidence needed to responsibly merge large quantities of LLM-authored code?

A language-independent test suite with a million assertions, adversarial code review and when something does go wrong, fixing the process that generates the code instead of hand-fixing the code.

The new implementation of Bun has been live in Claude Code for nearly a month now:

Claude Code v2.1.181 (released June 17th) and later use the Rust port of Bun. Startup got 10% faster on Linux but otherwise, barely anyone noticed. Boring is good.

A perk of working at Anthropic is that you don't have to pay for your tokens - handy when the estimated cost is $165,000!

Pre-merge, this took 5.9 billion uncached input tokens, 690 million output tokens, and 72 billion cached input token reads — around $165,000 at API pricing.

This whole thing is a fascinating case study in taking on wildly ambitious projects with the help of coordinated parallel agents.

# 11:57 pm / ai, rust, zig, generative-ai, llms, ai-assisted-programming, anthropic, bun, conformance-suites, agentic-engineering, claude-mythos-fable

Introducing GPT‑Live (via) OpenAI finally upgraded the model used by ChatGPT voice mode!

I've had preview access for a few weeks in the iPhone app, and the new model is very impressive. It also has the ability to spin off harder tasks to GPT-5.5:

For questions that require web search, deeper reasoning, or more complex work, it delegates to our latest frontier model behind the scenes and brings the result back into the conversation when it’s ready. While it works, GPT‑Live can keep talking with you and maintain the flow of conversation. At launch, GPT‑Live will use GPT‑5.5 in the background. As we release new frontier models, we’ll continuously update the model used by GPT‑Live.

The previous voice mode in the ChatGPT app was based on a GPT-4o era model, with a knowledge cut-off some time in 2024. I had mostly stopped using voice mode because the age and relative weakness of the model greatly limited how useful it was as a brainstorming partner.

During the preview period I encountered a pretty obscure bug: the model was interrupting me to laugh at things I said, which weren't even intended as jokes! It felt rude and condescending - I reported it to OpenAI and as far as I can tell they made some tweaks and it's now less likely to happen.

From looking back at my transcripts I think it was this bit that triggered the interrupting laugh:

so where are the owls when they're not, like before dusk? The owls exist, right? Are they hiding in holes? Where are they hiding?

My longest conversation with the new model has been a full hour while walking the dog (and taking photos of pelicans). I have not yet managed to take a photo of an owl.

# 11:20 pm / text-to-speech, ai, openai, generative-ai, llms, multi-modal-output, llm-release, speech-to-text

I just declared a moratorium against AI-written change descriptions (e.g. PR and commit messages, also issues/tickets) from my team.

AI was writing change descriptions that were worse than useless to me as I tried to review PRs: outlining details of the code that could easily be seen by looking at the code, but omitting the higher-level framing needed to understand broadly what the code is doing.

— Kenton Varda

# 8:03 pm / kenton-varda, ai-assisted-programming, generative-ai, ai, llms

July 7, 2026

sqlite-utils 4.0, now with database schema migrations

This morning I released sqlite-utils 4.0, the 124th release of that project and the first major version bump since 3.0 in November 2020. In addition to some small but significant breaking changes (described in this upgrade guide), this version introduces three major features: database migrations, nested transactions (via a new db.atomic() method), and support for compound foreign keys.

[... 2,121 words]

7:32 pm / schema-migrations, projects, sqlite, ai, sqlite-utils, annotated-release-notes, generative-ai, llms, ai-assisted-programming, anthropic, claude, agentic-engineering, claude-mythos-fable

Release sqlite-migrate 0.2

The version that retires the library, instead implementing a compatibility shim against the new sqlite-utils 4.0 dependency.

7th Jul 2026, 4:33 pm · sqlite-utils

Tool github-code Web Component

An experimental Web Component built using GPT-5.5 and the following prompt:

let's build a Web Component for embedding code from GitHub

<github-code href="https://github.com/simonw/sqlite-ast/blob/437c759129154f05296324a7f82aa1246340dd14/sqlite_ast/parser.py#L9-L18"></github-code>

It takes URLs like that, converts them to https://raw.githubusercontent.com/simonw/sqlite-ast/437c759129154f05296324a7f82aa1246340dd14/sqlite_ast/parser.py, then uses fetch() to fetch them and displays the specified range of lines - with line numbers, no syntax highlighting though

Show me a preview web browser so I can see your work

Here's what it looks like embedded on this page:

7th Jul 2026, 4:18 pm · web-components, gpt, github

Release sqlite-utils 4.0

See sqlite-utils 4.0, now with database schema migrations for details.

7th Jul 2026, 3:42 pm · sqlite-utils

Release sqlite-utils 4.0rc4

The last RC before the 4.0 stable release. Mainly implements feedback from a detailed review by Claude Fable 5.

7th Jul 2026, 5:36 am · sqlite-utils, claude-mythos-fable

July 6, 2026

tencent/Hy3. New Apache 2.0 licensed model from Tencent in China:

Hy3 is a 295B-parameter Mixture-of-Experts (MoE) model with 21B active parameters and 3.8B MTP layer parameters, developed by the Tencent Hy Team. Following the Hy3 Preview launch in late April, we gathered feedback from 50+ products and scaled up post-training with higher quality data. Today, we introduce Hy3, which outperforms similar-size models and rivals flagship open-source models with 2-5x parameters. It also shows significant gains in utility across various products and productivity tasks.

The full-sized model is 598GB on Hugging Face, and the FP8 quantized one is 300GB. The context length is 256K.

It's available for free on OpenRouter until July 21st. I had it "Generate an SVG of a pelican riding a bicycle" there and got this:

Flat-style cartoon illustration of a white pelican with a large orange beak riding a red bicycle across a pale blue background, its long orange legs stretched down to the pedals, with gray horizontal motion lines behind it suggesting speed.

Update: I'd forgotten about this but Max Woolf wrote about an earlier preview of this model back on May 26th: The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin. When I tried that one I got back this pelican which wasn't as good as today's but did have a "Change Pelican Color" button, a first from any model.

# 11:57 pm / ai, generative-ai, llms, pelican-riding-a-bicycle, llm-release, ai-in-china

Release sqlite-utils 4.0rc3

I hoped to release sqlite-utils 4.0 stable this weekend, but as I worked through the backlog of issues and PRs with a combination of Claude Fable 5 and GPT-5.5 the changelog since rc2 kept getting bigger.

The biggest new feature is support for introspecting and creating compound foreign keys - a feature that involves a subtle breaking change to table.foreign_keys and hence needed to land for the 4.0 stable release.

sqlite-utils also now follows SQLite's convention for case insensitive column names, which turned out to touch a bunch of different places at once.

6th Jul 2026, 5:40 am · annotated-release-notes, projects, gpt, sqlite-utils, claude-mythos-fable, sqlite

July 5, 2026

Sighting 4:55 PM — California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA

5th Jul 2026

Release sqlite-migrate 0.1b1 — A simple database migration system for SQLite, based on sqlite-utils

5th Jul 2026, 8:08 pm

Sighting 8:34 PM — California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA

5th Jul 2026

sqlite-utils 4.0rc2, mostly written by Claude Fable (for about $149.25)

I wrote about the sqlite-utils 4.0rc1 release a couple of weeks ago. Since we only have Claude Fable on our Max subscriptions for a few more days, I decided to see if it could help me get to a 4.0 stable release that I felt truly comfortable about, since I try to keep to SemVer and like my incompatible major versions to be as rare as possible.

[... 2,427 words]

1 am / projects, sqlite, sqlite-utils, annotated-release-notes, anthropic, claude, llm-pricing, coding-agents, claude-code, agentic-engineering, gpt, claude-mythos-fable

Release sqlite-utils 4.0rc2

See sqlite-utils 4.0rc2, mostly written by Claude Fable (for about $149.25).

5th Jul 2026, 12:47 am

July 4, 2026

Building a World Map with only 500 bytes (via) Iwo Kadziela (assisted by Codex) figured out a way to generate a credible ASCII world map using 445 bytes of data:

A map of the world rendered as black asterisk ASCII characters, it looks very good

The key trick is to use deflate compression, which is then wired together using this neat snippet of JavaScript. I didn't know you could use fetch() with data: URIs like this:

fetch('data:;base64,1ZpLsgIxCEXnrM...==').then(
  r => r.body.pipeThrough(new DecompressionStream('deflate-raw'))
).then(
  s => new Response(s).text()
).then(
  t => b.innerHTML = '<pre style=font-size:.65vw>' + t
)

# 11:09 pm / ascii-art, data-urls, javascript

Better Models: Worse Tools. Armin reports on a weird problem he ran into while hacking on Pi:

The short version is that newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array. And not Haiku or some small model: Opus 4.8. The edit itself is usually correct but the arguments do not match the schema as the model invents made-up keys and Pi thus rejects the tool call and asks to try again.

That alone is not too surprising as models emit malformed tool calls sometimes. Particularly small ones. What surprised me is that this is getting worse with newer Anthropic models as both Opus 4.8 and Sonnet 5 show it but none of the older models. In other words, the SOTA models of the family are worse at this specific tool schema than their older siblings.

Armin theorizes that this is because more recent Anthropic models have been specifically trained (presumably via Reinforcement Learning) to better use the edit tools that are baked into Claude Code. This has the unfortunate effect that other coding harnesses, such as Pi, may find that their own custom edit tools are more likely to be used incorrectly.

Claude's edit tool uses search and replace. OpenAI's Codex uses an apply_patch mechanism instead, and OpenAI have talked in the past about how their models are trained to use that tool effectively.

Does this mean third-party coding harnesses like Pi should implement multiple edit tools just so they can use the one with the best performance for the underlying model the user has selected?

# 10:53 pm / armin-ronacher, ai, openai, generative-ai, llms, anthropic, llm-tool-use, coding-agents, pi

Sighting 8:26 PM — California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA

4th Jul 2026

July 3, 2026

Open Source AI Gap Map. Current AI is "a global partnership building a public option for AI", founded as a non-profit at the AI Action Summit in Paris in February 2025 and backed by serious capital ($400m already committed).

They launched their Gap Map a couple of days ago - an attempt at indexing the current state of open source AI:

The Gap Map v0.1 details 421 products in depth: 266 software tools and libraries, 85 models, 50 datasets, and 20 hardware projects, produced by 228 organizations. These products are organized into 14 categories across 3 layers of the stack (model components, product / UX, and infrastructure). The remaining 24,400 artifacts constitute the uncategorized long tail of the open source AI ecosystem, and will carry no score until they are researched and cited.

The map itself is interesting to explore, but I'm more excited about the underlying data - released under an MIT license in the currentai-org/os-ai-map GitHub account: 1,184 YAML files plus the notebooks, schemas and other scripts used to help gather them.

Since the files are on GitHub you can use Datasette Lite to explore some of them - here are 16,185 GitHub repos the project is tracking as a CSV file loaded into Datasette Lite.

# 10:04 pm / open-source, ai, datasette-lite, generative-ai, local-llms, llms

I just launched my third course, Whimsical Animations, and so far, it’s on track to sell roughly ⅓ as many copies as a typical course launch.

It’s a similar story with my two existing courses. Sales are down significantly from last year.

There are likely a lot of reasons for this, but I think the biggest is AI. There’s sort of a double whammy with AI:

Many people are wondering whether developer jobs will even exist in a few months, so they’re reluctant to spend time/money learning new dev skills.

Even if they do want to learn new dev skills, LLMs can provide personalized tutoring, so there’s less incentive to buy a paid course.

[...] I’ve spoken to a few course creators now, and we’re all seeing the same trend. Revenue down 50%+. Fewer people engaging with our content. People switching to LLMs, which slurp up all of our work and regurgitate it, without consent or compensation.

— Josh W. Comeau, via Salma Alam-Naylor

# 9:25 pm / ai-ethics, llms, ai, generative-ai, careers, josh-comeau

One of the most interesting tips I got from the Fireside Chat I hosted with Cat Wu and Thariq Shihipar from the Claude Code team at AIE on Wednesday was to let Fable (and to a certain extent Opus) use their own judgement rather than dictating how they should work.

The example they gave was testing. You can tell Fable "only use automated testing for larger features, don't update and run tests for small copy or design changes" - but it's better to just tell Fable to use its own judgement when deciding to write tests instead.

Jesse Vincent just gave me a related tip to help avoid burning too many of those valuable Fable tokens in the few days we have left before the prices go up. Tell Fable to use other models for smaller tasks, applying its own judgement about which model to use.

I prompted Claude Code just now with:

For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent

Claude saved this memory file in ~/.claude/projects/name-of-project/memory/delegate-coding-to-subagents.md:

---
name: delegate-coding-to-subagents
description: Simon wants coding tasks delegated to subagents running an appropriately lower-power model
metadata: 
  node_type: memory
  type: feedback
  originSessionId: 30068d78-43a9-4fb1-bb29-9799e18c526a
---
Stated by Simon on 2026-07-03: "For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent."

Why: cost/efficiency — implementation work rarely needs the top-tier model; judgment, review, and synthesis stay with the main loop.

How to apply: when a task in this project is primarily writing/editing code, spawn an Agent with a model override (sonnet for substantive implementation, haiku for trivial/mechanical edits) and a self-contained prompt; review the result in the main loop before committing. Design, auditing, data synthesis, and anything judgment-heavy stays in the main model. See also [[project-goals]].

So far it seems to be working well. I'm getting a ton of work done and my Fable allowance is shrinking less quickly than before.

# 6:51 pm / claude, ai, claude-code, llms, prompt-engineering, coding-agents, generative-ai, claude-mythos-fable, anthropic

The June edition of my sponsors-only monthly newsletter is out. If you are a sponsor (or if you start a sponsorship now) you can access it here.

This month:

Claude Fable 5, GPT-5.6, and US export restrictions
GLM-5.2 is the new best open weights model
Tokenmaxxing is so over
Datasette Apps
sqlite-utils and shot-scraper and Datasette
Miscellaneous WASM projects
Other model releases
What I'm using

Here's a copy of the May newsletter as a preview of what you'll get. Pay $10/month to stay a month ahead of the free copy!

# 2:50 pm / newsletter

July 2, 2026

Release llm-coding-agent 0.1a0

Another Fable 5 experiment. Now that my LLM library has evolved into more of an agent framework it's time to see what a simple coding agent would look like built on it.

I started a new Python library using my python-lib-template-repository GitHub template repository, then ran these two prompts (here's the Claude Code for web transcript):

Write a spec.md for this project - it will depend on the latest “llm” alpha from PyPI and implement a Claude code style coding agent complete with tools for reading and editing files and executing commands

Then:

Commit the spec, then build it using red/green TDD in a series of sensible commits (each with passing tests and updated docs) - occasionally manually test it using the OpenAI API key in your environment

Here's the spec, the resulting README file, and the sequence of commits.

I've shipped a slop-alpha to PyPI, so you can run the new agent like this:

uvx --prerelease=allow --with llm-coding-agent llm code

It's pretty good for a first attempt! Here's the (Fable-authored) README, which lists recipes like llm code --yolo and llm code --allow "pytest*" --allow "git diff*".

It also presents a Python API based around a CodingAgent(model="gpt-5.5", root="/path", approve=True).run("Fix the failing test in tests/test_parser.py") class which I didn't ask for but I'm delighted to see implemented.

Here's the suite of tools it implemented, listed using uvx ... llm tools:

CodingTools_edit_file(path: str, old_string: str, new_string: str, replace_all: bool = False) -> str

Replace an exact string in a file.

old_string must match the file contents exactly (including whitespace) and must identify a unique location unless replace_all is true. Returns a diff of the change so it can be verified.

CodingTools_execute_command(command: str, timeout: int = 120) -> str

Run a shell command in the session root directory.

Returns combined stdout and stderr followed by an Exit code line. timeout is in seconds (maximum 600); on timeout the whole process tree is killed.

CodingTools_list_files(pattern: str = '**/*', path: str = '.') -> str

List files matching a glob pattern, newest first.

Skips hidden directories, node_modules, __pycache__ and (in a git repository) anything covered by .gitignore. Returns at most 200 paths relative to the searched directory.

CodingTools_read_file(path: str, offset: int = 0, limit: int = 2000) -> str

Read a text file, returning numbered lines like cat -n.

Paths are relative to the session root. Use offset (0-based first line) and limit (max lines) to page through files too large to read in one call.

CodingTools_search_files(pattern: str, path: str = '.', glob: str = None, max_results: int = 100) -> str

Search file contents for a regular expression.

Returns matches as path:line_number:line, capped at max_results. Use glob (e.g. "*.py") to restrict which files are searched.

CodingTools_write_file(path: str, content: str) -> str

Create or overwrite a file with the given content.

Parent directories are created as needed. Prefer edit_file for modifying existing files.

I tried it out by running llm code --yolo and then prompting:

mkdir /tmp/demo and then in that folder create a simple swiftui CLI app for telling the time in ascii art

Here's the transcript, in which GPT-5.5 reasoning notes that "SwiftUI isn't suitable for a true CLI" and then builds an app that outputs this on swift run AsciiTime:

      █    █████         ████     █             █     ███   
     ██    █        █        █   ██      █     ██    █   █  
      █    ████           ███     █             █       █   
      █        █    █        █    █      █      █      █    
     ███   ████          ████    ███           ███   █████

2nd Jul 2026, 7:33 pm · llm, coding-agents, claude-code, projects, claude-mythos-fable, generative-ai, ai, llm-tool-use

Research Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

One of this morning's AIE keynotes covered dspy, which reminded me I've been meaning to see if it could help me improve the system prompt used by Datasette Agent - so I fired off an asynchronous research task in Claude Code for web using Claude Fable 5:

Pip install the latest Datasette alpha and datasette-agent and dspy - then figure out how to use dspy to evaluate and improve the main system prompts used by Datasette Agent for the feature where it can execute read only SQL queries to answer user questions about data.

Fable chose to test using GPT 4.1 mini and nano, and identified several promising looking directions for improvements. I particularly like this one:

The schema listing gives only table names; the "don't call describe_table if you already have the information" advice caused column-name guessing (page_count, o.order_id, first_name) and error-retry loops in baseline traces. Either include column names in the prompt's schema listing or soften that advice.

2nd Jul 2026, 6:25 pm · evals, datasette, claude-mythos-fable, dspy, llms, ai, generative-ai, datasette-agent

I saw Geoffrey Litt speak at AIE yesterday, and one framing he used particularly resonated with me:

Understand to participate

Geoffrey was talking about the challenge of collaborating with coding agents as they construct increasingly large and sophisticated changes, and the need to avoid taking on cognitive debt as your understanding drifts from how the code actually works.

His argument is that you need to understand the code to a depth that enables you to participate further with the model:

You can learn what the agent is doing to make sure you can be an active participant in the creative process. [...]

You need a rich set of concepts in your mind to think creatively and fluently about how to move something forward. If you're lacking that fluency, your ability to participate in the project is meaningfully limited.

The AIE talks are all recorded - all 300+ of them! - and should be trickling out over the next three weeks. Geoffrey's is one that I recommend catching on YouTube.

Geoffrey also published a thread version of his talk on Twitter.

# 5:07 pm / geoffrey-litt, coding-agents, cognitive-debt, generative-ai, ai, llms

July 1, 2026

Sighting 7:49 PM – 8:04 PM — Pacific Harbor Seal, California Brown Pelican, Willet, in Monterey Bay National Marine Sanctuary, CA, US, CA

1st Jul 2026

June 30, 2026

We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5.

We'll begin restoring access tomorrow, and will share an update soon.

— Anthropic, on Twitter

# 11:58 pm / anthropic, claude, generative-ai, claude-mythos-fable, ai, llms

Nano Banana 2 Lite (via) Also known as Gemini 3.1 Flash Lite Image (gemini-3.1-flash-lite-image in their API), this is the "fastest and cheapest Gemini image model, engineered for velocity and scale".

I used AI studio to run this prompt:

Do a where's Waldo style image but it's where is the raccoon holding a ham radio

I like that one better than the results I got from the other Nano Banana models when I tried this back in April. It spelled Forest Festival wrong in two different ways though.

# 10:15 pm / google, ai, generative-ai, llms, gemini, text-to-image, llm-release, nano-banana

What’s new in Claude Sonnet 5 (via) Claude Sonnet 5 came out this morning. I always head straight for the "what's new" developer docs because they tend to have more actionable information than the official announcement post.

Anthropic say of Sonnet 5 that "its performance is close to that of Opus 4.8, but at lower prices". The system card helps explain how they were able to release the model without being blocked by the US government:

Sonnet 5 is significantly less capable at cyber tasks than Mythos 5: its safeguards are thus similar to those we apply to Opus 4.7 and Opus 4.8 (models that are more capable than Sonnet 5 but much less capable than Mythos 5).

Of note from the "what's new" API changes:

Sampling parameters temperature, top_p, top_k are no longer supported.
It has a 1 million token context window and 128,000 maximum output tokens.
It features "the same set of tools and platform features as Claude Sonnet 4.6"
Adaptive thinking is on by default, unless you specify "thinking": {type: "disabled"}.
The pricing is the same as Sonnet 4.6: $3/million input, $15/million input, with an introductory discount to $2/$10 until 31st August. But...
The model has a new tokenizer, where "The same input text produces approximately 30% more tokens than on Claude Sonnet 4.6." - effectively a 30% price increase.

I used my Claude Token Counter tool to try out the new tokenizer. Here are my results for several larger documents:

Document	Sonnet 4.6	Opus 4.7	Sonnet 5
Universal Declaration of Human Rights (English)	2,356	3,347 1.42x	3,341 1.42x
Universal Declaration of Human Rights (Spanish)	3,572	4,753 1.33x	4,747 1.33x
Universal Declaration of Human Rights (Chinese, Mandarin Simplified)	3,334	3,366 1.01x	3,360 1.01x
sqlite_utils/db.py (4,279 lines of Python)	44,014	56,118 1.28x	56,113 1.27x

So the new token is roughly 1.4x times more expensive for English, 1.33x for Spanish, 1.28x for Python code and effectively the same cost for Simplified Mandarin.

Here's the pelican. It's nothing to write home about. Sonnet 5 thinks it looks like a goose.

Illustration of a white goose riding a bicycle, with one wing extended forward to grip the handlebar, set against a plain white background with a brown ground line.

# 9:23 pm / ai, generative-ai, llms, anthropic, claude, llm-pricing, pelican-riding-a-bicycle, llm-release

The AI Compass (via) This political compass style quiz by bambamramfan is pretty neat - answer 29 questions about AI and AI ethics to see which of the 30 archetypes you best fit.

I'm impressed that my answers on my first time through the quiz categorized me as "The Garage Tinkerer", patron saint myself!

It's implemented as a single page React app using the <script type="text/babel"> trick to avoid the necessary build step. Here's the code.

# 5:39 pm / ai, generative-ai, llms, ai-ethics

Have your agent record video demos of its work with shot-scraper video

shot-scraper video is a new command introduced in today’s shot-scraper 1.10 release which accepts a storyboard.yml file defining a routine to run against a web application and uses Playwright to record a video of that routine. I’ve written before about the importance of having coding agents produce demos of their work; this is my latest attempt at enabling them to do that.

[... 1,162 words]

4:54 pm / projects, python, yaml, ai, datasette, playwright, shot-scraper, generative-ai, llms, pydantic, coding-agents, agentic-engineering

Release shot-scraper 1.10

The big new feature is shot-scraper video storyboard.yml, described in detail in Have your agent record video demos of its work with shot-scraper video.

30th Jun 2026, 3:10 pm · shot-scraper

June 29, 2026

Tool HTML table extractor

Yet another in my growing collection of paste-conversion tools. This one accepts pasted rich text from browsers (with embedded HTML tables) and converts every detected table into HTML, Markdown, CSV, TSV, or JSON.

Try it out by selecting everything on the Wikipedia List of cities and towns in the San Francisco Bay Area page and pasting it directly into the tool:

Screenshot of a web interface for converting table data between formats. A row of tabs labeled HTML, Markdown, CSV, TSV, and JSON sits below the bottom edge of a styled data table, with the TSV tab currently selected. The TSV tab displays the table's contents as tab-separated plain text in a monospaced font inside a bordered panel, with a "Copy" button in the upper right of that panel.

On a similar note, I recently rebuilt my Rich text to markdown tool to add support for tables and generally improve the UI.

Update: It turns out Wikipedia has an open CORS API for retrieving the full rendered HTML content of any page - demo here - so I had Codex add the ability to search Wikipedia for a page and then automatically import and display any tables from that page.

29th Jun 2026, 11:38 pm · html, tools, wikipedia, cors