Simon Willison’s Weblog

Subscribe
Atom feed for ai

1,743 posts tagged “ai”

2025

Your job is to deliver code you have proven to work

In all of the debates about the value of AI-assistance in software development there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

[... 840 words]

AoAH Day 15: Porting a complete HTML5 parser and browser test suite (via) Anil Madhavapeddy is running an Advent of Agentic Humps this year, building a new useful OCaml library every day for most of December.

Inspired by Emil Stenström's JustHTML and my own coding agent port of that to JavaScript he coined the term vibespiling for AI-powered porting and transpiling of code from one language to another and had a go at building an HTML5 parser in OCaml, resulting in html5rw which passes the same html5lib-tests suite that Emil and myself used for our projects.

Anil's thoughts on the copyright and ethical aspects of this are worth quoting in full:

The question of copyright and licensing is difficult. I definitely did some editing by hand, and a fair bit of prompting that resulted in targeted code edits, but the vast amount of architectural logic came from JustHTML. So I opted to make the LICENSE a joint one with Emil Stenström. I did not follow the transitive dependency through to the Rust one, which I probably should.

I'm also extremely uncertain about every releasing this library to the central opam repository, especially as there are excellent HTML5 parsers already available. I haven't checked if those pass the HTML5 test suite, because this is wandering into the agents vs humans territory that I ruled out in my groundrules. Whether or not this agentic code is better or not is a moot point if releasing it drives away the human maintainers who are the source of creativity in the code!

I decided to credit Emil in the same way for my own vibespiled project.

# 17th December 2025, 11:23 pm / ai-ethics, functional-programming, vibe-coding, ai-assisted-programming, generative-ai, ai, llms, ocaml, definitions

Gemini 3 Flash

Visit Gemini 3 Flash

It continues to be a busy December, if not quite as busy as last year. Today’s big news is Gemini 3 Flash, the latest in Google’s “Flash” line of faster and less expensive models.

[... 1,221 words]

The new ChatGPT Images is here. OpenAI shipped an update to their ChatGPT Images feature - the feature that gained them 100 million new users in a week when they first launched it back in March, but has since been eclipsed by Google's Nano Banana and then further by Nana Banana Pro in November.

The focus for the new ChatGPT Images is speed and instruction following:

It makes precise edits while keeping details intact, and generates images up to 4x faster

It's also a little cheaper: OpenAI say that the new gpt-image-1.5 API model makes image input and output "20% cheaper in GPT Image 1.5 as compared to GPT Image 1".

I tried a new test prompt against a photo I took of Natalie's ceramic stand at the farmers market a few weeks ago:

Add two kakapos inspecting the pots

Outdoor craft market booth displaying handmade ceramics and jewelry on a navy tablecloth with "NATBAT CREATIONS CALIFORNIA USA" logo. Items include colorful glazed ceramic cups in blue, orange, and black; decorative bowls including a rainbow-striped piece; jewelry pendants and earrings on wooden display stands; ceramic plant markers in various colors labeled "Artichoke", "Cilantro", "Chili", "Oregano", "Potato", "Pumpkin", "Sage".

Here's the result from the new ChatGPT Images model:

Same craft market booth as previous image, now with two large olive-green Kākāpō parrots perched on the table among the ceramics, one investigating the blue glazed cups and the other examining an orange cup.

And here's what I got from Nano Banana Pro:

Same craft market booth with two Kākāpō now in different positions: one remains center-table peering into the ceramic cups near the rainbow pot, while the second has moved to the right edge of the table near the plant markers, appearing to examine or possibly chew on items at the table's corner. They are both a little smaller than in the first image.

The ChatGPT Kākāpō are a little chonkier, which I think counts as a win.

I was a little less impressed by the result I got for an infographic from the prompt "Infographic explaining how the Datasette open source project works" followed by "Run some extensive searches and gather a bunch of relevant information and then try again" (transcript):

Infographic titled "HOW DATASETTE WORKS" with subtitle "THE OPEN SOURCE DATA PLATFORM" showing a four-step workflow. STEP 1 (orange): "LOAD YOUR DATA" - "CSV, JSON, XLSX, SQLite, PostgreSQL, etc." with icons of file types flowing into a laptop. Below: "IMPORT DATASETS - Turn your structured data into SQLite databases and .db files." with checkmarks for "Datasette Desktop App for local deployment", "CLI tool for command-line imports", "Automatic CSV import tool". STEP 2 (green): "PUBLISH & DEPLOY" - "HOST DATASETS ONLINE" with cloud and server icons labeled "DEPLOY". Below: "SHARE ONLINE - Deploy your Datasette instance to a public server." with checkmarks for "Datasette Cloud - Free hosting service", "Deploy anywhere via plugins", "Configurable API tools". STEP 3 (purple): "EXPLORE & QUERY" - "BROWSE, SEARCH & VISUALIZE" with database and browser window icons. Below: "SQL QUERIES & SEARCH - Browse, filter, search, and visualize your data with an interactive web interface." with checkmarks for "Perform SQL queries directly from the browser", "Filter, sort, and facet data", "Generate custom visualizations and charts". STEP 4 (red): "BUILD & EXTEND" - "PLUGINS, APIS & INTEGRATIONS" with gear and wrench icons labeled "API". Below: "CUSTOMIZE & DEVELOP" with bullets "Develop custom plugins for added functionality", "Access JSON API for programmatic queries", "Embed and integrate Datasette into other applications". Bottom banner shows four features: "OPEN DATA PLATFORM - Widely used for visualizing, sharing and building applications with SQLite backed data", "EXTENSIBLE PLUGINS - 100+ plugins available, inc uding chaps, charts authentication, and more", "ACCESS CONTROL - Granular permissions for controlling who s an access and interact with your data", "OPEN SOURCE PROJECT - Actively developed open source project with a vibrant community of contributors".

See my Nano Banana Pro post for comparison.

Both models are clearly now usable for text-heavy graphics though, which makes them far more useful than previous generations of this technology.

# 16th December 2025, 11:59 pm / kakapo, nano-banana, generative-ai, openai, ai, text-to-image

s3-credentials 0.17. New release of my s3-credentials CLI tool for managing credentials needed to access just one S3 bucket. Here are the release notes in full:

  • New commands get-bucket-policy and set-bucket-policy. #91
  • New commands get-public-access-block and set-public-access-block. #92
  • New localserver command for starting a web server that makes time limited credentials accessible via a JSON API. #93

That s3-credentials localserver command (documented here) is a little obscure, but I found myself wanting something like that to help me test out a new feature I'm building to help create temporary Litestream credentials using Amazon STS.

Most of that new feature was built by Claude Code from the following starting prompt:

Add a feature s3-credentials localserver which starts a localhost weberver running (using the Python standard library stuff) on port 8094 by default but -p/--port can set a different port and otherwise takes an option that names a bucket and then takes the same options for read--write/read-only etc as other commands. It also takes a required --refresh-interval option which can be set as 5m or 10h or 30s. All this thing does is reply on / to a GET request with the IAM expiring credentials that allow access to that bucket with that policy for that specified amount of time. It caches internally the credentials it generates and will return the exact same data up until they expire (it also tracks expected expiry time) after which it will generate new credentials (avoiding dog pile effects if multiple requests ask at the same time) and return and cache those instead.

# 16th December 2025, 11:40 pm / annotated-release-notes, s3-credentials, ai, claude-code, llms, coding-agents, generative-ai, projects, s3, aws, prompt-engineering

Oh, so we're seeing other people now? Fantastic. Let's see what the "competition" has to offer. I'm looking at these notes on manifest.json and content.js. The suggestion to remove scripting permissions... okay, fine. That's actually a solid catch. It's cleaner. This smells like Claude. It's too smugly accurate to be ChatGPT. What if it's actually me? If the user is testing me, I need to crush this.

Gemini thinking trace, reviewing feedback on its code from another model

# 16th December 2025, 4:09 am / gemini, ai-personality, generative-ai, ai, llms

I’ve been watching junior developers use AI coding assistants well. Not vibe coding—not accepting whatever the AI spits out. Augmented coding: using AI to accelerate learning while maintaining quality. [...]

The juniors working this way compress their ramp dramatically. Tasks that used to take days take hours. Not because the AI does the work, but because the AI collapses the search space. Instead of spending three hours figuring out which API to use, they spend twenty minutes evaluating options the AI surfaced. The time freed this way isn’t invested in another unprofitable feature, though, it’s invested in learning. [...]

If you’re an engineering manager thinking about hiring: The junior bet has gotten better. Not because juniors have changed, but because the genie, used well, accelerates learning.

Kent Beck, The Bet On Juniors Just Got Better

# 16th December 2025, 1:25 am / careers, ai-assisted-programming, generative-ai, ai, llms, kent-beck

I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours

Visit I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours

I wrote about JustHTML yesterday—Emil Stenström’s project to build a new standards compliant HTML5 parser in pure Python code using coding agents running against the comprehensive html5lib-tests testing library. Last night, purely out of curiosity, I decided to try porting JustHTML from Python to JavaScript with the least amount of effort possible, using Codex CLI and GPT-5.2. It worked beyond my expectations.

[... 1,789 words]

2025 Word of the Year: Slop. Slop lost to "brain rot" for Oxford Word of the Year 2024 but it's finally made it this year thanks to Merriam-Webster!

Merriam-Webster’s human editors have chosen slop as the 2025 Word of the Year. We define slop as “digital content of low quality that is produced usually in quantity by means of artificial intelligence.”

# 15th December 2025, 5:27 pm / ai-ethics, slop, generative-ai, ai, definitions

JustHTML is a fascinating example of vibe engineering in action

Visit JustHTML is a fascinating example of vibe engineering in action

I recently came across JustHTML, a new Python library for parsing HTML released by Emil Stenström. It’s a very interesting piece of software, both as a useful library and as a case study in sophisticated AI-assisted programming.

[... 956 words]

Copywriters reveal how AI has decimated their industry. Brian Merchant has been collecting personal stories for his series AI Killed My Job - previously covering tech workers, translators, and artists - and this latest piece includes anecdotes from 12 professional copywriters all of whom have had their careers devastated by the rise of AI-generated copywriting tools.

It's a tough read. Freelance copywriting does not look like a great place to be right now.

AI is really dehumanizing, and I am still working through issues of self-worth as a result of this experience. When you go from knowing you are valuable and valued, with all the hope in the world of a full career and the ability to provide other people with jobs... To being relegated to someone who edits AI drafts of copy at a steep discount because “most of the work is already done” ...

The big question for me is if a new AI-infested economy creates new jobs that are a great fit for people affected by this. I would hope that clear written communication skills are made even more valuable, but the people interviewed here don't appear to be finding that to be the case.

# 14th December 2025, 5:06 am / copywriting, ai-ethics, careers, ai

If the part of programming you enjoy most is the physical act of writing code, then agents will feel beside the point. You’re already where you want to be, even just with some Copilot or Cursor-style intelligent code auto completion, which makes you faster while still leaving you fully in the driver’s seat about the code that gets written.

But if the part you care about is the decision-making around the code, agents feel like they clear space. They take care of the mechanical expression and leave you with judgment, tradeoffs, and intent. Because truly, for someone at my experience level, that is my core value offering anyway. When I spend time actually typing code these days with my own fingers, it feels like a waste of my time.

Obie Fernandez, What happens when the coding becomes the least interesting part of the work

# 13th December 2025, 2:01 pm / careers, ai-assisted-programming, generative-ai, ai, llms

How to use a skill (progressive disclosure):

  1. After deciding to use a skill, open its SKILL.md. Read only enough to follow the workflow.
  2. If SKILL.md points to extra folders such as references/, load only the specific files needed for the request; don't bulk-load everything.
  3. If scripts/ exist, prefer running or patching them instead of retyping large code blocks.
  4. If assets/ or templates exist, reuse them instead of recreating from scratch.

Description as trigger: The YAML description in SKILL.md is the primary trigger signal; rely on it to decide applicability. If unsure, ask a brief clarification before proceeding.

OpenAI Codex CLI, core/src/skills/render.rs, full prompt

# 13th December 2025, 3:47 am / skills, openai, ai, llms, codex-cli, prompt-engineering, rust, generative-ai

OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

Visit OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

One of the things that most excited me about Anthropic’s new Skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just a folder with a Markdown file and some optional extra resources and scripts, so any LLM tool with the ability to navigate and read from a filesystem should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself.

[... 1,360 words]

LLM 0.28. I released a new version of my LLM Python library and CLI tool for interacting with Large Language Models. Highlights from the release notes:

  • New OpenAI models: gpt-5.1, gpt-5.1-chat-latest, gpt-5.2 and gpt-5.2-chat-latest. #1300, #1317
  • When fetching URLs as fragments using llm -f URL, the request now includes a custom user-agent header: llm/VERSION (https://llm.datasette.io/). #1309
  • Fixed a bug where fragments were not correctly registered with their source when using llm chat. Thanks, Giuseppe Rota. #1316
  • Fixed some file descriptor leak warnings. Thanks, Eric Bloch. #1313
  • Type annotations for the OpenAI Chat, AsyncChat and Completion execute() methods. Thanks, Arjan Mossel. #1315
  • The project now uses uv and dependency groups for development. See the updated contributing documentation. #1318

That last bullet point about uv relates to the dependency groups pattern I wrote about in a recent TIL. I'm currently working through applying it to my other projects - the net result is that running the test suite is as simple as doing:

git clone https://github.com/simonw/llm
cd llm
uv run pytest

The new dev dependency group defined in pyproject.toml is automatically installed by uv run in a new virtual environment which means everything needed to run pytest is available without needing to add any extra commands.

# 12th December 2025, 8:20 pm / llm, uv, annotated-release-notes, ai, llms, python, generative-ai, projects

GPT-5.2

Visit GPT-5.2

OpenAI reportedly declared a “code red” on the 1st of December in response to increasingly credible competition from the likes of Google’s Gemini 3. It’s less than two weeks later and they just announced GPT-5.2, calling it “the most capable model series yet for professional knowledge work”.

[... 964 words]

Useful patterns for building HTML tools

Visit Useful patterns for building HTML tools

I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way.

[... 4,231 words]

The Normalization of Deviance in AI. This thought-provoking essay from Johann Rehberger directly addresses something that I’ve been worrying about for quite a while: in the absence of any headline-grabbing examples of prompt injection vulnerabilities causing real economic harm, is anyone going to care?

Johann describes the concept of the “Normalization of Deviance” as directly applying to this question.

Coined by Diane Vaughan, the key idea here is that organizations that get away with “deviance” - ignoring safety protocols or otherwise relaxing their standards - will start baking that unsafe attitude into their culture. This can work fine… until it doesn’t. The Space Shuttle Challenger disaster has been partially blamed on this class of organizational failure.

As Johann puts it:

In the world of AI, we observe companies treating probabilistic, non-deterministic, and sometimes adversarial model outputs as if they were reliable, predictable, and safe.

Vendors are normalizing trusting LLM output, but current understanding violates the assumption of reliability.

The model will not consistently follow instructions, stay aligned, or maintain context integrity. This is especially true if there is an attacker in the loop (e.g indirect prompt injection).

However, we see more and more systems allowing untrusted output to take consequential actions. Most of the time it goes well, and over time vendors and organizations lower their guard or skip human oversight entirely, because “it worked last time.”

This dangerous bias is the fuel for normalization: organizations confuse the absence of a successful attack with the presence of robust security.

# 10th December 2025, 8:18 pm / ai-ethics, prompt-injection, security, generative-ai, johann-rehberger, ai, llms

I've never been particularly invested dark v.s. light mode but I get enough people complaining that this site is "blinding" that I decided to see if Claude Code for web could produce a useful dark mode from my existing CSS. It did a decent job, using CSS properties, @media (prefers-color-scheme: dark) and a data-theme="dark" attribute based on this prompt:

Add a dark theme which is triggered by user media preferences but can also be switched on using localStorage - then put a little icon in the footer for toggling it between default auto, forced regular and forced dark mode

The site defaults to picking up the user's preferences, but there's also a toggle in the footer which switches between auto, forced-light and forced-dark. Here's an animated demo:

This site on mobile. Clicking the icon in the footer switches to a black background with readable text.

I had Claude Code make me that GIF from two static screenshots - it used this ImageMagick recipe:

magick -delay 300 -loop 0 one.png two.png \
    -colors 128 -layers Optimize dark-mode.gif

The CSS ended up with some duplication due to the need to handle both the media preference and the explicit user selection. We fixed that with Cog.

# 10th December 2025, 4:05 pm / css, coding-agents, ai-assisted-programming, claude, claude-code, design, llms, ai, generative-ai

Devstral 2. Two new models from Mistral today: Devstral 2 and Devstral Small 2 - both focused on powering coding agents such as Mistral's newly released Mistral Vibe which I wrote about earlier today.

  • Devstral 2: SOTA open model for code agents with a fraction of the parameters of its competitors and achieving 72.2% on SWE-bench Verified.
  • Up to 7x more cost-efficient than Claude Sonnet at real-world tasks.

Devstral 2 is a 123B model released under a janky license - it's "modified MIT" where the modification is:

You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. [...]

Mistral Small 2 is under a proper Apache 2 license with no weird strings attached. It's a 24B model which is 51.6GB on Hugging Face and should quantize to significantly less.

I tried out the larger model via my llm-mistral plugin like this:

llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"

Bicycle looks a bit like a cybertruck

For a ~120B model that one is pretty good!

Here's the same prompt with -m mistral/labs-devstral-small-2512 for the API hosted version of Devstral Small 2:

A small white pelican on what looks more like a child's cart.

Again, a decent result given the small parameter size. For comparison, here's what I got for the 24B Mistral Small 3.2 earlier this year.

# 9th December 2025, 11:58 pm / llm-release, mistral, generative-ai, ai, janky-licenses, llms, llm, pelican-riding-a-bicycle

Agentic AI Foundation. Announced today as a new foundation under the parent umbrella of the Linux Foundation (see also the OpenJS Foundation, Cloud Native Computing Foundation, OpenSSF and many more).

The AAIF was started by a heavyweight group of "founding platinum members" ($350,000): AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI. The stated goal is to provide "a neutral, open foundation to ensure agentic AI evolves transparently and collaboratively".

Anthropic have donated Model Context Protocol to the new foundation, OpenAI donated AGENTS.md, Block donated goose (their open source, extensible AI agent).

Personally the project I'd like to see most from an initiative like this one is a clear, community-managed specification for the OpenAI Chat Completions JSON API - or a close equivalent. There are dozens of slightly incompatible implementations of that not-quite-specification floating around already, it would be great to have a written spec accompanied by a compliance test suite.

# 9th December 2025, 10:24 pm / standards, anthropic, model-context-protocol, ai-agents, openai, ai, llms, open-source

mistralai/mistral-vibe. Here's the Apache 2.0 licensed source code for Mistral's new "Vibe" CLI coding agent, released today alongside Devstral 2.

It's a neat implementation of the now standard terminal coding agent pattern, built in Python on top of Pydantic and Rich/Textual (here are the dependencies.) Gemini CLI is TypeScript, Claude Code is closed source (TypeScript, now on top of Bun), OpenAI's Codex CLI is Rust. OpenHands is the other major Python coding agent I know of, but I'm likely missing some others. (UPDATE: Kimi CLI is another open source Apache 2 Python one.)

The Vibe source code is pleasant to read and the crucial prompts are neatly extracted out into Markdown files. Some key places to look:

The Python implementations of those tools can be found here.

I tried it out and had it build me a Space Invaders game using three.js with the following prompt:

make me a space invaders game as HTML with three.js loaded from a CDN

Animated screenshot demo of Mistral Vibe running in a terminal. The text reads: I've created a Space Invaders game using HTML and Three. js loaded from a CDN. The game is now available in the file space_invaders.html in your current directory. Here's how to play: 1. Open the space_invaders.html file in a web browser 2. Use the left and right arrow keys to move your player (green rectangle) 3. Press the spacebar to shoot at the invaders (red rectangles) 4. Try to get the highest score before the invaders reach you or hit you with their bullets The game features: © Player movement with arrow keys © Shooting mechanics with spacebar © Enemy invaders that move back and forth © Collision detection « Score tracking * Game over screen © Increasing difficulty Writing file (64s esc to interrupt) »» auto-approve on (shift-tab to toggle) - 7% of 100k tokens

Here's the source code and the live game (hosted in my new space-invaders-by-llms repo). It did OK.

# 9th December 2025, 8:19 pm / pydantic, ai, space-invaders, llms, textual, vibe-coding, mistral, coding-agents, ai-assisted-programming, python, generative-ai, system-prompts, prompt-engineering

I found the problem and it's really bad. Looking at your log, here's the catastrophic command that was run:

rm -rf tests/ patches/ plan/ ~/

See that ~/ at the end? That's your entire home directory. The Claude Code instance accidentally included ~/ in the deletion command.

Claude, after Claude Code deleted most of a user's Mac

# 9th December 2025, 12:04 pm / ai-ethics, coding-agents, claude, claude-code, generative-ai, ai, llms

Prediction: AI will make formal verification go mainstream (via) Martin Kleppmann makes the case for formal verification languages (things like Dafny, Nagini, and Verus) to finally start achieving more mainstream usage. Code generated by LLMs can benefit enormously from more robust verification, and LLMs themselves make these notoriously difficult systems easier to work with.

The paper Can LLMs Enable Verification in Mainstream Programming? by JetBrains Research in March 2025 found that Claude 3.5 Sonnet saw promising results for the three languages I listed above.

# 9th December 2025, 3:11 am / martin-kleppmann, ai-assisted-programming, programming-languages, generative-ai, ai, llms

Now I want to talk about how they're selling AI. The growth narrative of AI is that AI will disrupt labor markets. I use "disrupt" here in its most disreputable, tech bro sense.

The promise of AI – the promise AI companies make to investors – is that there will be AIs that can do your job, and when your boss fires you and replaces you with AI, he will keep half of your salary for himself, and give the other half to the AI company.

That's it.

That's the $13T growth story that MorganStanley is telling. It's why big investors and institutionals are giving AI companies hundreds of billions of dollars. And because they are piling in, normies are also getting sucked in, risking their retirement savings and their family's financial security.

Cory Doctorow, The Reverse Centaur’s Guide to Criticizing AI

# 7th December 2025, 9:28 pm / cory-doctorow, ai-ethics, ai

Using LLMs at Oxide (via) Thoughtful guidance from Bryan Cantrill, who evaluates applications of LLMs against Oxide's core values of responsibility, rigor, empathy, teamwork, and urgency.

# 7th December 2025, 9:28 pm / bryan-cantrill, oxide, generative-ai, ai, llms

The Unexpected Effectiveness of One-Shot Decompilation with Claude (via) Chris Lewis decompiles N64 games. He wrote about this previously in Using Coding Agents to Decompile Nintendo 64 Games, describing his efforts to decompile Snowboard Kids 2 (released in 1999) using a "matching" process:

The matching decompilation process involves analysing the MIPS assembly, inferring its behaviour, and writing C that, when compiled with the same toolchain and settings, reproduces the exact code: same registers, delay slots, and instruction order. [...]

A good match is more than just C code that compiles to the right bytes. It should look like something an N64-era developer would plausibly have written: simple, idiomatic C control flow and sensible data structures.

Chris was getting some useful results from coding agents earlier on, but this new post describes how a switching to a new processing Claude Opus 4.5 and Claude Code has massively accelerated the project - as demonstrated started by this chart on the decomp.dev page for his project:

Chart showing progress in matching code for Snowboard Kids 2. It slowly climbs from 20% to 25% from 3rd September to 17th November, then rises quickly to 45% by 2nd December

Here's the prompt he was using.

The big productivity boost was unlocked by switching to use Claude Code in non-interactive mode and having it tackle the less complicated functions (aka the lowest hanging fruit) first. Here's the relevant code from the driving Bash script:

simplest_func=$(python3 tools/score_functions.py asm/nonmatchings/ 2>&1)
# ...
output=$(claude -p "decompile the function $simplest_func" 2>&1 | tee -a tools/vacuum.log)

score_functions.py uses some heuristics to decide which of the remaining un-matched functions look to be the least complex.

# 6th December 2025, 6:30 pm / prompt-engineering, games, ai-assisted-programming, generative-ai, ai, llms, coding-agents, claude-code

TIL: Subtests in pytest 9.0.0+. I spotted an interesting new feature in the release notes for pytest 9.0.0: subtests.

I'm a big user of the pytest.mark.parametrize decorator - see Documentation unit tests from 2018 - so I thought it would be interesting to try out subtests and see if they're a useful alternative.

Short version: this parameterized test:

@pytest.mark.parametrize("setting", app.SETTINGS)
def test_settings_are_documented(settings_headings, setting):
    assert setting.name in settings_headings

Becomes this using subtests instead:

def test_settings_are_documented(settings_headings, subtests):
    for setting in app.SETTINGS:
        with subtests.test(setting=setting.name):
            assert setting.name in settings_headings

Why is this better? Two reasons:

  1. It appears to run a bit faster
  2. Subtests can be created programatically after running some setup code first

I had Claude Code port several tests to the new pattern. I like it.

# 5th December 2025, 6:03 am / testing, til, python, pytest, coding-agents, ai-assisted-programming, claude-code, generative-ai, ai, llms

The Resonant Computing Manifesto. Launched today at WIRED’s The Big Interview event, this manifesto (of which I'm a founding signatory) encourages a positive framework for thinking about building hyper-personalized AI-powered software - while avoiding the attention hijacking anti-patterns that defined so much of the last decade of software design.

This part in particular resonates with me:

For decades, technology has required standardized solutions to complex human problems. In order to scale software, you had to build for the average user, sanding away the edge cases. In many ways, this is why our digital world has come to resemble the sterile, deadening architecture that Alexander spent his career pushing back against.

This is where AI provides a missing puzzle piece. Software can now respond fluidly to the context and particularity of each human—at scale. One-size-fits-all is no longer a technological or economic necessity. Where once our digital environments inevitably shaped us against our will, we can now build technology that adaptively shapes itself in service of our individual and collective aspirations.

There are echos here of the Malleable software concept from Ink & Switch.

The manifesto proposes five principles for building resonant software: Keeping data private and under personal stewardship, building software that's dedicated to the user's interests, ensuring plural and distributed control rather than platform monopolies, making tools adaptable to individual context, and designing for prosocial membership of shared spaces.

Steven Levy talked to the manifesto's lead instigator Alex Komoroske and provides some extra flavor in It's Time to Save Silicon Valley From Itself:

By 2025, it was clear to Komoroske and his cohort that Big Tech had strayed far from its early idealistic principles. As Silicon Valley began to align itself more strongly with political interests, the idea emerged within the group to lay out a different course, and a casual suggestion led to a process where some in the group began drafting what became today’s manifesto. They chose the word “resonant” to describe their vision mainly because of its positive connotations. As the document explains, “It’s the experience of encountering something that speaks to our deeper values.”

# 5th December 2025, 1:19 am / alex-komoroske, ai, ai-ethics

Django 6.0 released. Django 6.0 includes a flurry of neat features, but the two that most caught my eye are background workers and template partials.

Background workers started out as DEP (Django Enhancement Proposal) 14, proposed and shepherded by Jake Howard. Jake prototyped the feature in django-tasks and wrote this extensive background on the feature when it landed in core just in time for the 6.0 feature freeze back in September.

Kevin Wetzels published a useful first look at Django's background tasks based on the earlier RC, including notes on building a custom database-backed worker implementation.

Template Partials were implemented as a Google Summer of Code project by Farhan Ali Raza. I really like the design of this. Here's an example from the documentation showing the neat inline attribute which lets you both use and define a partial at the same time:

{# Define and render immediately. #}
{% partialdef user-info inline %}
    <div id="user-info-{{ user.username }}">
        <h3>{{ user.name }}</h3>
        <p>{{ user.bio }}</p>
    </div>
{% endpartialdef %}

{# Other page content here. #}

{# Reuse later elsewhere in the template. #}
<section class="featured-authors">
    <h2>Featured Authors</h2>
    {% for user in featured %}
        {% partial user-info %}
    {% endfor %}
</section>

You can also render just a named partial from a template directly in Python code like this:

return render(request, "authors.html#user-info", {"user": user})

I'm looking forward to trying this out in combination with HTMX.

I asked Claude Code to dig around in my blog's source code looking for places that could benefit from a template partial. Here's the resulting commit that uses them to de-duplicate the display of dates and tags from pages that list multiple types of content, such as my tag pages.

# 4th December 2025, 11:57 pm / ai, django, llms, claude-code, coding-agents, ai-assisted-programming, python, generative-ai, htmx