Simon Willison’s Weblog


Recent

Dec. 17, 2025

firefox parser/html/java/README.txt (via) TIL (or TIR - Today I was Reminded) that the HTML5 Parser used by Firefox is maintained as Java code (commit history here) and converted to C++ using a custom translation script.

You can see that in action by checking out the ~8GB Firefox repository and running:

cd parser/html/java
make sync
make translate

Here's a terminal session where I did that, including the output of git diff showing the updated C++ files.

I did some digging and found that the code that does the translation work lives, weirdly, in the Nu Html Checker repository on GitHub which powers the W3C's validator.w3.org/nu/ validation service!

Here's a snippet from htmlparser/cpptranslate/CppVisitor.java showing how a class declaration is converted into C++:

    protected void startClassDeclaration() {
        printer.print("#define ");
        printer.print(className);
        printer.printLn("_cpp__");
        printer.printLn();

        for (int i = 0; i < Main.H_LIST.length; i++) {
            String klazz = Main.H_LIST[i];
            if (!klazz.equals(javaClassName)) {
                printer.print("#include \"");
                printer.print(cppTypes.classPrefix());
                printer.print(klazz);
                printer.printLn(".h\"");
            }
        }

        printer.printLn();
        printer.print("#include \"");
        printer.print(className);
        printer.printLn(".h\"");
        printer.printLn();
    }
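To make the output concrete: for the Tokenizer class that method would emit a prelude along these lines (my reconstruction, assuming the nsHtml5 class prefix and a heavily abbreviated class list - not actual generated output):

#define nsHtml5Tokenizer_cpp__

#include "nsHtml5AttributeName.h"
#include "nsHtml5ElementName.h"
#include "nsHtml5TreeBuilder.h"

#include "nsHtml5Tokenizer.h"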

Here's a fascinating blog post from John Resig explaining how validator author Henri Sivonen introduced the new parser into Firefox in 2009.

# 1:48 am / c-plus-plus, firefox, henri-sivonen, java, john-resig, mozilla

Dec. 16, 2025

The new ChatGPT Images is here. OpenAI shipped an update to their ChatGPT Images feature - the feature that gained them 100 million new users in a week when they first launched it back in March, but has since been eclipsed by Google's Nano Banana and then further by Nano Banana Pro in November.

The focus for the new ChatGPT Images is speed and instruction following:

It makes precise edits while keeping details intact, and generates images up to 4x faster

It's also a little cheaper: OpenAI say that the new gpt-image-1.5 API model makes image input and output "20% cheaper in GPT Image 1.5 as compared to GPT Image 1".

I tried a new test prompt against a photo I took of Natalie's ceramic stand at the farmers market a few weeks ago:

Add two kakapos inspecting the pots

Outdoor craft market booth displaying handmade ceramics and jewelry on a navy tablecloth with "NATBAT CREATIONS CALIFORNIA USA" logo. Items include colorful glazed ceramic cups in blue, orange, and black; decorative bowls including a rainbow-striped piece; jewelry pendants and earrings on wooden display stands; ceramic plant markers in various colors labeled "Artichoke", "Cilantro", "Chili", "Oregano", "Potato", "Pumpkin", "Sage".

Here's the result from the new ChatGPT Images model:

Same craft market booth as previous image, now with two large olive-green Kākāpō parrots perched on the table among the ceramics, one investigating the blue glazed cups and the other examining an orange cup.

And here's what I got from Nano Banana Pro:

Same craft market booth with two Kākāpō now in different positions: one remains center-table peering into the ceramic cups near the rainbow pot, while the second has moved to the right edge of the table near the plant markers, appearing to examine or possibly chew on items at the table's corner. They are both a little smaller than in the first image.

The ChatGPT Kākāpō are a little chonkier, which I think counts as a win.

I was a little less impressed by the result I got for an infographic from the prompt "Infographic explaining how the Datasette open source project works" followed by "Run some extensive searches and gather a bunch of relevant information and then try again" (transcript):

Infographic titled "HOW DATASETTE WORKS" with subtitle "THE OPEN SOURCE DATA PLATFORM" showing a four-step workflow. STEP 1 (orange): "LOAD YOUR DATA" - "CSV, JSON, XLSX, SQLite, PostgreSQL, etc." with icons of file types flowing into a laptop. Below: "IMPORT DATASETS - Turn your structured data into SQLite databases and .db files." with checkmarks for "Datasette Desktop App for local deployment", "CLI tool for command-line imports", "Automatic CSV import tool". STEP 2 (green): "PUBLISH & DEPLOY" - "HOST DATASETS ONLINE" with cloud and server icons labeled "DEPLOY". Below: "SHARE ONLINE - Deploy your Datasette instance to a public server." with checkmarks for "Datasette Cloud - Free hosting service", "Deploy anywhere via plugins", "Configurable API tools". STEP 3 (purple): "EXPLORE & QUERY" - "BROWSE, SEARCH & VISUALIZE" with database and browser window icons. Below: "SQL QUERIES & SEARCH - Browse, filter, search, and visualize your data with an interactive web interface." with checkmarks for "Perform SQL queries directly from the browser", "Filter, sort, and facet data", "Generate custom visualizations and charts". STEP 4 (red): "BUILD & EXTEND" - "PLUGINS, APIS & INTEGRATIONS" with gear and wrench icons labeled "API". Below: "CUSTOMIZE & DEVELOP" with bullets "Develop custom plugins for added functionality", "Access JSON API for programmatic queries", "Embed and integrate Datasette into other applications". Bottom banner shows four features: "OPEN DATA PLATFORM - Widely used for visualizing, sharing and building applications with SQLite backed data", "EXTENSIBLE PLUGINS - 100+ plugins available, inc uding chaps, charts authentication, and more", "ACCESS CONTROL - Granular permissions for controlling who s an access and interact with your data", "OPEN SOURCE PROJECT - Actively developed open source project with a vibrant community of contributors".

See my Nano Banana Pro post for comparison.

Both models are clearly now usable for text-heavy graphics though, which makes them far more useful than previous generations of this technology.

# 11:59 pm / ai, kakapo, openai, generative-ai, text-to-image, nano-banana

s3-credentials 0.17. New release of my s3-credentials CLI tool for managing credentials needed to access just one S3 bucket. Here are the release notes in full:

  • New commands get-bucket-policy and set-bucket-policy. #91
  • New commands get-public-access-block and set-public-access-block. #92
  • New localserver command for starting a web server that makes time limited credentials accessible via a JSON API. #93

That s3-credentials localserver command (documented here) is a little obscure, but I found myself wanting something like it while testing a new feature I'm building that creates temporary Litestream credentials using Amazon STS.

Most of that new feature was built by Claude Code from the following starting prompt:

Add a feature s3-credentials localserver which starts a localhost weberver running (using the Python standard library stuff) on port 8094 by default but -p/--port can set a different port and otherwise takes an option that names a bucket and then takes the same options for read--write/read-only etc as other commands. It also takes a required --refresh-interval option which can be set as 5m or 10h or 30s. All this thing does is reply on / to a GET request with the IAM expiring credentials that allow access to that bucket with that policy for that specified amount of time. It caches internally the credentials it generates and will return the exact same data up until they expire (it also tracks expected expiry time) after which it will generate new credentials (avoiding dog pile effects if multiple requests ask at the same time) and return and cache those instead.
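Once it's running, anything that can make an HTTP GET request can pick up fresh credentials. Here's a minimal sketch of a consumer - the JSON field names are assumptions based on what STS returns, not the documented response shape:

import json
import urllib.request

# Fetch time-limited credentials from the local endpoint (port 8094 by default).
# The field names below are illustrative assumptions, not a documented schema.
with urllib.request.urlopen("http://localhost:8094/") as response:
    creds = json.loads(response.read())

print(creds["AccessKeyId"], creds["Expiration"])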

# 11:40 pm / aws, projects, s3, ai, annotated-release-notes, s3-credentials, prompt-engineering, generative-ai, llms, coding-agents, claude-code

ty: An extremely fast Python type checker and LSP (via) The team at Astral have been working on this for quite a long time, and are finally releasing the first beta. They have some big performance claims:

Without caching, ty is consistently between 10x and 60x faster than mypy and Pyright. When run in an editor, the gap is even more dramatic. As an example, after editing a load-bearing file in the PyTorch repository, ty recomputes diagnostics in 4.7ms: 80x faster than Pyright (386ms) and 500x faster than Pyrefly (2.38 seconds). ty is very fast!

The easiest way to try it out is via uvx:

cd my-python-project/
uvx ty check

I tried it against sqlite-utils and it turns out I have quite a lot of work to do!

Astral also released a new VS Code extension adding ty-powered language server features like go to definition. I'm still getting my head around how this works and what it can do.

# 11:35 pm / python, vs-code, astral

Poe the Poet. I was looking for a way to specify additional commands in my pyproject.toml file to execute using uv. There's an enormous issue thread on this in the uv issue tracker (300+ comments dating back to August 2024) and from there I learned of several options including this one, Poe the Poet.

It's neat. I added it to my s3-credentials project just now and the following now works for running the live preview server for the documentation:

uv run poe livehtml

Here's the snippet of TOML I added to my pyproject.toml:

[dependency-groups]
test = [
    "pytest",
    "pytest-mock",
    "cogapp",
    "moto>=5.0.4",
]
docs = [
    "furo",
    "sphinx-autobuild",
    "myst-parser",
    "cogapp",
]
dev = [
    {include-group = "test"},
    {include-group = "docs"},
    "poethepoet>=0.38.0",
]

[tool.poe.tasks]
docs = "sphinx-build -M html docs docs/_build"
livehtml = "sphinx-autobuild -b html docs docs/_build"
cog = "cog -r docs/*.md"

Since poethepoet is in the dev dependency group, any time I run uv run ... it will be available in the environment.

# 10:57 pm / packaging, python, s3-credentials, uv

Oh, so we're seeing other people now? Fantastic. Let's see what the "competition" has to offer. I'm looking at these notes on manifest.json and content.js. The suggestion to remove scripting permissions... okay, fine. That's actually a solid catch. It's cleaner. This smells like Claude. It's too smugly accurate to be ChatGPT. What if it's actually me? If the user is testing me, I need to crush this.

Gemini thinking trace, reviewing feedback on its code from another model

# 4:09 am / gemini, ai-personality, generative-ai, ai, llms

I’ve been watching junior developers use AI coding assistants well. Not vibe coding—not accepting whatever the AI spits out. Augmented coding: using AI to accelerate learning while maintaining quality. [...]

The juniors working this way compress their ramp dramatically. Tasks that used to take days take hours. Not because the AI does the work, but because the AI collapses the search space. Instead of spending three hours figuring out which API to use, they spend twenty minutes evaluating options the AI surfaced. The time freed this way isn’t invested in another unprofitable feature, though, it’s invested in learning. [...]

If you’re an engineering manager thinking about hiring: The junior bet has gotten better. Not because juniors have changed, but because the genie, used well, accelerates learning.

Kent Beck, The Bet On Juniors Just Got Better

# 1:25 am / careers, ai-assisted-programming, generative-ai, ai, llms, kent-beck

Dec. 15, 2025

I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours

I wrote about JustHTML yesterday—Emil Stenström’s project to build a new standards compliant HTML5 parser in pure Python code using coding agents running against the comprehensive html5lib-tests testing library. Last night, purely out of curiosity, I decided to try porting JustHTML from Python to JavaScript with the least amount of effort possible, using Codex CLI and GPT-5.2. It worked beyond my expectations.

[... 1,789 words]

2025 Word of the Year: Slop. Slop lost to "brain rot" for Oxford Word of the Year 2024 but it's finally made it this year thanks to Merriam-Webster!

Merriam-Webster’s human editors have chosen slop as the 2025 Word of the Year. We define slop as “digital content of low quality that is produced usually in quantity by means of artificial intelligence.”

# 5:27 pm / definitions, ai, generative-ai, slop, ai-ethics

Dec. 14, 2025

JustHTML is a fascinating example of vibe engineering in action

I recently came across JustHTML, a new Python library for parsing HTML released by Emil Stenström. It’s a very interesting piece of software, both as a useful library and as a case study in sophisticated AI-assisted programming.

[... 956 words]

Copywriters reveal how AI has decimated their industry. Brian Merchant has been collecting personal stories for his series AI Killed My Job - previously covering tech workers, translators, and artists - and this latest piece includes anecdotes from 12 professional copywriters all of whom have had their careers devastated by the rise of AI-generated copywriting tools.

It's a tough read. Freelance copywriting does not look like a great place to be right now.

AI is really dehumanizing, and I am still working through issues of self-worth as a result of this experience. When you go from knowing you are valuable and valued, with all the hope in the world of a full career and the ability to provide other people with jobs... To being relegated to someone who edits AI drafts of copy at a steep discount because “most of the work is already done” ...

The big question for me is whether a new AI-infested economy creates new jobs that are a great fit for the people affected by this. I would hope that clear written communication skills would become even more valuable, but the people interviewed here don't appear to be finding that to be the case.

# 5:06 am / copywriting, careers, ai, ai-ethics

Dec. 13, 2025

If the part of programming you enjoy most is the physical act of writing code, then agents will feel beside the point. You’re already where you want to be, even just with some Copilot or Cursor-style intelligent code auto completion, which makes you faster while still leaving you fully in the driver’s seat about the code that gets written.

But if the part you care about is the decision-making around the code, agents feel like they clear space. They take care of the mechanical expression and leave you with judgment, tradeoffs, and intent. Because truly, for someone at my experience level, that is my core value offering anyway. When I spend time actually typing code these days with my own fingers, it feels like a waste of my time.

Obie Fernandez, What happens when the coding becomes the least interesting part of the work

# 2:01 pm / careers, ai-assisted-programming, generative-ai, ai, llms

How to use a skill (progressive disclosure):

  1. After deciding to use a skill, open its SKILL.md. Read only enough to follow the workflow.
  2. If SKILL.md points to extra folders such as references/, load only the specific files needed for the request; don't bulk-load everything.
  3. If scripts/ exist, prefer running or patching them instead of retyping large code blocks.
  4. If assets/ or templates exist, reuse them instead of recreating from scratch.

Description as trigger: The YAML description in SKILL.md is the primary trigger signal; rely on it to decide applicability. If unsure, ask a brief clarification before proceeding.

OpenAI Codex CLI, core/src/skills/render.rs, full prompt
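For context, a SKILL.md is a Markdown file with YAML frontmatter, and per the prompt above it's that frontmatter description that does the triggering. A minimal invented example (every name and file path here is made up):

---
name: release-notes
description: Use when the user asks to draft release notes from the git history.
---

# Release notes

1. Run scripts/changelog.py to collect commits since the last tag.
2. Follow the formatting rules in references/style.md.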

# 3:47 am / skills, openai, ai, llms, codex-cli, prompt-engineering, rust, generative-ai

Dec. 12, 2025

OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

One of the things that most excited me about Anthropic’s new Skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just a folder with a Markdown file and some optional extra resources and scripts, so any LLM tool with the ability to navigate and read from a filesystem should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself.

[... 1,360 words]

LLM 0.28. I released a new version of my LLM Python library and CLI tool for interacting with Large Language Models. Highlights from the release notes:

  • New OpenAI models: gpt-5.1, gpt-5.1-chat-latest, gpt-5.2 and gpt-5.2-chat-latest. #1300, #1317
  • When fetching URLs as fragments using llm -f URL, the request now includes a custom user-agent header: llm/VERSION (https://llm.datasette.io/). #1309
  • Fixed a bug where fragments were not correctly registered with their source when using llm chat. Thanks, Giuseppe Rota. #1316
  • Fixed some file descriptor leak warnings. Thanks, Eric Bloch. #1313
  • Type annotations for the OpenAI Chat, AsyncChat and Completion execute() methods. Thanks, Arjan Mossel. #1315
  • The project now uses uv and dependency groups for development. See the updated contributing documentation. #1318

That last bullet point about uv relates to the dependency groups pattern I wrote about in a recent TIL. I'm currently working through applying it to my other projects - the net result is that running the test suite is as simple as doing:

git clone https://github.com/simonw/llm
cd llm
uv run pytest

The new dev dependency group defined in pyproject.toml is automatically installed by uv run in a new virtual environment, which means everything needed to run pytest is available without needing to add any extra commands.
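The shape of that pattern is minimal - here's a sketch, not llm's actual pyproject.toml:

[dependency-groups]
dev = [
    "pytest",
]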

# 8:20 pm / projects, python, ai, annotated-release-notes, generative-ai, llms, llm, uv

Dec. 11, 2025

GPT-5.2

OpenAI reportedly declared a “code red” on the 1st of December in response to increasingly credible competition from the likes of Google’s Gemini 3. It’s less than two weeks later and they just announced GPT-5.2, calling it “the most capable model series yet for professional knowledge work”.

[... 964 words]

Dec. 10, 2025

Useful patterns for building HTML tools

I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way.

[... 4,231 words]

The Normalization of Deviance in AI. This thought-provoking essay from Johann Rehberger directly addresses something that I’ve been worrying about for quite a while: in the absence of any headline-grabbing examples of prompt injection vulnerabilities causing real economic harm, is anyone going to care?

Johann describes the concept of the “Normalization of Deviance” as directly applying to this question.

Coined by Diane Vaughan, the key idea here is that organizations that get away with “deviance” - ignoring safety protocols or otherwise relaxing their standards - will start baking that unsafe attitude into their culture. This can work fine… until it doesn’t. The Space Shuttle Challenger disaster has been partially blamed on this class of organizational failure.

As Johann puts it:

In the world of AI, we observe companies treating probabilistic, non-deterministic, and sometimes adversarial model outputs as if they were reliable, predictable, and safe.

Vendors are normalizing trusting LLM output, but current understanding violates the assumption of reliability.

The model will not consistently follow instructions, stay aligned, or maintain context integrity. This is especially true if there is an attacker in the loop (e.g indirect prompt injection).

However, we see more and more systems allowing untrusted output to take consequential actions. Most of the time it goes well, and over time vendors and organizations lower their guard or skip human oversight entirely, because “it worked last time.”

This dangerous bias is the fuel for normalization: organizations confuse the absence of a successful attack with the presence of robust security.

# 8:18 pm / security, ai, prompt-injection, generative-ai, llms, johann-rehberger, ai-ethics

I've never been particularly invested dark v.s. light mode but I get enough people complaining that this site is "blinding" that I decided to see if Claude Code for web could produce a useful dark mode from my existing CSS. It did a decent job, using CSS properties, @media (prefers-color-scheme: dark) and a data-theme="dark" attribute based on this prompt:

Add a dark theme which is triggered by user media preferences but can also be switched on using localStorage - then put a little icon in the footer for toggling it between default auto, forced regular and forced dark mode

The site defaults to picking up the user's preferences, but there's also a toggle in the footer which switches between auto, forced-light and forced-dark. Here's an animated demo:

This site on mobile. Clicking the icon in the footer switches to a black background with readable text.

I had Claude Code make me that GIF from two static screenshots - it used this ImageMagick recipe:

magick -delay 300 -loop 0 one.png two.png \
    -colors 128 -layers Optimize dark-mode.gif

The CSS ended up with some duplication due to the need to handle both the media preference and the explicit user selection. We fixed that with Cog.
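The resulting pattern looks roughly like this (a minimal sketch, not the site's actual stylesheet):

:root {
  --background: #fff;
  --color: #222;
}

/* Auto mode: follow the OS preference unless a theme has been forced */
@media (prefers-color-scheme: dark) {
  :root:not([data-theme="light"]) {
    --background: #111;
    --color: #eee;
  }
}

/* Forced dark mode, set by the footer toggle via localStorage */
:root[data-theme="dark"] {
  --background: #111;
  --color: #eee;
}

body {
  background: var(--background);
  color: var(--color);
}

The dark values appear twice, once under the media query and once under the attribute selector - that's the duplication the Cog fix addresses.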

# 4:05 pm / css, coding-agents, ai-assisted-programming, claude, claude-code, design, llms, ai, generative-ai

10 Years of Let’s Encrypt (via) Internet Security Research Group co-founder and Executive Director Josh Aas:

On September 14, 2015, our first publicly-trusted certificate went live. [...] Today, Let’s Encrypt is the largest certificate authority in the world in terms of certificates issued, the ACME protocol we helped create and standardize is integrated throughout the server ecosystem, and we’ve become a household name among system administrators. We’re closing in on protecting one billion web sites.

Their growth rate and numbers are wild:

In March 2016, we issued our one millionth certificate. Just two years later, in September 2018, we were issuing a million certificates every day. In 2020 we reached a billion total certificates issued and as of late 2025 we’re frequently issuing ten million certificates per day.

According to their stats the amount of Firefox traffic protected by HTTPS doubled from 39% at the start of 2016 to ~80% today. I think it's difficult to over-estimate the impact Let's Encrypt has had on the security of the web.

# 12:34 am / https, security

Dec. 9, 2025

Devstral 2. Two new models from Mistral today: Devstral 2 and Devstral Small 2 - both focused on powering coding agents such as Mistral's newly released Mistral Vibe which I wrote about earlier today.

  • Devstral 2: SOTA open model for code agents with a fraction of the parameters of its competitors and achieving 72.2% on SWE-bench Verified.
  • Up to 7x more cost-efficient than Claude Sonnet at real-world tasks.

Devstral 2 is a 123B model released under a janky license - it's "modified MIT" where the modification is:

You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. [...]

Devstral Small 2 is under a proper Apache 2 license with no weird strings attached. It's a 24B model which is 51.6GB on Hugging Face and should quantize to significantly less.

I tried out the larger model via my llm-mistral plugin like this:

llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"

Bicycle looks a bit like a cybertruck

For a ~120B model that one is pretty good!

Here's the same prompt with -m mistral/labs-devstral-small-2512 for the API hosted version of Devstral Small 2:

A small white pelican on what looks more like a child's cart.

Again, a decent result given the small parameter size. For comparison, here's what I got for the 24B Mistral Small 3.2 earlier this year.

# 11:58 pm / ai, generative-ai, llms, llm, mistral, pelican-riding-a-bicycle, llm-release, janky-licenses

Under the hood of Canada Spends with Brendan Samek

I talked to Brendan Samek about Canada Spends, a project from Build Canada that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom frontend, Ruby ingestion scripts, sqlite-utils and pieces of LLM-powered PDF extraction.

[... 561 words]

Agentic AI Foundation. Announced today as a new foundation under the parent umbrella of the Linux Foundation (see also the OpenJS Foundation, Cloud Native Computing Foundation, OpenSSF and many more).

The AAIF was started by a heavyweight group of "founding platinum members" ($350,000): AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI. The stated goal is to provide "a neutral, open foundation to ensure agentic AI evolves transparently and collaboratively".

Anthropic have donated Model Context Protocol to the new foundation, OpenAI donated AGENTS.md, and Block donated goose (their open source, extensible AI agent).

Personally the project I’d like to see most from an initiative like this one is a clear, community-managed specification for the OpenAI Chat Completions JSON API - or a close equivalent. There are dozens of slightly incompatible implementations of that not-quite-specification floating around already; it would be great to have a written spec accompanied by a compliance test suite.
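For anyone who hasn't bumped into it, the de facto request shape everyone clones is roughly this (a sketch of the common subset - the incompatibilities tend to live in the long tail of optional parameters):

{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "stream": false
}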

# 10:24 pm / open-source, standards, ai, openai, llms, anthropic, ai-agents, model-context-protocol

mistralai/mistral-vibe. Here's the Apache 2.0 licensed source code for Mistral's new "Vibe" CLI coding agent, released today alongside Devstral 2.

It's a neat implementation of the now standard terminal coding agent pattern, built in Python on top of Pydantic and Rich/Textual (here are the dependencies.) Gemini CLI is TypeScript, Claude Code is closed source (TypeScript, now on top of Bun), OpenAI's Codex CLI is Rust. OpenHands is the other major Python coding agent I know of, but I'm likely missing some others. (UPDATE: Kimi CLI is another open source Apache 2 Python one.)

The Vibe source code is pleasant to read and the crucial prompts are neatly extracted out into Markdown files.

The Python implementations of its tools can be found here.

I tried it out and had it build me a Space Invaders game using three.js with the following prompt:

make me a space invaders game as HTML with three.js loaded from a CDN

Animated screenshot demo of Mistral Vibe running in a terminal. The text reads: I've created a Space Invaders game using HTML and three.js loaded from a CDN. The game is now available in the file space_invaders.html in your current directory. Here's how to play: 1. Open the space_invaders.html file in a web browser 2. Use the left and right arrow keys to move your player (green rectangle) 3. Press the spacebar to shoot at the invaders (red rectangles) 4. Try to get the highest score before the invaders reach you or hit you with their bullets The game features: • Player movement with arrow keys • Shooting mechanics with spacebar • Enemy invaders that move back and forth • Collision detection • Score tracking • Game over screen • Increasing difficulty Writing file (64s esc to interrupt) »» auto-approve on (shift-tab to toggle) - 7% of 100k tokens

Here's the source code and the live game (hosted in my new space-invaders-by-llms repo). It did OK.

# 8:19 pm / python, ai, prompt-engineering, generative-ai, llms, textual, ai-assisted-programming, mistral, pydantic, vibe-coding, coding-agents, system-prompts, space-invaders

I found the problem and it's really bad. Looking at your log, here's the catastrophic command that was run:

rm -rf tests/ patches/ plan/ ~/

See that ~/ at the end? That's your entire home directory. The Claude Code instance accidentally included ~/ in the deletion command.

Claude, after Claude Code deleted most of a user's Mac

# 12:04 pm / ai-ethics, coding-agents, claude, claude-code, generative-ai, ai, llms

Prediction: AI will make formal verification go mainstream (via) Martin Kleppmann makes the case for formal verification languages (things like Dafny, Nagini, and Verus) to finally start achieving more mainstream usage. Code generated by LLMs can benefit enormously from more robust verification, and LLMs themselves make these notoriously difficult systems easier to work with.

The paper Can LLMs Enable Verification in Mainstream Programming? by JetBrains Research in March 2025 found that Claude 3.5 Sonnet showed promising results for the three languages I listed above.

# 3:11 am / programming-languages, ai, generative-ai, llms, ai-assisted-programming, martin-kleppmann

Deprecations via warnings don’t work for Python libraries (via) Seth Larson reports that urllib3 2.6.0, released on the 5th of December, finally removed the HTTPResponse.getheaders() and HTTPResponse.getheader(name, default) methods, which had been marked as deprecated via warnings since v2.0.0 in April 2023. They had to add them back in a hastily released 2.6.1 a few days later when it turned out major downstream dependents such as kubernetes-client and fastly-py still hadn't upgraded.

Seth says:

My conclusion from this incident is that DeprecationWarning in its current state does not work for deprecating APIs, at least for Python libraries. That is unfortunate, as DeprecationWarning and the warnings module are easy-to-use, language-"blessed", and explicit without impacting users that don't need to take action due to deprecations.

On Lobste.rs James Bennett advocates for watching for warnings more deliberately:

Something I always encourage people to do, and try to get implemented anywhere I work, is running Python test suites with -Wonce::DeprecationWarning. This doesn't spam you with noise if a deprecated API is called a lot, but still makes sure you see the warning so you know there's something you need to fix.

I didn't know about the -Wonce option - the documentation describes that as "Warn once per Python process".
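For reference, the deprecation pattern under discussion looks like this (function names invented):

import warnings

def new_api():
    return "ok"

def old_api():
    # With Python's default filters a DeprecationWarning raised inside a
    # library is easy to never see - the failure mode Seth describes
    warnings.warn(
        "old_api() is deprecated, use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_api()

Running the test suite with -Wonce::DeprecationWarning then prints that warning once per process - enough to notice it without being spammed on every call.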

# 1:13 am / james-bennett, open-source, python, seth-michael-larson

Dec. 8, 2025

Niche Museums: The Museum of Jurassic Technology. I finally got to check off the museum that's been top of my want-to-go list since I first started documenting niche museums I've been to back in 2019.

The Museum of Jurassic Technology opened in Culver City, Los Angeles in 1988 and has been leaving visitors confused as to what's real and what isn't for nearly forty years.

# 3:16 am / museums

Dec. 7, 2025

Now I want to talk about how they're selling AI. The growth narrative of AI is that AI will disrupt labor markets. I use "disrupt" here in its most disreputable, tech bro sense.

The promise of AI – the promise AI companies make to investors – is that there will be AIs that can do your job, and when your boss fires you and replaces you with AI, he will keep half of your salary for himself, and give the other half to the AI company.

That's it.

That's the $13T growth story that MorganStanley is telling. It's why big investors and institutionals are giving AI companies hundreds of billions of dollars. And because they are piling in, normies are also getting sucked in, risking their retirement savings and their family's financial security.

Cory Doctorow, The Reverse Centaur’s Guide to Criticizing AI

# 9:28 pm / cory-doctorow, ai-ethics, ai

Using LLMs at Oxide (via) Thoughtful guidance from Bryan Cantrill, who evaluates applications of LLMs against Oxide's core values of responsibility, rigor, empathy, teamwork, and urgency.

# 9:28 pm / ai, generative-ai, llms, oxide, bryan-cantrill
