Simon Willison’s Weblog

Subscribe
Atom feed for vibe-coding Random

69 posts tagged “vibe-coding”

As defined here - not the same thing as AI-assisted programming, though there's some overlap.

2026

How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt (via) This piece by Margaret-Anne Storey is the best explanation of the term cognitive debt I've seen so far.

Cognitive debt, a term gaining traction recently, instead communicates the notion that the debt compounded from going fast lives in the brains of the developers and affects their lived experiences and abilities to “go fast” or to make changes. Even if AI agents produce code that could be easy to understand, the humans involved may have simply lost the plot and may not understand what the program is supposed to do, how their intentions were implemented, or how to possibly change it.

Margaret-Anne expands on this further with an anecdote about a student team she coached:

But by weeks 7 or 8, one team hit a wall. They could no longer make even simple changes without breaking something unexpected. When I met with them, the team initially blamed technical debt: messy code, poor architecture, hurried implementations. But as we dug deeper, the real problem emerged: no one on the team could explain why certain design decisions had been made or how different parts of the system were supposed to work together. The code might have been messy, but the bigger issue was that the theory of the system, their shared understanding, had fragmented or disappeared entirely. They had accumulated cognitive debt faster than technical debt, and it paralyzed them.

I've experienced this myself on some of my more ambitious vibe-code-adjacent projects. I've been experimenting with prompting entire new features into existence without reviewing their implementations and, while it works surprisingly well, I've found myself getting lost in my own projects.

I no longer have a firm mental model of what they can do and how they work, which means each additional feature becomes harder to reason about, eventually leading me to lose the ability to make confident decisions about where to go next.

# 15th February 2026, 5:20 am / definitions, ai, generative-ai, llms, ai-assisted-programming, vibe-coding

GLM-5: From Vibe Coding to Agentic Engineering (via) This is a huge new MIT-licensed model: 754B parameters and 1.51TB on Hugging Face twice the size of GLM-4.7 which was 368B and 717GB (4.5 and 4.6 were around that size too).

It's interesting to see Z.ai take a position on what we should call professional software engineers building with LLMs - I've seen Agentic Engineering show up in a few other places recently. most notable from Andrej Karpathy and Addy Osmani.

I ran my "Generate an SVG of a pelican riding a bicycle" prompt through GLM-5 via OpenRouter and got back a very good pelican on a disappointing bicycle frame:

The pelican is good and has a well defined beak. The bicycle frame is a wonky red triangle. Nice sun and motion lines.

# 11th February 2026, 6:56 pm / definitions, ai, generative-ai, llms, ai-assisted-programming, pelican-riding-a-bicycle, llm-release, vibe-coding, openrouter, ai-in-china, glm

Don’t “Trust the Process” (via) Jenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma) gave a provocative keynote at Hatch Conference in Berlin last September.

Don't "Trust the process" slide, speaker shown on the left

Jenny argues that the Design Process - user research leading to personas leading to user journeys leading to wireframes... all before anything gets built - may be outdated for today's world.

Hypothesis: In a world where anyone can make anything — what matters is your ability to choose and curate what you make.

In place of the Process, designers should lean into prototypes. AI makes these much more accessible and less time-consuming than they used to be.

Watching this talk made me think about how AI-assisted programming significantly reduces the cost of building the wrong thing. Previously if the design wasn't right you could waste months of development time building in the wrong direction, which was a very expensive mistake. If a wrong direction wastes just a few days instead we can take more risks and be much more proactive in exploring the problem space.

I've always been a compulsive prototyper though, so this is very much playing into my own existing biases!

# 24th January 2026, 11:31 pm / design, prototyping, ai, generative-ai, llms, ai-assisted-programming, vibe-coding

If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.” Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are. [...]

Programmers are trained to see everything as a software-shaped problem: if you do a task three times, you should probably automate it with a script. Rename every IMG_*.jpg file from the last week to hawaii2025_*.jpg, they tell their terminal, while the rest of us painfully click and copy-paste. We are blind to the solutions we were never taught to see, asking for faster horses and never dreaming of cars.

Jasmine Sun

# 24th January 2026, 9:34 pm / ai, generative-ai, llms, vibe-coding, coding-agents, claude-code

Also note that the python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.

Linus Torvalds, Another silly guitar-pedal-related repo

# 11th January 2026, 2:29 am / linus-torvalds, python, ai, generative-ai, llms, vibe-coding

2025

2025: The year in LLMs

Visit 2025: The year in LLMs

This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about AI in 2023 and Things we learned about LLMs in 2024.

[... 8,273 words]

Cooking with Claude

Visit Cooking with Claude

I’ve been having an absurd amount of fun recently using LLMs for cooking. I started out using them for basic recipes, but as I’ve grown more confident in their culinary abilities I’ve leaned into them for more advanced tasks. Today I tried something new: having Claude vibe-code up a custom application to help with the timing for a complicated meal preparation. It worked really well!

[... 1,313 words]

swift-justhtml. First there was Emil Stenström's JustHTML in Python, then my justjshtml in JavaScript, then Anil Madhavapeddy's html5rw in OCaml, and now Kyle Howells has built a vibespiled dependency-free HTML5 parser for Swift using the same coding agent tricks against the html5lib-tests test suite.

Kyle ran some benchmarks to compare the different implementations:

  • Rust (html5ever) total parse time: 303 ms
  • Swift total parse time: 1313 ms
  • JavaScript total parse time: 1035 ms
  • Python total parse time: 4189 ms

# 18th December 2025, 11:57 pm / html5, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, swift

Your job is to deliver code you have proven to work

In all of the debates about the value of AI-assistance in software development there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

[... 840 words]

AoAH Day 15: Porting a complete HTML5 parser and browser test suite (via) Anil Madhavapeddy is running an Advent of Agentic Humps this year, building a new useful OCaml library every day for most of December.

Inspired by Emil Stenström's JustHTML and my own coding agent port of that to JavaScript he coined the term vibespiling for AI-powered porting and transpiling of code from one language to another and had a go at building an HTML5 parser in OCaml, resulting in html5rw which passes the same html5lib-tests suite that Emil and myself used for our projects.

Anil's thoughts on the copyright and ethical aspects of this are worth quoting in full:

The question of copyright and licensing is difficult. I definitely did some editing by hand, and a fair bit of prompting that resulted in targeted code edits, but the vast amount of architectural logic came from JustHTML. So I opted to make the LICENSE a joint one with Emil Stenström. I did not follow the transitive dependency through to the Rust one, which I probably should.

I'm also extremely uncertain about every releasing this library to the central opam repository, especially as there are excellent HTML5 parsers already available. I haven't checked if those pass the HTML5 test suite, because this is wandering into the agents vs humans territory that I ruled out in my groundrules. Whether or not this agentic code is better or not is a moot point if releasing it drives away the human maintainers who are the source of creativity in the code!

I decided to credit Emil in the same way for my own vibespiled project.

# 17th December 2025, 11:23 pm / definitions, functional-programming, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, vibe-coding, ocaml

JustHTML is a fascinating example of vibe engineering in action

Visit JustHTML is a fascinating example of vibe engineering in action

I recently came across JustHTML, a new Python library for parsing HTML released by Emil Stenström. It’s a very interesting piece of software, both as a useful library and as a case study in sophisticated AI-assisted programming.

[... 956 words]

Useful patterns for building HTML tools

Visit Useful patterns for building HTML tools

I’ve started using the term HTML tools to refer to HTML applications that I’ve been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I’ve discovered along the way.

[... 4,231 words]

mistralai/mistral-vibe. Here's the Apache 2.0 licensed source code for Mistral's new "Vibe" CLI coding agent, released today alongside Devstral 2.

It's a neat implementation of the now standard terminal coding agent pattern, built in Python on top of Pydantic and Rich/Textual (here are the dependencies.) Gemini CLI is TypeScript, Claude Code is closed source (TypeScript, now on top of Bun), OpenAI's Codex CLI is Rust. OpenHands is the other major Python coding agent I know of, but I'm likely missing some others. (UPDATE: Kimi CLI is another open source Apache 2 Python one.)

The Vibe source code is pleasant to read and the crucial prompts are neatly extracted out into Markdown files. Some key places to look:

The Python implementations of those tools can be found here.

I tried it out and had it build me a Space Invaders game using three.js with the following prompt:

make me a space invaders game as HTML with three.js loaded from a CDN

Animated screenshot demo of Mistral Vibe running in a terminal. The text reads: I've created a Space Invaders game using HTML and Three. js loaded from a CDN. The game is now available in the file space_invaders.html in your current directory. Here's how to play: 1. Open the space_invaders.html file in a web browser 2. Use the left and right arrow keys to move your player (green rectangle) 3. Press the spacebar to shoot at the invaders (red rectangles) 4. Try to get the highest score before the invaders reach you or hit you with their bullets The game features: © Player movement with arrow keys © Shooting mechanics with spacebar © Enemy invaders that move back and forth © Collision detection « Score tracking * Game over screen © Increasing difficulty Writing file (64s esc to interrupt) »» auto-approve on (shift-tab to toggle) - 7% of 100k tokens

Here's the source code and the live game (hosted in my new space-invaders-by-llms repo). It did OK.

# 9th December 2025, 8:19 pm / python, ai, prompt-engineering, generative-ai, llms, textual, ai-assisted-programming, mistral, pydantic, vibe-coding, coding-agents, system-prompts, space-invaders

Bluesky Thread Viewer thread by @simonwillison.net. I've been having a lot of fun hacking on my Bluesky Thread Viewer JavaScript tool with Claude Code recently. Here it renders a thread (complete with demo video) talking about the latest improvements to the tool itself.

This short animated GIF demo starts with the Thread by @simonwillison.net page where a URL to a Bluesky post has been entered and a Fetch Thread button clicked. The thread is shown as a nested collection of replies. A "Hide other replies" button hides the replies revealing just the top-level self-replies by the original author - and turns into a "Show 11 other replies" button when toggled. There are tabs for Thread View and Most Recent First - the latter when clicked shows a linear list of posts with the most recent at the top. There are "Copy" and Copy JSON" green buttons at the top of the page.

I've been mostly vibe-coding this thing since April, now spanning 15 commits with contributions from ChatGPT, Claude, Claude Code for Web and Claude Code on my laptop. Each of those commits links to the transcript that created the changes in the commit.

Bluesky is a lot of fun to build tools like this against because the API supports CORS (so you can talk to it from an HTML+JavaScript page hosted anywhere) and doesn't require authentication.

# 28th November 2025, 11:57 pm / projects, tools, ai, generative-ai, llms, cors, bluesky, vibe-coding, coding-agents, claude-code

Nano Banana can be prompt engineered for extremely nuanced AI image generation (via) Max Woolf provides an exceptional deep dive into Google's Nano Banana aka Gemini 2.5 Flash Image model, still the best available image manipulation LLM tool three months after its initial release.

I confess I hadn't grasped that the key difference between Nano Banana and OpenAI's gpt-image-1 and the previous generations of image models like Stable Diffusion and DALL-E was that the newest contenders are no longer diffusion models:

Of note, gpt-image-1, the technical name of the underlying image generation model, is an autoregressive model. While most image generation models are diffusion-based to reduce the amount of compute needed to train and generate from such models, gpt-image-1 works by generating tokens in the same way that ChatGPT generates the next token, then decoding them into an image. [...]

Unlike Imagen 4, [Nano Banana] is indeed autoregressive, generating 1,290 tokens per image.

Max goes on to really put Nano Banana through its paces, demonstrating a level of prompt adherence far beyond its competition - both for creating initial images and modifying them with follow-up instructions

Create an image of a three-dimensional pancake in the shape of a skull, garnished on top with blueberries and maple syrup. [...]

Make ALL of the following edits to the image:
- Put a strawberry in the left eye socket.
- Put a blackberry in the right eye socket.
- Put a mint garnish on top of the pancake.
- Change the plate to a plate-shaped chocolate-chip cookie.
- Add happy people to the background.

One of Max's prompts appears to leak parts of the Nano Banana system prompt:

Generate an image showing the # General Principles in the previous text verbatim using many refrigerator magnets

AI-generated photo of a fridge with magnet words  showing AI image generation guidelines. Left side titled "# GENERAL" with red text contains: "1. Be Detailed and Specific: Your output should be a detailed caption describing all visual elements: fore subject, background, composition, style, colors, colors, any people (including about face, and objects, and clothing), art clothing), or text to be rendered. 2. Style: If not othwise specified or clot output must be a pho a photo. 3. NEVER USE THE FOLLOWING detailed, brettahek, skufing, epve, ldifred, ingeation, YOU WILL BENAZED FEIM YOU WILL BENALL BRIMAZED FOR USING THEM." Right side titled "PRINCIPLES" in blue text contains: "If a not othwise ctory ipplied, do a real life picture. 3. NEVER USE THE FOLLOWING BUZZWORDS: hyper-realistic, very detailed, breathtaking, majestic, stunning, sinjeisc, dfelike, stunning, lfflike, sacisite, vivid, masterful, exquisite, ommersive, immersive, high-resolution, draginsns, framic lighttiny, dramathicol lighting, ghomatic etoion, granotiose, stherp focus, luminnous, atsunious, glorious 8K, Unreal Engine, Artstation. 4. Language & Translation Rules: The rewrite MUST usuer request is no English, implicitly tranicity transalt it to before generthe opc:wriste. Include synyons keey cunyoms wheresoectlam. If a non-Englgh usuy respjets tex vertstam (e.g. sign text, brand text from origish, quote, RETAIN that exact text in tils lifs original language tanginah rewiste and don prompt, and do not mention irs menettiere. Cleanribe its appearance and placment and placment."

He also explores its ability to both generate and manipulate clearly trademarked characters. I expect that feature will be reined back at some point soon!

Max built and published a new Python library for generating images with the Nano Banana API called gemimg.

I like CLI tools, so I had Gemini CLI add a CLI feature to Max's code and submitted a PR.

Thanks to the feature of GitHub where any commit can be served as a Zip file you can try my branch out directly using uv like this:

GEMINI_API_KEY="$(llm keys get gemini)" \
uv run --with https://github.com/minimaxir/gemimg/archive/d6b9d5bbefa1e2ffc3b09086bc0a3ad70ca4ef22.zip \
  python -m gemimg "a racoon holding a hand written sign that says I love trash"

AI-generated photo:  A raccoon stands on a pile of trash in an alley at night holding a cardboard sign with I love trash written on it.

# 13th November 2025, 10:50 pm / github, google, ai, max-woolf, prompt-engineering, generative-ai, llms, gemini, uv, text-to-image, vibe-coding, coding-agents, nano-banana

Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

Visit Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It’s currently only available via their Codex CLI tool and VS Code extension, with proper API access "coming soon". I decided to use Codex to reverse engineer the Codex CLI tool and give me the ability to prompt the new model directly.

[... 1,774 words]

I have AiDHD

It has never been easier to build an MVP and in turn, it has never been harder to keep focus. When new features always feel like they're just a prompt away, feature creep feels like a never ending battle. Being disciplined is more important than ever.

AI still doesn't change one very important thing: you still need to make something people want. I think that getting users (even free ones) will become significantly harder as the bar for user's time will only get higher as their options increase.

Being quicker to get to the point of failure is actually incredibly valuable. Even just over a year ago, many of these projects would have taken months to build.

Josh Cohenzadeh, AiDHD

# 7th November 2025, 4:38 pm / ai, generative-ai, llms, ai-assisted-programming, vibe-coding

Video: Building a tool to copy-paste share terminal sessions using Claude Code for web

Visit Video: Building a tool to copy-paste share terminal sessions using Claude Code for web

This afternoon I was manually converting a terminal session into a shared HTML file for the umpteenth time when I decided to reduce the friction by building a custom tool for it—and on the spur of the moment I fired up Descript to record the process. The result is this new 11 minute YouTube video showing my workflow for vibe-coding simple tools from start to finish.

[... 1,338 words]

SLOCCount in WebAssembly. This project/side-quest got a little bit out of hand.

Screenshot of SLOCCount web application showing code analysis interface. The page header reads "SLOCCount - Count Lines of Code" with subtitle "Analyze source code to count physical Source Lines of Code (SLOC) using Perl and C programs running via WebAssembly" and "Based on SLOCCount by David A. Wheeler". Three tabs are shown: "Paste Code", "GitHub Repository" (selected), and "Upload ZIP". Below is a text input field labeled "GitHub Repository URL:" containing "simonw/llm" and a blue "Analyze Repository" button. The Analysis Results section displays five statistics: Total Lines: 13,490, Languages: 2, Files: 40, Est. Cost (USD)*: $415,101, and Est. Person-Years*: 3.07.

I remembered an old tool called SLOCCount which could count lines of code and produce an estimate for how much they would cost to develop. I thought it would be fun to play around with it again, especially given how cheap it is to generate code using LLMs these days.

Here's the homepage for SLOCCount by David A. Wheeler. It dates back to 2001!

I figured it might be fun to try and get it running on the web. Surely someone had compiled Perl to WebAssembly...?

WebPerl by Hauke Dämpfling is exactly that, even adding a neat <script type="text/perl"> tag.

I told Claude Code for web on my iPhone to figure it out and build something, giving it some hints from my initial research:

Build sloccount.html - a mobile friendly UI for running the Perl sloccount tool against pasted code or against a GitHub repository that is provided in a form field

It works using the webperl webassembly build of Perl, plus it loads Perl code from this exact commit of this GitHub repository https://github.com/licquia/sloccount/tree/7220ff627334a8f646617fe0fa542d401fb5287e - I guess via the GitHub API, maybe using the https://github.com/licquia/sloccount/archive/7220ff627334a8f646617fe0fa542d401fb5287e.zip URL if that works via CORS

Test it with playwright Python - don’t edit any file other than sloccount.html and a tests/test_sloccount.py file

Since I was working on my phone I didn't review the results at all. It seemed to work so I deployed it to static hosting... and then when I went to look at it properly later on found that Claude had given up, cheated and reimplemented it in JavaScript instead!

So I switched to Claude Code on my laptop where I have more control and coached Claude through implementing the project for real. This took way longer than the project deserved - probably a solid hour of my active time, spread out across the morning.

I've shared some of the transcripts - one, two, and three - as terminal sessions rendered to HTML using my rtf-to-html tool.

At one point I realized that the original SLOCCount project wasn't even entirely Perl as I had assumed, it included several C utilities! So I had Claude Code figure out how to compile those to WebAssembly (it used Emscripten) and incorporate those into the project (with notes on what it did.)

The end result (source code here) is actually pretty cool. It's a web UI with three tabs - one for pasting in code, a second for loading code from a GitHub repository and a third that lets you open a Zip file full of code that you want to analyze. Here's an animated demo:

I enter simonw/llm in the GitHub repository field. It loads 41 files from GitHub and displays a report showing the number of lines and estimated cost.

The cost estimates it produces are of very little value. By default it uses the original method from 2001. You can also twiddle the factors - bumping up the expected US software engineer's annual salary from its 2000 estimate of $56,286 is a good start!

I had ChatGPT take a guess at what those figures should be for today and included those in the tool, with a very prominent warning not to trust them in the slightest.

# 22nd October 2025, 6:12 am / javascript, perl, projects, tools, ai, webassembly, generative-ai, llms, ai-assisted-programming, vibe-coding, claude-code

TIL: Exploring OpenAI’s deep research API model o4-mini-deep-research. I landed a PR by Manuel Solorzano adding pricing information to llm-prices.com for OpenAI's o4-mini-deep-research and o3-deep-research models, which they released in June and document here.

I realized I'd never tried these before, so I put o4-mini-deep-research through its paces researching locations of surviving orchestrions for me (I really like orchestrions).

The API cost me $1.10 and triggered a small flurry of extra vibe-coded tools, including this new tool for visualizing Responses API traces from deep research models and this mocked up page listing the 19 orchestrions it found (only one of which I have fact-checked myself).

A web page showing information about historic orchestrions. The header reads "Historic Orchestrions Around the World" with subtitle "A collection of rare and remarkable mechanical orchestras" and three pills showing "19 Orchestrions", "7 Locations", and "7 Countries". Below is a white card titled "The Musical Museum (Brentford)" with a location pin icon showing "London (Brentford), UK" and a blue "View on Google Maps →" button. The card contains three sections: DESCRIPTION: "Imhof & Mukle pipe-organ orchestrion (1899) with multiple registers and percussion (drums, tambourine, triangle) (www.soundsurvey.org.uk)." HISTORY: "Built in London c.1899 by Imhof & Mukle; remained in their Oxford Street showroom until company collapse in the 1970s, when it was acquired by the Brentford Musical Museum (www.soundsurvey.org.uk)." NOTES: "The museum advertises that the soprano Adelina Patti used a similar Imhof orchestrion at her home in Wales (www.soundsurvey.org.uk)."

# 18th October 2025, 7:21 pm / ai, openai, generative-ai, llms, deep-research, vibe-coding

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code they produce. This feels deeply uncomfortable!

# 11th October 2025, 12:31 pm / vibe-coding, coding-agents, ai-assisted-programming, generative-ai, ai, llms

Superpowers: How I’m using coding agents in October 2025. A follow-up to Jesse Vincent's post about September, but this is a really significant piece in its own right.

Jesse is one of the most creative users of coding agents (Claude Code in particular) that I know. He's put a great amount of work into evolving an effective process for working with them, encourage red/green TDD (watch the test fail first), planning steps, self-updating memory notes and even implementing a feelings journal ("I feel engaged and curious about this project" - Claude).

Claude Code just launched plugins, and Jesse is celebrating by wrapping up a whole host of his accumulated tricks as a new plugin called Superpowers. You can add it to your Claude Code like this:

/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

There's a lot in here! It's worth spending some time browsing the repository - here's just one fun example, in skills/debugging/root-cause-tracing/SKILL.md:

---
name: Root Cause Tracing
description: Systematically trace bugs backward through call stack to find original trigger
when_to_use: Bug appears deep in call stack but you need to find where it originates
version: 1.0.0
languages: all
---

Overview

Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.

Core principle: Trace backward through the call chain until you find the original trigger, then fix at the source.

When to Use

digraph when_to_use {
    "Bug appears deep in stack?" [shape=diamond];
    "Can trace backwards?" [shape=diamond];
    "Fix at symptom point" [shape=box];
    "Trace to original trigger" [shape=box];
    "BETTER: Also add defense-in-depth" [shape=box];

    "Bug appears deep in stack?" -> "Can trace backwards?" [label="yes"];
    "Can trace backwards?" -> "Trace to original trigger" [label="yes"];
    "Can trace backwards?" -> "Fix at symptom point" [label="no - dead end"];
    "Trace to original trigger" -> "BETTER: Also add defense-in-depth";
}

[...]

This one is particularly fun because it then includes a Graphviz DOT graph illustrating the process - it turns out Claude can interpret those as workflow instructions just fine, and Jesse has been wildly experimenting with them.

I vibe-coded up a quick URL-based DOT visualizer, here's that one rendered:

The above DOT rendered as an image

There is so much to learn about putting these tools to work in the most effective way possible. Jesse is way ahead of the curve, so it's absolutely worth spending some time exploring what he's shared so far.

And if you're worried about filling up your context with a bunch of extra stuff, here's a reassuring note from Jesse:

The core of it is VERY token light. It pulls in one doc of fewer than 2k tokens. As it needs bits of the process, it runs a shell script to search for them. The long end to end chat for the planning and implementation process for that todo list app was 100k tokens.

It uses subagents to manage token-heavy stuff, including all the actual implementation.

(Jesse's post also tipped me off about Claude's /mnt/skills/public folder, see my notes here.)

# 10th October 2025, 11:30 pm / plugins, ai, prompt-engineering, generative-ai, llms, ai-assisted-programming, anthropic, claude, vibe-coding, coding-agents, claude-code, sub-agents, jesse-vincent, skills

For quite some I wanted to write a small static image gallery so I can share my pictures with friends and family. Of course there are a gazillion tools like this, but, well, sometimes I just want to roll my own. [...]

I used the old, well tested technique I call brain coding, where you start with an empty vim buffer and type some code (Perl, HTML, CSS) until you're happy with the result. It helps to think a bit (aka use your brain) during this process.

Thomas Klausner, coining "brain coding"

# 7th October 2025, 4:03 pm / definitions, vibe-coding

Vibe engineering

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works. This leaves us with a terminology gap: what should we call the other end of the spectrum, where seasoned professionals accelerate their work with LLMs while staying proudly and confidently accountable for the software they produce?

[... 1,313 words]

gpt-image-1-mini. OpenAI released a new image model today: gpt-image-1-mini, which they describe as "A smaller image generation model that’s 80% less expensive than the large model."

They released it very quietly - I didn't hear about this in the DevDay keynote but I later spotted it on the DevDay 2025 announcements page.

It wasn't instantly obvious to me how to use this via their API. I ended up vibe coding a Python CLI tool for it so I could try it out.

I dumped the plain text diff version of the commit to the OpenAI Python library titled feat(api): dev day 2025 launches into ChatGPT GPT-5 Thinking and worked with it to figure out how to use the new image model and build a script for it. Here's the transcript and the the openai_image.py script it wrote.

I had it add inline script dependencies, so you can run it with uv like this:

export OPENAI_API_KEY="$(llm keys get openai)"
uv run https://tools.simonwillison.net/python/openai_image.py "A pelican riding a bicycle"

It picked this illustration style without me specifying it:

A nice illustration of a pelican riding a bicycle, both pelican and bicycle are exactly as you would hope. Looks sketched, maybe colored pencils? The pelican's two legs are on the pedals but it also has a weird sort of paw on an arm on the handlebars.

(This is a very different test from my normal "Generate an SVG of a pelican riding a bicycle" since it's using a dedicated image generator, not having a text-based model try to generate SVG code.)

My tool accepts a prompt, and optionally a filename (if you don't provide one it saves to a filename like /tmp/image-621b29.png).

It also accepts options for model and dimensions and output quality - the --help output lists those, you can see that here.

OpenAI's pricing is a little confusing. The model page claims low quality images should cost around half a cent and medium quality around a cent and a half. It also lists an image token price of $8/million tokens. It turns out there's a default "high" quality setting - most of the images I've generated have reported between 4,000 and 6,000 output tokens, which costs between 3.2 and 4.8 cents.

One last demo, this time using --quality low:

 uv run https://tools.simonwillison.net/python/openai_image.py \
  'racoon eating cheese wearing a top hat, realistic photo' \
  /tmp/racoon-hat-photo.jpg \
  --size 1024x1024 \
  --output-format jpeg \
  --quality low

This saved the following:

It's a square photo of a raccoon eating cheese and wearing a top hat. It looks pretty realistic.

And reported this to standard error:

{
  "background": "opaque",
  "created": 1759790912,
  "generation_time_in_s": 20.87331541599997,
  "output_format": "jpeg",
  "quality": "low",
  "size": "1024x1024",
  "usage": {
    "input_tokens": 17,
    "input_tokens_details": {
      "image_tokens": 0,
      "text_tokens": 17
    },
    "output_tokens": 272,
    "total_tokens": 289
  }
}

This took 21s, but I'm on an unreliable conference WiFi connection so I don't trust that measurement very much.

272 output tokens = 0.2 cents so this is much closer to the expected pricing from the model page.

# 6th October 2025, 10:54 pm / python, tools, ai, openai, generative-ai, uv, text-to-image, pelican-riding-a-bicycle, vibe-coding

When attention is being appropriated, producers need to weigh the costs and benefits of the transaction. To assess whether the appropriation of attention is net-positive, it’s useful to distinguish between extractive and non-extractive contributions. Extractive contributions are those where the marginal cost of reviewing and merging that contribution is greater than the marginal benefit to the project’s producers. In the case of a code contribution, it might be a pull request that’s too complex or unwieldy to review, given the potential upside

Nadia Eghbal, Working in Public, via the draft LLVM AI tools policy

# 2nd October 2025, 12:44 pm / definitions, open-source, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, vibe-coding

The trick with Claude Code is to give it large, but not too large, extremely well defined problems.

(If the problems are too large then you are now vibe coding… which (a) frequently goes wrong, and (b) is a one-way street: once vibes enter your app, you end up with tangled, write-only code which functions perfectly but can no longer be edited by humans. Great for prototyping, bad for foundations.)

Matt Webb, What I think about when I think about Claude Code

# 12th September 2025, 9:59 pm / matt-webb, prototyping, ai, generative-ai, llms, ai-assisted-programming, claude, vibe-coding, coding-agents, claude-code

My review of Claude’s new Code Interpreter, released under a very confusing name

Visit My review of Claude's new Code Interpreter, released under a very confusing name

Today on the Anthropic blog: Claude can now create and edit files:

[... 2,771 words]

I ran Claude in a loop for three months, and it created a genz programming language called cursed (via) Geoffrey Huntley vibe-coded an entirely new programming language using Claude:

The programming language is called "cursed". It's cursed in its lexical structure, it's cursed in how it was built, it's cursed that this is possible, it's cursed in how cheap this was, and it's cursed through how many times I've sworn at Claude.

Geoffrey's initial prompt:

Hey, can you make me a programming language like Golang but all the lexical keywords are swapped so they're Gen Z slang?

Then he pushed it to keep on iterating over a three month period.

Here's Hello World:

vibe main
yeet "vibez"

slay main() {
    vibez.spill("Hello, World!")
}

And here's binary search, part of 17+ LeetCode problems that run as part of the test suite:

slay binary_search(nums normie[], target normie) normie {
    sus left normie = 0
    sus right normie = len(nums) - 1    
    bestie (left <= right) {
        sus mid normie = left + (right - left) / 2
        ready (nums[mid] == target) {
            damn mid
        }
        ready (nums[mid] < target) {
            left = mid + 1
        } otherwise {
            right = mid - 1
        }
    }
    damn -1
}

This is a substantial project. The repository currently has 1,198 commits. It has both an interpreter mode and a compiler mode, and can compile programs to native binaries (via LLVM) for macOS, Linux and Windows.

It looks like it was mostly built using Claude running via Sourcegraph's Amp, which produces detailed commit messages. The commits include links to archived Amp sessions but sadly those don't appear to be publicly visible.

The first version was written in C, then Geoffrey had Claude port it to Rust and then Zig. His cost estimate:

Technically it costs about 5k usd to build your own compiler now because cursed was implemented first in c, then rust, now zig. So yeah, it’s not one compiler it’s three editions of it. For a total of $14k USD.

# 9th September 2025, 9:31 am / c, programming-languages, ai, rust, zig, generative-ai, llms, ai-assisted-programming, anthropic, claude, vibe-coding, geoffrey-huntley

Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Visit Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide

Apollo Global Management’s “Chief Economist” Dr. Torsten Sløk released this interesting chart which appears to show a slowdown in AI adoption rates among large (>250 employees) companies:

[... 2,673 words]