Simon Willison on code-review

13 posts tagged “code-review”

2025

The cognitive debt of LLM-laden coding extends beyond disengagement of our craft. We’ve all heard the stories. Hyped up, vibed up, slop-jockeys with attention spans shorter than the framework-hopping JavaScript devs of the early 2010s, sling their sludge in pull requests and design docs, discouraging collaboration and disrupting teams. Code reviewing coworkers are rapidly losing their minds as they come to the crushing realization that they are now the first layer of quality control instead of one of the last. Asked to review; forced to pick apart. Calling out freshly added functions that are never called, hallucinated library additions, and obvious runtime or compilation errors. All while the author—who clearly only skimmed their “own” code—is taking no responsibility, going “whoopsie, Claude wrote that. Silly AI, ha-ha.”

— Simon Højberg, The Programmer Identity Crisis

# 8th October 2025, 6:08 pm / llms, generative-ai, ai, code-review, ai-ethics

Vibe engineering

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works. This leaves us with a terminology gap: what should we call the other end of the spectrum, where seasoned professionals accelerate their work with LLMs while staying proudly and confidently accountable for the software they produce?

[... 1,313 words]

2:32 pm / 7th October 2025 / code-review, definitions, software-engineering, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents, parallel-agents

GPT‑5-Codex and upgrades to Codex. OpenAI half-released a new model today: GPT‑5-Codex, a fine-tuned GPT-5 variant explicitly designed for their various AI-assisted programming tools.

Update: OpenAI call it a "version of GPT-5", they don't explicitly describe it as a fine-tuned model. Calling it a fine-tune was my mistake here.

I say half-released because it's not yet available via their API, but they "plan to make GPT‑5-Codex available in the API soon".

I wrote about the confusing array of OpenAI products that share the name Codex a few months ago. This new model adds yet another, though at least "GPT-5-Codex" (using two hyphens) is unambiguous enough not to add to much more to the confusion.

At this point it's best to think of Codex as OpenAI's brand name for their coding family of models and tools.

The new model is already integrated into their VS Code extension, the Codex CLI and their Codex Cloud asynchronous coding agent. I'd been calling that last one "Codex Web" but I think Codex Cloud is a better name since it can also be accessed directly from their iPhone app.

Codex Cloud also has a new feature: you can configure it to automatically run code review against specific GitHub repositories (I found that option on chatgpt.com/codex/settings/code-review) and it will create a temporary container to use as part of those reviews. Here's the relevant documentation.

Some documented features of the new GPT-5-Codex model:

Specifically trained for code review, which directly supports their new code review feature.
"GPT‑5-Codex adapts how much time it spends thinking more dynamically based on the complexity of the task." Simple tasks (like "list files in this directory") should run faster. Large, complex tasks should use run for much longer - OpenAI report Codex crunching for seven hours in some cases!
Increased score on their proprietary "code refactoring evaluation" from 33.9% for GPT-5 (high) to 51.3% for GPT-5-Codex (high). It's hard to evaluate this without seeing the details of the eval but it does at least illustrate that refactoring performance is something they've focused on here.
"GPT‑5-Codex also shows significant improvements in human preference evaluations when creating mobile websites" - in the past I've habitually prompted models to "make it mobile-friendly", maybe I don't need to do that any more.
"We find that comments by GPT‑5-Codex are less likely to be incorrect or unimportant" - I originally misinterpreted this as referring to comments in code but it's actually about comments left on code reviews.

The system prompt for GPT-5-Codex in Codex CLI is worth a read. It's notably shorter than the system prompt for other models - here's a diff.

Here's the section of the updated system prompt that talks about comments:

Add succinct code comments that explain what is going on if code is not self-explanatory. You should not add comments like "Assigns the value to the variable", but a brief comment might be useful ahead of a complex code block that the user would otherwise have to spend time parsing out. Usage of these comments should be rare.

Theo Browne has a video review of the model and accompanying features. He was generally impressed but noted that it was surprisingly bad at using the Codex CLI search tool to navigate code. Hopefully that's something that can fix with a system prompt update.

Finally, can it drew a pelican riding a bicycle? Without API access I instead got Codex Cloud to have a go by prompting:

Generate an SVG of a pelican riding a bicycle, save as pelican.svg

Here's the result:

it's a bit messy - the pelican is quite good and the bicycle is quite good but the pelican is stood overlapping the bicycle not riding it.

# 15th September 2025, 6:55 pm / code-review, ai, openai, generative-ai, llms, ai-assisted-programming, pelican-riding-a-bicycle, llm-release, coding-agents, async-coding-agents, gpt-5, codex-cli, theo-browne

Coding with LLMs in the summer of 2025 (an update) (via) Salvatore Sanfilippo describes his current AI-assisted development workflow. He's all-in on LLMs for code review, exploratory prototyping, pair-design and writing "part of the code under your clear specifications", but warns against leaning too hard on pure vibe coding:

But while LLMs can write part of a code base with success (under your strict supervision, see later), and produce a very sensible speedup in development (or, the ability to develop more/better in the same time used in the past — which is what I do), when left alone with nontrivial goals they tend to produce fragile code bases that are larger than needed, complex, full of local minima choices, suboptimal in many ways. Moreover they just fail completely when the task at hand is more complex than a given level.

There are plenty of useful tips in there, especially around carefully managing your context:

When your goal is to reason with an LLM about implementing or fixing some code, you need to provide extensive information to the LLM: papers, big parts of the target code base (all the code base if possible, unless this is going to make the context window so large than the LLM performances will be impaired). And a brain dump of all your understanding of what should be done.

Salvatore warns against relying too hard on tools which hide the context for you, like editors with integrated coding agents. He prefers pasting exactly what's needed into the LLM web interface - I share his preference there.

His conclusions here match my experience:

You will be able to do things that are otherwise at the borders of your knowledge / expertise while learning much in the process (yes, you can learn from LLMs, as you can learn from books or colleagues: it is one of the forms of education possible, a new one). Yet, everything produced will follow your idea of code and product, and will be of high quality and will not random fail because of errors and shortcomings introduced by the LLM. You will also retain a strong understanding of all the code written and its design.

# 21st July 2025, 2:42 am / code-review, salvatore-sanfilippo, ai, generative-ai, llms, ai-assisted-programming, vibe-coding

2022

The Perfect Commit

For the last few years I’ve been trying to center my work around creating what I consider to be the Perfect Commit. This is a single commit that contains all of the following:

[... 2,061 words]

8:41 pm / 29th October 2022 / code-review, definitions, git, github, software-engineering, github-issues

2019

In general, reviewers should favor approving a CL [code review] once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect.

— Google Standard of Code Review

# 28th November 2019, 5:40 am / google, code-review

2018

In too many organizations, deploy code is a technical backwater, an accumulation of crufty scripts and glue code, forked gems and interns’ earnest attempts to hack up Capistrano. It usually gives off a strong whiff of “sloppily evolved from many 2 am patches with no code review”.

This is insane. Deploy software is the most important software you have. Treat it that way: recruit an owner, allocate real time for development and testing, bake in metrics and track them over time.

— Charity Majors

# 27th August 2018, 9 pm / deployment, charity-majors, code-review

2017

How to Do Code Reviews Like a Human. Some very well thought out advice on giving constructive reviews, including simple but effective language tricks for keeping suggestions positive and helpful.

# 13th October 2017, 4:36 am / code-review

2013

What is a good onboarding process for a new employee at a startup?

Have them deploy the site on their first day. This ensures they know how to...

[... 174 words]

12:51 pm / 13th August 2013 / code-review, startups, quora, hr

2012

What are the best practices for onboarding new software engineers?

I’m a huge fan of “deploy on the first day”—which forces the issue in getting all of their accounts set up, their development environment ready, showing them where source control is, how code review works and so on. Even if it’s just adding themselves to the about page or humans.txt file, it means they can come in on day two ready to get real work done.

[... 106 words]

3:17 pm / 14th November 2012 / code-review, programmers, software-engineering, startups, quora

2010

Design and code review requested for Django string signing / signed cookies. Do you know your way around web app security and cryptography (in particular signing things using hmac and sha1)? We’d appreciate your help reviewing the usage of these concepts in Django’s proposed string signing and signed cookie implementations.

# 4th January 2010, 1:24 pm / code-review, cryptography, django, hashing, hmac, python, security, sha1

2008

How to use Google Code’s code review tool. I missed this, but Google Code now has a pleasantly simple code review system built in to the source code browser. You can add comments to any changeset, including annotations attached to individual lines of code.

# 28th August 2008, 4:25 pm / code-review, google-code

2007

Review Board. VMWare release a slick looking Django-powered code review system, with hooks in to Subversion and Perforce.

# 31st May 2007, 8:32 am / code-review, django, perforce, subversion, vmware

Simon Willison’s Weblog