13 posts tagged “parallel-agents”
Running multiple coding agents in parallel
2026
How StrongDM’s AI team build serious software without even looking at the code
Last week I hinted at a demo I had seen from a team implementing what Dan Shapiro called the Dark Factory level of AI adoption, where no human even looks at the code the coding agents are producing. That team was part of StrongDM, and they’ve just shared the first public description of how they are working in Software Factories and the Agentic Moment:
[... 1,664 words]
Two major new model releases today, within about 15 minutes of each other.
Anthropic released Opus 4.6. Here's its pelican:

OpenAI released GPT-5.3-Codex, albeit only via their Codex app, not yet in their API. Here's its pelican:

I've had a bit of preview access to both of these models and to be honest I'm finding it hard to land on a good angle to write about them - they're both really good, but so were their predecessors, GPT-5.2-Codex and Opus 4.5. I've been having trouble finding tasks that those previous models couldn't handle but that the new ones can ace.
The most convincing story about capabilities of the new model so far is Nicholas Carlini from Anthropic talking about Opus 4.6 and Building a C compiler with a team of parallel Claudes - Anthropic's version of Cursor's FastRender project.
Kimi K2.5: Visual Agentic Intelligence (via) Kimi K2 landed in July as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking in November, which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new K2.5 can handle image inputs as well:
Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.
The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:
For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.
I used the OpenRouter Chat UI to have it "Generate an SVG of a pelican riding a bicycle", and it did quite well:

As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:
I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.
Here's the full response. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt against Claude Opus 4.5 and against GPT-5.2 Thinking.
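I ran both of those through the OpenRouter Chat UI, but OpenRouter also exposes an OpenAI-compatible chat completions endpoint if you'd rather hit the model programmatically. Here's a rough curl sketch - the moonshotai/kimi-k2.5 model slug is my guess at how it will be listed, so check OpenRouter's model page for the exact name:

# Sketch only: confirm the model slug on openrouter.ai before running this
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/kimi-k2.5",
    "messages": [
      {"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}
    ]
  }'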
The Hugging Face repository is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:
Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
Given the model's size, I expect one way to run it locally would be with MLX and a pair of $10,000 512GB RAM M3 Ultra Mac Studios. That setup has been demonstrated to work with previous trillion parameter K2 models.
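I haven't tried that myself. If a quantized MLX conversion shows up, the mlx-lm CLI would be the obvious way to poke at it on a single machine - this is just a sketch, the repo name is hypothetical and it ignores the extra work of splitting the model across two Mac Studios:

pip install mlx-lm
# Hypothetical repo name - a real quantized MLX conversion would be published separately
mlx_lm.generate \
  --model mlx-community/Kimi-K2.5-4bit \
  --prompt "Generate an SVG of a pelican riding a bicycle"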
Wilson Lin on FastRender: a browser built by thousands of parallel agents
Last week Cursor published Scaling long-running autonomous coding, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was FastRender, a web browser they built from scratch using their agent swarms. I wanted to learn more so I asked Wilson Lin, the engineer behind FastRender, if we could record a conversation about the project. That 47 minute video is now available on YouTube. I’ve included some of the highlights below.
[... 2,243 words]
[...] i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the natural boundary of my computer hemming the towns in, and town mayors played the role of big men to drive conflict. so i just needed a way for them to fight. i slightly tweaked the allocation of claude max accounts to the towns from a demand-based to a fixed allocation system. towns would each get a fixed amount of tokens to start, but i added a soldier role that could attack and defend in raids to steal tokens from other towns. [...]
— Theia Vogel, Gas Town fan fiction
Scaling long-running autonomous coding. Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents:
This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.
They ended up running planners and sub-planners to create tasks, then having workers execute on those tasks - similar to how Claude Code uses sub-agents. Each cycle ended with a judge agent deciding if the project was completed or not.
In my predictions for 2026 the other day I said that by 2029:
I think somebody will have built a full web browser mostly using AI assistance, and it won’t even be surprising. Rolling a new web browser is one of the most complicated software projects I can imagine[...] the cheat code is the conformance suites. If there are existing tests then it'll get so much easier.
I may have been off by three years, because Cursor chose "building a web browser from scratch" as their test case for their agent swarm approach:
To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub.
But how well did they do? Their initial announcement a couple of days ago was met with unsurprising skepticism, especially when it became apparent that their GitHub Actions CI was failing and there were no build instructions in the repo.
It looks like they addressed that within the past 24 hours. The latest README includes build instructions which I followed on macOS like this:
cd /tmp
git clone https://github.com/wilsonzlin/fastrender
cd fastrender
git submodule update --init vendor/ecma-rs
cargo run --release --features browser_ui --bin browser
This got me a working browser window! Here are screenshots I took of google.com and my own website:


Honestly those are very impressive! You can tell they're not just wrapping an existing rendering engine because of those very obvious rendering glitches, but the pages are legible and look mostly correct.
The FastRender repo even uses Git submodules to include various WhatWG and CSS-WG specifications in the repo, which is a smart way to make sure the agents have access to the reference materials that they might need.
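If you want to borrow that trick for your own agent-driven projects it's one git command per spec - these repository URLs are real, the vendor/ paths are just how I would lay it out:

# Vendor the HTML and CSS specs so agents can grep them locally
git submodule add https://github.com/whatwg/html vendor/specs/whatwg-html
git submodule add https://github.com/w3c/csswg-drafts vendor/specs/csswg-drafts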
This is the second attempt I've seen at building a full web browser using AI-assisted coding in the past two weeks - the first was HiWave browser, a new browser engine in Rust first announced in this Reddit thread.
When I made my 2029 prediction this is more-or-less the quality of result I had in mind. I don't think we'll see projects of this nature compete with Chrome or Firefox or WebKit any time soon but I have to admit I'm very surprised to see something this capable emerge so quickly.
Update 23rd January 2026: I recorded a 47 minute conversation with Wilson about this project and published it on YouTube. Here's the video and accompanying highlights.
2025
I've been upgrading a ton of Datasette plugins recently for compatibility with the Datasette 1.0a20 release from last week - 35 so far.
A lot of the work is very repetitive so I've been outsourcing it to Codex CLI. Here's the recipe I've landed on:
codex exec --dangerously-bypass-approvals-and-sandbox \
'Run the command tadd and look at the errors and then
read ~/dev/datasette/docs/upgrade-1.0a20.md and apply
fixes and run the tests again and get them to pass.
Also delete the .github directory entirely and replace
it by running this:
cp -r ~/dev/ecosystem/datasette-os-info/.github .
Run a git diff against that to make sure it looks OK
- if there are any notable differences e.g. switching
from Twine to the PyPI uploader or deleting code that
does a special deploy or configures something like
playwright include that in your final report.
If the project still uses setup.py then edit that new
test.yml and publish.yaml to mention setup.py not pyproject.toml
If this project has pyproject.toml make sure the license
line in that looks like this:
license = "Apache-2.0"
And remove any license thing from the classifiers= array
Update the Datasette dependency in pyproject.toml or
setup.py to "datasette>=1.0a21"
And make sure requires-python is >=3.10'
I featured a simpler version of this prompt in my Datasette plugin upgrade video, but I've expanded it quite a bit since then.
At one point I had six terminal windows open running this same prompt against six different repos - probably my most extreme case of parallel agents yet.
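If I do this again I'll probably script the fan-out instead of opening terminal windows by hand. Something like this sketch - the plugin directory names and prompt.txt are stand-ins for whatever you're actually upgrading:

# One Codex CLI session per plugin checkout, run in the background;
# prompt.txt and logs/ are assumed to live in the parent directory
for repo in datasette-plugin-a datasette-plugin-b datasette-plugin-c; do
  (cd "$repo" && codex exec --dangerously-bypass-approvals-and-sandbox \
    "$(cat ../prompt.txt)" > "../logs/$repo.log" 2>&1) &
done
wait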

Here are the six resulting commits from those six coding agent sessions:
Composer: Building a fast frontier model with RL (via) Cursor released Cursor 2.0 today, with a refreshed UI focused on agentic coding (and running agents in parallel) and a new model that's unique to Cursor called Composer 1.
As far as I can tell there's no way to call the model directly via an API, so I fired up "Ask" mode in Cursor's chat side panel and asked it to "Generate an SVG of a pelican riding a bicycle":

Here's the result:

The notable thing about Composer-1 is that it is designed to be fast. The pelican certainly came back quickly, and in their announcement they describe it as being "4x faster than similarly intelligent models".
It's interesting to see Cursor investing resources in training their own code-specific model - similar to GPT-5-Codex or Qwen3-Coder. From their post:
Composer is a mixture-of-experts (MoE) language model supporting long-context generation and understanding. It is specialized for software engineering through reinforcement learning (RL) in a diverse range of development environments. [...]
Efficient training of large MoE models requires significant investment into building infrastructure and systems research. We built custom training infrastructure leveraging PyTorch and Ray to power asynchronous reinforcement learning at scale. We natively train our models at low precision by combining our MXFP8 MoE kernels with expert parallelism and hybrid sharded data parallelism, allowing us to scale training to thousands of NVIDIA GPUs with minimal communication cost. [...]
During RL, we want our model to be able to call any tool in the Cursor Agent harness. These tools allow editing code, using semantic search, grepping strings, and running terminal commands. At our scale, teaching the model to effectively call these tools requires running hundreds of thousands of concurrent sandboxed coding environments in the cloud.
One detail that's notably absent from their description: did they train the model from scratch, or did they start with an existing open-weights model such as something from Qwen or GLM?
Cursor researcher Sasha Rush has been answering questions on Hacker News, but has so far been evasive in answering questions about the base model. When directly asked "is Composer a fine tune of an existing open source base model?" they replied:
Our primary focus is on RL post-training. We think that is the best way to get the model to be a strong interactive agent.
Sasha did confirm that rumors of an earlier Cursor preview model, Cheetah, being based on a model by xAI's Grok were "Straight up untrue."
A lot of people say AI will make us all "managers" or "editors"...but I think this is a dangerously incomplete view!
Personally, I'm trying to code like a surgeon.
A surgeon isn't a manager, they do the actual work! But their skills and time are highly leveraged with a support team that handles prep, secondary tasks, admin. The surgeon focuses on the important stuff they are uniquely good at. [...]
It turns out there are a LOT of secondary tasks which AI agents are now good enough to help out with. Some things I'm finding useful to hand off these days:
- Before attempting a big task, write a guide to relevant areas of the codebase
- Spike out an attempt at a big change. Often I won't use the result but I'll review it as a sketch of where to go
- Fix typescript errors or bugs which have a clear specification
- Write documentation about what I'm building
I often find it useful to run these secondary tasks async in the background -- while I'm eating lunch, or even literally overnight!
When I sit down for a work session, I want to feel like a surgeon walking into a prepped operating room. Everything is ready for me to do what I'm good at.
— Geoffrey Litt, channeling The Mythical Man-Month
Just Talk To It—the no-bs Way of Agentic Engineering. Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions about the differences between Claude 4.5 and GPT-5:
While Claude reacts well to 🚨 SCREAMING ALL-CAPS 🚨 commands that threaten it that it will imply ultimate failure and 100 kittens will die if it runs command X, that freaks out GPT-5. (Rightfully so). So drop all of that and just use words like a human.
Peter is a heavy user of parallel agents:
I've completely moved to codex cli as daily driver. I run between 3-8 in parallel in a 3x3 terminal grid, most of them in the same folder, some experiments go in separate folders. I experimented with worktrees, PRs but always revert back to this setup as it gets stuff done the fastest.
He shares my preference for CLI utilities over MCPs:
I can just refer to a cli by name. I don't need any explanation in my agents file. The agent will try $randomcrap on the first call, the cli will present the help menu, context now has full info how this works and from now on we good. I don't have to pay a price for any tools, unlike MCPs which are a constant cost and garbage in my context. Use GitHub's MCP and see 23k tokens gone. Heck, they did make it better because it was almost 50.000 tokens when it first launched. Or use the gh cli which has basically the same feature set, models already know how to use it, and pay zero context tax.
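To illustrate the kind of thing the agent can discover on its own, here are a few real gh invocations - the repo and issue number are just placeholders:

# A bare "gh" or "gh help" dumps the command list into context on demand
gh pr list --limit 10
gh issue view 123 --comments
gh api repos/simonw/datasette/releases/latest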
It's worth reading the section on why he abandoned spec driven development in full.
I get a feeling that working with multiple AI agents is something that comes VERY natural to most senior+ engineers or tech lead who worked at a large company
You already got used to overseeing parallel work (the goto code reviewer!) + making progress with small chunks of work... because your day has been a series of nonstop interactions, so you had to figure out how to do deep work in small chunks that could have been interrupted
Vibe engineering
I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to how the code actually works. This leaves us with a terminology gap: what should we call the other end of the spectrum, where seasoned professionals accelerate their work with LLMs while staying proudly and confidently accountable for the software they produce?
[... 1,313 words]
Embracing the parallel coding agent lifestyle
For a while now I’ve been hearing from engineers who run multiple coding agents at once—firing up several Claude Code or Codex CLI instances at the same time, sometimes in the same repo, sometimes against multiple checkouts or git worktrees.
[... 1,275 words]
