Simon Willison’s Weblog

Subscribe

Claude Code for web—a new asynchronous coding agent from Anthropic

20th October 2025

Anthropic launched Claude Code for web this morning. It’s an asynchronous coding agent—their answer to OpenAI’s Codex Cloud and Google’s Jules, and has a very similar shape. I had preview access over the weekend and I’ve already seen some very promising results from it.

It’s available online at claude.ai/code and shows up as a tab in the Claude iPhone app as well:

Screenshot of Claude AI interface showing a conversation about updating a README file. The left sidebar shows "Claude" at the top, followed by navigation items: "Chats", "Projects", "Artifacts", and "Code" (highlighted). Below that is "Starred" section listing several items with trash icons: "LLM", "Python app", "Check my post", "Artifacts", "Summarize", and "Alt text writer". The center panel shows a conversation list with items like "In progress", "Run System C", "Idle", "Update Rese", "Run Matplotl", "Run Marketin", "WebAssembl", "Benchmark M", "Build URL Qu", and "Add Read-Or". The right panel displays the active conversation titled "Update Research Project README" showing a task to update a GitHub README file at https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/README.md, followed by Claude's response and command outputs showing file listings with timestamps from Oct 20 17:53.

As far as I can tell it’s their latest Claude Code CLI app wrapped in a container (Anthropic are getting really good at containers these days) and configured to --dangerously-skip-permissions. It appears to behave exactly the same as the CLI tool, and includes a neat “teleport” feature which can copy both the chat transcript and the edited files down to your local Claude Code CLI tool if you want to take over locally.

It’s very straight-forward to use. You point Claude Code for web at a GitHub repository, select an environment (fully locked down, restricted to an allow-list of domains or configured to access domains of your choosing, including “*” for everything) and kick it off with a prompt.

While it’s running you can send it additional prompts which are queued up and executed after it completes its current step.

Once it’s done it opens a branch on your repo with its work and can optionally open a pull request.

Putting Claude Code for web to work

Claude Code for web’s PRs are indistinguishable from Claude Code CLI’s, so Anthropic told me it was OK to submit those against public repos even during the private preview. Here are some examples from this weekend:

That second example is the most interesting. I saw a tweet from Armin about his MiniJinja Rust template language adding support for Python 3.14 free threading. I hadn’t realized that project had Python bindings, so I decided it would be interesting to see a quick performance comparison between MiniJinja and Jinja2.

I ran Claude Code for web against a private repository with a completely open environment (* in the allow-list) and prompted:

I’m interested in benchmarking the Python bindings for https://github.com/mitsuhiko/minijinja against the equivalente template using Python jinja2

Design and implement a benchmark for this. It should use the latest main checkout of minijinja and the latest stable release of jinja2. The benchmark should use the uv version of Python 3.14 and should test both the regular 3.14 and the 3.14t free threaded version—so four scenarios total

The benchmark should run against a reasonably complicated example of a template, using template inheritance and loops and such like In the PR include a shell script to run the entire benchmark, plus benchmark implantation, plus markdown file describing the benchmark and the results in detail, plus some illustrative charts created using matplotlib

I entered this into the Claude iPhone app on my mobile keyboard, hence the typos.

It churned away for a few minutes and gave me exactly what I asked for. Here’s one of the four charts it created:

Line chart titled "Rendering Time Across Iterations" showing rendering time in milliseconds (y-axis, ranging from approximately 1.0 to 2.5 ms) versus iteration number (x-axis, ranging from 0 to 200+). Four different lines represent different versions: minijinja (3.14t) shown as a solid blue line, jinja2 (3.14) as a solid orange line, minijinja (3.14) as a solid green line, and jinja2 (3.14t) as a dashed red line. The green line (minijinja 3.14) shows consistently higher rendering times with several prominent spikes reaching 2.5ms around iterations 25, 75, and 150. The other three lines show more stable, lower rendering times between 1.0-1.5ms with occasional fluctuations.

(I was surprised to see MiniJinja out-performed by Jinja2, but I guess Jinja2 has had a decade of clever performance optimizations and doesn’t need to deal with any extra overhead of calling out to Rust.)

Note that I would likely have got the exact same result running this prompt against Claude CLI on my laptop. The benefit of Claude Code for web is entirely in its convenience as a way of running these tasks in a hosted container managed by Anthropic, with a pleasant web and mobile UI layered over the top.

Anthropic are framing this as part of their sandboxing strategy

It’s interesting how Anthropic chose to announce this new feature: the product launch is buried half way down their new engineering blog post Beyond permission prompts: making Claude Code more secure and autonomous, which starts like this:

Claude Code’s new sandboxing features, a bash tool and Claude Code on the web, reduce permission prompts and increase user safety by enabling two boundaries: filesystem and network isolation.

I’m very excited to hear that Claude Code CLI is taking sandboxing more seriously. I’ve not yet dug into the details of that—it looks like it’s using seatbelt on macOS and Bubblewrap on Linux.

Anthropic released a new open source (Apache 2) library, anthropic-experimental/sandbox-runtime, with their implementation of this so far.

Filesystem sandboxing is relatively easy. The harder problem is network isolation, which they describe like this:

Network isolation, by only allowing internet access through a unix domain socket connected to a proxy server running outside the sandbox. This proxy server enforces restrictions on the domains that a process can connect to, and handles user confirmation for newly requested domains. And if you’d like further-increased security, we also support customizing this proxy to enforce arbitrary rules on outgoing traffic.

This is crucial to protecting against both prompt injection and lethal trifecta attacks. The best way to prevent lethal trifecta attacks is to cut off one of the three legs, and network isolation is how you remove the data exfiltration leg that allows successful attackers to steal your data.

If you run Claude Code for web in “No network access” mode you have nothing to worry about.

I’m a little bit nervous about their “Trusted network access” environment. It’s intended to only allow access to domains relating to dependency installation, but the default domain list has dozens of entries which makes me nervous about unintended exfiltration vectors sneaking through.

You can also configure a custom environment with your own allow-list. I have one called “Everything” which allow-lists “*”, because for projects like my MiniJinja/Jinja2 comparison above there are no secrets or source code involved that need protecting.

I see Anthropic’s focus on sandboxes as an acknowledgment that coding agents run in YOLO mode (--dangerously-skip-permissions and the like) are enormously more valuable and productive than agents where you have to approve their every step.

The challenge is making it convenient and easy to run them safely. This kind of sandboxing kind is the only approach to safety that feels credible to me.

Update: A note on cost: I’m currently using a Claude “Max” plan that Anthropic gave me in order to test some of their features, so I don’t have a good feeling for how Claude Code would cost for these kinds of projects.

From running npx ccusage@latest (an unofficial cost estimate tool) it looks like I’m using between $1 and $5 worth of daily Claude CLI invocations at the moment.

This is Claude Code for web—a new asynchronous coding agent from Anthropic by Simon Willison, posted on 20th October 2025.

Previous: Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe