Simon Willison’s Weblog

Subscribe

Sunday, 29th June 2025

How to Fix Your Context. Drew Breunig has been publishing some very detailed notes on context engineering recently. In How Long Contexts Fail he described four common patterns for context rot, which he summarizes like so:

  • Context Poisoning: When a hallucination or other error makes it into the context, where it is repeatedly referenced.
  • Context Distraction: When a context grows so long that the model over-focuses on the context, neglecting what it learned during training.
  • Context Confusion: When superfluous information in the context is used by the model to generate a low-quality response.
  • Context Clash: When you accrue new information and tools in your context that conflicts with other information in the prompt.

In this follow-up he introduces neat ideas (and more new terminology) for addressing those problems.

Tool Loadout describes selecting a subset of tools to enable for a prompt, based on research that shows anything beyond 20 can confuse some models.

Context Quarantine is "the act of isolating contexts in their own dedicated threads" - I've called rhis sub-agents in the past, it's the pattern used by Claude Code and explored in depth in Anthropic's multi-agent research paper.

Context Pruning is "removing irrelevant or otherwise unneeded information from the context", and Context Summarization is the act of boiling down an accrued context into a condensed summary. These techniques become particularly important as conversations get longer and run closer to the model's token limits.

Context Offloading is "the act of storing information outside the LLM’s context". I've seen several systems implement their own "memory" tool for saving and then revisiting notes as they work, but an even more interesting example recently is how various coding agents create and update plan.md files as they work through larger problems.

Drew's conclusion:

The key insight across all the above tactics is that context is not free. Every token in the context influences the model’s behavior, for better or worse. The massive context windows of modern LLMs are a powerful capability, but they’re not an excuse to be sloppy with information management.

# 8:15 pm / ai, prompt-engineering, generative-ai, llms, drew-breunig, llm-tool-use, ai-agents

Tip: Use keyword-only arguments in Python dataclasses (via) Useful tip from Christian Hammond: if you create a Python dataclass using @dataclass(kw_only=True) its constructor will require keyword arguments, making it easier to add additional properties in the future, including in subclasses, without risking breaking existing code.

# 8:51 pm / python

Agentic Coding: The Future of Software Development with Agents. Armin Ronacher delivers a 37 minute YouTube talk describing his adventures so far with Claude Code and agentic coding methods.

A friend called Claude Code catnip for programmers and it really feels like this. I haven't felt so energized and confused and just so willing to try so many new things... it is really incredibly addicting.

I picked up a bunch of useful tips from this video:

  • Armin runs Claude Code with the --dangerously-skip-permissions option, and says this unlocks a huge amount of productivity. I haven't been brave enough to do this yet but I'm going to start using that option while running in a Docker container to ensure nothing too bad can happen.
  • When your agentic coding tool can run commands in a terminal you can mostly avoid MCP - instead of adding a new MCP tool, write a script or add a Makefile command and tell the agent to use that instead. The only MCP Armin uses is the Playwright one.
  • Combined logs are a really good idea: have everything log to the same place and give the agent an easy tool to read the most recent N log lines.
  • While running Claude Code, use Gemini CLI to run sub-agents, to perform additional tasks without using up Claude Code's own context
  • Designing additional tools that provide very clear errors, so the agents can recover when something goes wrong.
  • Thanks to Playwright, Armin has Claude Code perform all sorts of automated operations via a signed in browser instance as well. "Claude can debug your CI... it can sign into a browser, click around, debug..." - he also has it use the gh GitHub CLI tool to interact with things like GitHub Actions workflows.

"Tip 1: Unified Logging" at top, followed by title "Forward Everything Into One Log File" and bullet points: "Combine console.log + server logs + everything else", "patch console.log in the browser -> forward to server via API call", "All output streams flow to a single, tailable log file", "Give it a way to log out SQL too!", "Provide a make tail-logs command for easy access". Bottom shows example: "# Example" and "make tail-logs  # Shows last 50 lines, follows new output".

# 11:59 pm / armin-ronacher, ai, generative-ai, llms, ai-assisted-programming, anthropic, claude, ai-agents, model-context-protocol, claude-code

2025 » June

MTWTFSS
      1
2345678
9101112131415
16171819202122
23242526272829
30