4th November 2025 - Link Blog
MCP Colors: Systematically deal with prompt injection risk (via) Tim Kellogg proposes a neat way to think about prompt injection, especially with respect to MCP tools.
Classify every tool with a color: red if it exposes the agent to untrusted (potentially malicious) instructions, blue if it involves a "critical action" - something you would not want an attacker to be able to trigger.
This means you can configure your agent to actively avoid mixing the two colors at once:
The Chore: Go label every data input, and every tool (especially MCP tools). For MCP tools & resources, you can use the _meta object to keep track of the color. The agent can decide at runtime (or earlier) if it’s gotten into an unsafe state.
Personally, I like to automate. I needed to label ~200 tools, so I put them in a spreadsheet and used an LLM to label them. That way, I could focus on being precise and clear about my criteria for what constitutes “red”, “blue” or “neither”. That way I ended up with an artifact that scales beyond my initial set of tools.
Recent articles
- Changes in the system prompt between Claude Opus 4.6 and 4.7 - 18th April 2026
- Join us at PyCon US 2026 in Long Beach - we have new AI and security tracks this year - 17th April 2026
- Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 - 16th April 2026