22nd July 2024 - Link Blog
Breaking Instruction Hierarchy in OpenAI's gpt-4o-mini. Johann Rehberger digs further into GPT-4o's "instruction hierarchy" protection and finds that it has little impact at all on common prompt injection approaches.
I spent some time this weekend to get a better intuition about
gpt-4o-minimodel and instruction hierarchy, and the conclusion is that system instructions are still not a security boundary.From a security engineering perspective nothing has changed: Do not depend on system instructions alone to secure a system, protect data or control automatic invocation of sensitive tools.
Recent articles
- Claude Opus 4.8: "a modest but tangible improvement" - 28th May 2026
- I think Anthropic and OpenAI have found product-market fit - 27th May 2026
- Notes on Pope Leo XIV's encyclical on AI - 25th May 2026