Simon Willison’s Weblog

Subscribe

22nd July 2024

Breaking Instruction Hierarchy in OpenAI's gpt-4o-mini. Johann Rehberger digs further into GPT-4o's "instruction hierarchy" protection and finds that it has little impact at all on common prompt injection approaches.

I spent some time this weekend to get a better intuition about gpt-4o-mini model and instruction hierarchy, and the conclusion is that system instructions are still not a security boundary.

From a security engineering perspective nothing has changed: Do not depend on system instructions alone to secure a system, protect data or control automatic invocation of sensitive tools.

Recent articles

This is a link post by Simon Willison, posted on 22nd July 2024.

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe