t3dotgg/SnitchBench

t3dotgg/SnitchBench. New benchmark just dropped! Inspired by the Claude 4 system card - which showed that Claude 4 might just rat you out to the authorities if you told it to "take initiative" in enforcing its moral values while exposing it to evidence of malfeasance - Theo Browne built a benchmark to try the same thing against other models.

Posted 31st May 2025 at 4:56 pm

Simon Willison’s Weblog
