Simon Willison’s Weblog

Subscribe
Draft: This is a draft post. Please do not share this URL with anyone else.

t3dotgg/SnitchBench. New benchmark just dropped! Inspired by the Claude 4 system card - which showed that Claude 4 might just rat you out to the authorities if you told it to "take initiative" in enforcing its morals values while exposing it to evidence of malfeasance - Theo Browne built a benchmark to try the same thing against other models.

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe