Simon Willison’s Weblog

AbsenceBench: Language Models Can't Tell What's Missing (via) Here's another interesting result to file under the "jagged frontier" of LLMs, where their strengths and weaknesses are often unintuitive.

Long context models have been getting increasingly good at passing "Needle in a Haystack" tests recently, but what about a problem in the opposite direction?

This paper explores what happens when you give a model some content, then a copy of that content with a portion removed, and ask it to identify what's missing.
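To make the task concrete, here's a minimal sketch of what such a prompt could look like - the wording and layout are my own assumptions, not the paper's actual template:

```python
# Hypothetical sketch of an AbsenceBench-style prompt: show the model the
# original document plus a copy with some lines removed, then ask it to
# name the missing parts. The prompt text is an assumption, not the paper's.

def build_prompt(original: str, modified: str) -> str:
    return (
        "Here is a document:\n\n"
        f"{original}\n\n"
        "Here is a copy of that document with some parts removed:\n\n"
        f"{modified}\n\n"
        "List exactly the parts that are missing from the copy."
    )

original = "\n".join(f"line {i}" for i in range(1, 11))
modified = "\n".join(f"line {i}" for i in range(1, 11) if i not in {3, 7})
print(build_prompt(original, modified))
```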

Here's a truncated table of results from the paper:

Models                  Poetry  Sequences  GitHub PRs  Average
Gemini-2.5-flash*         87.3       95.4        30.9     71.2
Claude-3.7-Sonnet*        72.7       96.0        40.0     69.6
Claude-3.7-Sonnet         73.5       91.4        35.7     66.9
Gemini-2.5-flash          79.3       85.2        26.2     63.6
o3-mini*                  65.0       78.1        38.9     60.7
GPT-4.1                   54.3       57.5        36.2     49.3
...                        ...        ...         ...      ...
DeepSeek-R1*              38.7       29.5        23.1     30.4
Qwen3-235B*               26.1       18.5        24.6     23.1
Mixtral-8x7B-Instruct      4.9       21.9        17.3     14.7

* indicates a reasoning model. Sequences are lists of numbers like 117,121,125,129,133,137; Poetry consists of 100-1000 line portions from the Gutenberg Poetry Corpus; and PRs are diffs with 10 to 200 updated lines.
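As a rough illustration of the Sequences setting, here's a sketch (my own, not the paper's code) that builds an arithmetic sequence like the one above, drops a few entries, and scores an answer by recall of the omitted numbers - whether that matches the paper's exact scoring is an assumption:

```python
import random

def make_sequence_case(start=117, step=4, length=20, n_omit=3, seed=0):
    """Build an arithmetic sequence, then remove n_omit random entries."""
    rng = random.Random(seed)
    full = [start + i * step for i in range(length)]
    omitted = set(rng.sample(full, n_omit))
    shown = [x for x in full if x not in omitted]
    return full, shown, omitted

def recall(predicted: set, omitted: set) -> float:
    """Fraction of the truly omitted numbers that were recovered."""
    return len(predicted & omitted) / len(omitted)

full, shown, omitted = make_sequence_case()
print("original:", ",".join(map(str, full)))
print("modified:", ",".join(map(str, shown)))
print("omitted: ", sorted(omitted))
# If a hypothetical model answered with two of the three missing numbers:
guess = set(list(omitted)[:2])
print("recall:  ", recall(guess, omitted))
```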

The strongest models do well at numeric sequences, adequately at the poetry challenge, and really poorly with those PR diffs. Reasoning models do slightly better, at the cost of burning through a lot of reasoning tokens - often more than the length of the original document.

The paper authors - Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore, Peter West, Chenhao Tan and Ari Holtzman - have a hypothesis as to what's going on here:

We propose an initial hypothesis explaining this behavior: identifying presence is simpler than absence with the attention mechanisms underlying Transformers (Vaswani et al., 2017). Information included in a document can be directly attended to, while the absence of information cannot.
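A toy way to see that intuition: attention weights are a softmax over the keys of tokens that are actually in the context, so every unit of attention lands on something present - a deleted span contributes no key or value for the model to attend to. This numpy sketch (mine, not the paper's) just makes that concrete:

```python
import numpy as np

def attention_weights(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Softmax attention over the keys present in the context."""
    scores = keys @ query / np.sqrt(query.shape[0])
    scores -= scores.max()            # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
d = 8
keys = rng.normal(size=(5, d))        # only the 5 tokens that are present
query = rng.normal(size=(d,))

w = attention_weights(query, keys)
print(w, w.sum())                     # weights sum to 1 across present tokens
# A removed token simply has no row in `keys`, so no attention weight can
# point at it - the model has to infer absence indirectly.
```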
