In my experience with AI coding, very large context windows aren't useful in practice. Every model seems to get confused when you feed them more than ~25-30k tokens. The models stop obeying their system prompts, can't correctly find/transcribe pieces of code in the context, etc.
Developing aider, I've seen this problem with gpt-4o, Sonnet, DeepSeek, etc. Many aider users report this too. It's perhaps the #1 problem users have, so I created a dedicated help page.
Very large context may be useful for certain tasks with lots of "low value" context. But for coding, it seems to lure users into a problematic regime.
Recent articles
- Design Patterns for Securing LLM Agents against Prompt Injections - 13th June 2025
- Comma v0.1 1T and 2T - 7B LLMs trained on openly licensed text - 7th June 2025
- The last six months in LLMs, illustrated by pelicans on bicycles - 6th June 2025