LLMs are bad at returning code in JSON (via) Paul Gauthier's Aider is a terminal-based coding assistant which works against multiple different models. As part of developing the project Paul runs extensive benchmarks, and his latest shows an interesting result: LLMs are slightly less reliable at producing working code if you request that code be returned as part of a JSON response.

The May release of GPT-4o is the closest to a perfect score - the August appears to have regressed slightly, and the new structured output mode doesn't help and could even make things worse (though that difference may not be statistically significant).
Paul recommends using Markdown delimiters here instead, which are less likely to introduce confusing nested quoting issues.
Recent articles
- New prompt injection papers: Agents Rule of Two and The Attacker Moves Second - 2nd November 2025
 - Hacking the WiFi-enabled color screen GitHub Universe conference badge - 28th October 2025
 - Video: Building a tool to copy-paste share terminal sessions using Claude Code for web - 23rd October 2025