The cost of AI reasoning over time (via) Karina Nguyen from Anthropic provides a fascinating visualization illustrating the cost of different levels of LLM over the past few years, plotting their cost-per-token against their scores on the MMLU benchmark.
Claude 3 Haiku currently occupies the lowest cost to score ratio, over on the lower right hand side of the chart.
Recent articles
- My review of Claude's new Code Interpreter, released under a very confusing name - 9th September 2025
- Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide - 9th September 2025
- GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search - 6th September 2025