DeepSeek 3.1. The latest model from DeepSeek, a 685B monster (like DeepSeek v3 before it) but this time it's a hybrid reasoning model.
DeepSeek claim:
DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
Drew Breunig points out that their benchmarks show "the same scores with 25-50% fewer tokens" - at least across AIME 2025 and GPQA Diamond and LiveCodeBench.
The DeepSeek release includes prompt examples for a coding agent, a python agent and a search agent - yet more evidence that the leading AI labs have settled on those as the three most important agentic patterns for their models to support.
Here's the pelican riding a bicycle it drew me (transcript), which I ran from my phone using OpenRouter chat.
Recent articles
- The Summer of Johann: prompt injections as far as the eye can see - 15th August 2025
- Open weight LLMs exhibit inconsistent performance across providers - 15th August 2025
- LLM 0.27, the annotated release notes: GPT-5 and improved tool calling - 11th August 2025