14th August 2024 - Link Blog
DeepSeek API introduces Context Caching on Disk (via) I wrote about Claude prompt caching this morning. It turns out Chinese LLM lab DeepSeek released their own implementation of context caching a couple of weeks ago, with the simplest possible pricing model: it's just turned on by default for all users.
When duplicate inputs are detected, the repeated parts are retrieved from the cache, bypassing the need for recomputation. This not only reduces service latency but also significantly cuts down on overall usage costs.
For cache hits, DeepSeek charges $0.014 per million tokens, slashing API costs by up to 90%.
[...]
The disk caching service is now available for all users, requiring no code or interface changes. The cache service runs automatically, and billing is based on actual cache hits.
DeepSeek currently offer two frontier models, DeepSeek-V2 and DeepSeek-Coder-V2, both of which can be run as open weights models or accessed via their API.
Recent articles
- Notes on the xAI/Anthropic data center deal - 7th May 2026
- Live blog: Code w/ Claude 2026 - 6th May 2026
- Vibe coding and agentic engineering are getting closer than I'd like - 6th May 2026