Context caching for Google Gemini. Another new Gemini feature announced today. Long context models make it possible to answer questions against large chunks of text, but the price of those long prompts can be prohibitive: $3.50 per million tokens for Gemini 1.5 Pro on prompts up to 128,000 tokens, and $7 per million beyond that.
Context caching offers a price optimization: the long prefix prompt can be reused between requests, halving the cost per prompt, but at an additional cost of $4.50 per million tokens per hour to keep that context cache warm.
Given that hourly extra charge, this isn't a default optimization for all cases, but certain high-traffic applications might be able to save quite a bit on their longer prompts.
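To get a feel for where caching starts to pay off, here is a rough back-of-the-envelope sketch in Python using only the prices quoted above. It makes several simplifying assumptions that are mine, not Google's: the cached prefix is the whole prompt, the $4.50 storage charge applies to both pricing tiers, and output tokens and any one-off cost of creating the cache are ignored. The function names are made up for illustration.

```python
# Prices in USD per million tokens, as quoted in this post.
INPUT_PRICE_SHORT = 3.50        # prompts up to 128,000 tokens
INPUT_PRICE_LONG = 7.00         # prompts beyond 128,000 tokens
CACHE_DISCOUNT = 0.5            # "halving the cost per prompt"
STORAGE_PRICE_PER_HOUR = 4.50   # per million cached tokens, per hour
TIER_THRESHOLD = 128_000


def input_price(prefix_tokens):
    """Per-million-token input price for a prompt of this size."""
    if prefix_tokens <= TIER_THRESHOLD:
        return INPUT_PRICE_SHORT
    return INPUT_PRICE_LONG


def hourly_cost(prefix_tokens, requests_per_hour, cached):
    """Estimated input cost in USD for one hour of traffic."""
    millions = prefix_tokens / 1_000_000
    price = input_price(prefix_tokens)
    if not cached:
        return millions * price * requests_per_hour
    # Cached: discounted price per request plus the hourly storage charge.
    discounted = price * CACHE_DISCOUNT
    return millions * (discounted * requests_per_hour + STORAGE_PRICE_PER_HOUR)


def break_even_requests_per_hour(prefix_tokens):
    """Requests per hour at which cached and uncached costs are equal."""
    saving_per_request = input_price(prefix_tokens) * (1 - CACHE_DISCOUNT)
    return STORAGE_PRICE_PER_HOUR / saving_per_request


if __name__ == "__main__":
    for tokens in (100_000, 500_000):
        rate = break_even_requests_per_hour(tokens)
        print(f"{tokens:,} token prefix: break-even at {rate:.2f} requests/hour")
```

Under these assumptions the prompt size cancels out within a tier: the break-even point is 4.50 / 1.75, or about 2.57 requests per hour, for prompts up to 128,000 tokens, and 4.50 / 3.50, or about 1.29 requests per hour, beyond that. Anything busier than that comes out ahead with caching, and anything quieter pays more for storage than it saves.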
It will be interesting to see if other vendors such as OpenAI and Anthropic offer a similar optimization in the future.
Update 14th August 2024: Anthropic's Claude now has its own version of prompt caching.