Google is rolling out a new feature in its Gemini API that it says could significantly reduce the cost of using its advanced AI models.
The feature, called implicit caching, promises savings of up to 75 per cent on repeated prompt content for developers using the Gemini 2.5 Pro and 2.5 Flash models.
Announced on Wednesday, the change is a significant update for developers grappling with the rising costs of frontier AI models. Unlike Google's earlier explicit caching system, where developers had to manually define and manage cached prompts, implicit caching works automatically, without requiring user intervention.
According to Google, when a request sent to the Gemini API shares a common beginning with a previously submitted one, the system detects the overlap and serves the repeated portion from cache rather than reprocessing it, cutting computing load and cost. The mechanism is automatic and now enabled by default.
The cost-saving mechanism is triggered when a request includes a minimum of 1,024 tokens for Gemini 2.5 Flash or 2,048 tokens for 2.5 Pro. Tokens are the basic units of data that AI models process—roughly equivalent to three-quarters of a word each.
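For developers who want to confirm that a prompt clears that threshold, the API's token-counting endpoint can help. The sketch below is illustrative rather than official guidance; it assumes the google-genai Python SDK and an API key available in the environment.

from google import genai

# Minimal sketch: check whether a prompt is long enough to qualify for
# implicit caching on Gemini 2.5 Flash (1,024-token minimum; 2,048 on
# 2.5 Pro). Assumes GEMINI_API_KEY is set in the environment.
client = genai.Client()

prompt = "A long block of reusable context followed by the actual query..."

count = client.models.count_tokens(
    model="gemini-2.5-flash",
    contents=prompt,
)

if count.total_tokens >= 1024:
    print("Prompt meets the minimum size for implicit caching.")
else:
    print(f"Only {count.total_tokens} tokens; below the caching minimum.")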
The launch follows recent developer backlash over unexpectedly high charges tied to Google’s earlier caching system. Complaints surged over the past week, prompting a public apology from the Gemini team and a pledge to improve billing transparency.
However, while the update is positioned as a win for developers, Google has not offered third-party validation of the savings, raising questions about real-world performance. The company recommends that developers place recurring context at the beginning of their prompts to increase cache hits, while more variable data should appear later.
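In practice, that guidance amounts to keeping an identical prefix across calls and appending the variable query at the end. A minimal sketch of that structure follows, again assuming the google-genai Python SDK; the variable names and example context are illustrative, and the cached token count is read back from the response's usage metadata.

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Stable, recurring context goes first so successive requests share the
# longest possible prefix. (Shortened here; in real use it must exceed
# the model's minimum token count for caching to apply.)
stable_context = "System instructions, product documentation, examples...\n"

for user_query in ["How do I export my data?", "Is there a rate limit?"]:
    # Variable data is appended after the shared prefix, per Google's advice.
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=stable_context + user_query,
    )
    # usage_metadata reports input tokens served from cache (may be None
    # when no cache hit occurred).
    print(response.usage_metadata.cached_content_token_count)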
As AI APIs become more embedded in apps and services, the pressure on companies like Google to manage cost and performance will only intensify. Developers and businesses alike will be watching closely to see if Google delivers on its latest promise.

