Absolute Geeks

Google adds implicit caching to Gemini API, promising up to 75% cost savings

GEEK STAFF
May 9, 2025

Google is making a key change to its Gemini API that could significantly reduce costs for developers using its latest AI models. A new feature called implicit caching is now live for Gemini 2.5 Pro and 2.5 Flash, with Google claiming it can slash processing costs by up to 75% for requests that include repetitive context.

For developers feeling the pinch of mounting inference costs, this update could offer some relief—provided it works as advertised.

A Smarter, Simpler Approach to Caching

Caching isn’t new in AI infrastructure. It’s a common performance tactic that stores and reuses frequently accessed data, reducing the need for repeated computation. But until now, Google’s Gemini API only supported explicit caching, which required developers to manually identify and define high-frequency prompts.

That process often proved tedious—and, in some cases, ineffective. Some developers recently voiced frustration that even with explicit caching enabled, costs remained unexpectedly high when using Gemini 2.5 Pro. Those concerns reached a boiling point last week, prompting the Gemini team to issue a public apology and pledge improvements.

Enter implicit caching, which operates automatically and is enabled by default for Gemini 2.5 models. If a request shares a common prefix with a previous one—typically repeated instructions or context—the system checks for a cache hit and applies cost savings on the backend. No manual setup is required.
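
The prefix matching described above can be pictured with a small sketch. This is only an illustration of the idea, not Google's implementation: whitespace splitting stands in for a real tokenizer, and the prompt text and the helper name `common_prefix_len` are hypothetical.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Count how many leading tokens two token sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Assumption: whitespace splitting is a crude stand-in for the model's tokenizer.
system = "You are a support assistant for Acme . Follow the policy below ."
previous = (system + " Question: how do I reset my password ?").split()
current = (system + " Question: where is my invoice ?").split()

shared = common_prefix_len(previous, current)
print(f"{shared} of {len(current)} tokens match the earlier request's prefix")
```

Only the leading run of identical tokens counts here, which is why a change near the start of a prompt would leave little for a prefix cache to reuse.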

Google says developers will get the best results by keeping repetitive context at the beginning of a prompt and placing variable content at the end. To be eligible, a request must contain at least 1,024 tokens for 2.5 Flash or 2,048 tokens for 2.5 Pro, roughly 750 and 1,500 words, respectively.
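
As a rough sketch of that guidance, the snippet below puts shared context first and the variable part last, then estimates whether the prompt clears the quoted minimums. The dictionary keys are just labels for this example, the helper names are hypothetical, and the words-per-token ratio is only the approximation implied by the figures above, so a real tokenizer count may differ.

```python
# Minimum request sizes quoted in the article, in tokens.
MIN_TOKENS = {"gemini-2.5-flash": 1024, "gemini-2.5-pro": 2048}

# Assumption: rough ratio implied by "1,024 tokens is about 750 words".
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def build_prompt(shared_context: str, user_input: str) -> str:
    """Put the stable context first and the variable content last."""
    return f"{shared_context}\n\n{user_input}"


def may_be_eligible(prompt: str, model: str) -> bool:
    """Check the estimated size against the quoted minimum for a model."""
    return estimate_tokens(prompt) >= MIN_TOKENS[model]


context = "policy " * 800  # placeholder for long, repeated instructions
prompt = build_prompt(context, "Question: where is my invoice?")

for model in MIN_TOKENS:
    print(model, may_be_eligible(prompt, model))
```

Keeping `build_prompt` as the single place where requests are assembled makes it harder for variable content to slip in ahead of the shared context.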

Real Savings or Another PR Patch?

While the automatic nature of implicit caching is a clear improvement over the manual approach, there are still some unknowns. Google hasn’t provided independent benchmarks or third-party validation of the claimed savings, and it remains to be seen how consistently real-world use cases will benefit.
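
One reason results may vary is simple arithmetic. The sketch below assumes the 75% figure is a discount applied only to the input tokens that actually hit the cache; that is an assumption made for illustration, not a confirmed pricing model, and the function name `relative_cost` is hypothetical.

```python
def relative_cost(cached_fraction: float, discount: float = 0.75) -> float:
    """Input cost relative to an uncached request (1.0 means full price).

    Assumption: the discount applies only to the fraction of input
    tokens served from the cache.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return 1.0 - discount * cached_fraction


for fraction in (0.0, 0.5, 0.9, 1.0):
    cost = relative_cost(fraction)
    print(f"{fraction:.0%} of input cached -> {cost:.1%} of the uncached input cost")
```

Under this assumption, the full 75% saving appears only when the entire input is served from the cache, so workloads with short or frequently changing prefixes would see less.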

This update follows mounting pressure on AI companies to control API costs as large language models become more powerful—and more expensive to run. For developers integrating AI into commercial products or at-scale workflows, even small inefficiencies can drive up operational costs quickly.

If implicit caching works as Google claims, it could make Gemini 2.5 Pro and 2.5 Flash more competitive in a market that includes OpenAI, Anthropic, and Meta. But for now, early adopters will be the ones testing whether the savings are real or just theoretical.

© 2014-2025 Absolute Geeks, a TMT Labs L.L.C-FZ media network - Privacy Policy