Absolute Geeks UAE

Google adds implicit caching to Gemini API, promising up to 75% cost savings

GEEK DESK
May 9

Google is making a key change to its Gemini API that could significantly reduce costs for developers using its latest AI models. A new feature called implicit caching is now live for Gemini 2.5 Pro and 2.5 Flash, and Google claims it can cut costs by up to 75% on the repeated context that a request shares with earlier ones.

For developers feeling the pinch of mounting inference costs, this update could offer some relief—provided it works as advertised.

A Smarter, Simpler Approach to Caching

Caching isn’t new in AI infrastructure. It’s a common performance tactic that stores and reuses frequently accessed data, reducing the need for repeated computation. But until now, Google’s Gemini API only supported explicit caching, which required developers to manually identify and define high-frequency prompts.

That process often proved tedious—and, in some cases, ineffective. Some developers recently voiced frustration that even with explicit caching enabled, costs remained unexpectedly high when using Gemini 2.5 Pro. Those concerns reached a boiling point last week, prompting the Gemini team to issue a public apology and pledge improvements.

Enter implicit caching, which operates automatically and is enabled by default for Gemini 2.5 models. If a request shares a common prefix with a previous one—typically repeated instructions or context—the system checks for a cache hit and applies cost savings on the backend. No manual setup is required.
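
To make the prefix idea concrete, here is a toy model in Python. It is an illustration under stated assumptions, not Google's implementation: requests are represented as lists of integer token IDs, a "hit" is simply the longest prefix shared with any earlier request, and cached tokens are billed at 25% of an arbitrary base rate to mirror the claimed 75% discount. It ignores the minimum prompt sizes and any cache expiry the real service may apply.

```python
from dataclasses import dataclass, field

# Assumption: cached tokens are billed at 25% of the base input rate,
# mirroring the "up to 75%" claim. Not Google's actual billing logic.
CACHED_TOKEN_DISCOUNT = 0.75


def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Return how many leading tokens a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class ToyPrefixCache:
    """Toy stand-in for implicit caching: remembers every request it has seen."""
    seen: list[list[int]] = field(default_factory=list)

    def lookup(self, tokens: list[int]) -> int:
        """Return the longest prefix of `tokens` shared with any earlier request,
        then record this request for future lookups."""
        best = max((shared_prefix_len(tokens, prev) for prev in self.seen), default=0)
        self.seen.append(tokens)
        return best


def billed_input_units(total_tokens: int, cached_tokens: int, rate: float = 1.0) -> float:
    """Input cost when cached tokens receive the assumed discount."""
    uncached = total_tokens - cached_tokens
    return uncached * rate + cached_tokens * rate * (1 - CACHED_TOKEN_DISCOUNT)


cache = ToyPrefixCache()
shared_instructions = list(range(2000))  # stand-in for 2,000 repeated tokens
first = shared_instructions + [9001, 9002]
second = shared_instructions + [9100, 9101, 9102]

print(cache.lookup(first))                    # 0: nothing cached yet
hit = cache.lookup(second)
print(hit)                                    # 2000
print(billed_input_units(len(second), hit))  # 503.0 (vs 2003.0 uncached)
```

With 2,000 shared tokens and three new ones, the second request is billed 503 units instead of 2,003, a saving of just under 75%. In this model the headline figure is a ceiling, approached only when almost all of a prompt is served from cache.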

Google says developers can expect the best results when keeping repetitive context at the beginning of a prompt, with variable content placed at the end. For a request to be eligible, the minimum token count is 1,024 tokens for 2.5 Flash and 2,048 tokens for 2.5 Pro—roughly equivalent to 750 and 1,500 words, respectively.
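
The ordering advice can be sketched as follows. The helper names are hypothetical, the model ID strings are assumptions, and the token count is passed in rather than computed, since a real count should come from the model's tokenizer (for instance the Gemini API's token-counting endpoint), not from a word-based estimate. The sketch shows that putting the stable context first gives two prompts a long shared prefix, while putting the variable question first leaves them with none.

```python
import os

# Minimum prompt sizes cited in the article. The model ID strings are
# assumptions; check the current model list before relying on them.
MIN_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}


def build_prompt(shared_context: str, variable_part: str) -> str:
    """Stable, repeated context first; per-request content last."""
    return f"{shared_context}\n\n{variable_part}"


def meets_minimum(model: str, prompt_tokens: int) -> bool:
    """True if the prompt reaches the cited minimum for `model`.
    `prompt_tokens` should come from a real tokenizer count."""
    return prompt_tokens >= MIN_TOKENS[model]


# Hypothetical stable context reused across requests.
context = "You are a support assistant for a hypothetical product. " * 100

# Good ordering: the prompts share the whole context as a prefix.
p1 = build_prompt(context, "How do I reset my password?")
p2 = build_prompt(context, "What are your opening hours?")
print(len(os.path.commonprefix([p1, p2])) >= len(context))  # True

# Bad ordering: the differing questions come first, so no prefix is shared.
bad1 = f"How do I reset my password?\n\n{context}"
bad2 = f"What are your opening hours?\n\n{context}"
print(len(os.path.commonprefix([bad1, bad2])))  # 0

print(meets_minimum("gemini-2.5-flash", 1500))  # True
print(meets_minimum("gemini-2.5-pro", 1500))    # False
```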

Real Savings or Another PR Patch?

While the automatic nature of implicit caching is a clear improvement over the manual approach, there are still some unknowns. Google hasn’t provided independent benchmarks or third-party validation of the claimed savings, and it remains to be seen how consistently real-world use cases will benefit.

This update follows mounting pressure on AI companies to control API costs as large language models become more powerful—and more expensive to run. For developers integrating AI into commercial products or at-scale workflows, even small inefficiencies can drive up operational costs quickly.

If implicit caching works as Google claims, it could make Gemini 2.5 Pro and 2.5 Flash more competitive in a market that includes OpenAI, Anthropic, and Meta. But for now, early adopters will be the ones testing whether the savings are real or just theoretical.

© 2014 - 2026 Absolute Geeks, a TMT Labs L.L.C-FZ media network