Google has introduced Gemini 3 Flash, a new lightweight AI model designed to prioritize speed, lower operating costs, and broad accessibility across consumer- and developer-facing products. The launch positions Gemini 3 Flash as the fast option within Google’s expanding Gemini 3 model family, with the company emphasizing responsiveness and efficiency rather than raw scale.
The core pitch behind Gemini 3 Flash is performance per dollar. Google says the model delivers significantly faster responses than earlier versions, including Gemini 2.5 Pro, while operating at lower token costs. Pricing is set at $0.50 per million input tokens and $3 per million output tokens, making it one of the more affordable options in Google’s AI lineup. This pricing structure appears aimed at developers building applications where latency and cost predictability matter as much as reasoning depth.
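To make the pricing concrete, the per-million-token rates above can be turned into a per-request cost estimate. A minimal sketch (the rates come from the article; the example token counts are illustrative, not from Google):

```python
# Published Gemini 3 Flash rates: $0.50 per million input tokens,
# $3 per million output tokens.
INPUT_RATE = 0.50 / 1_000_000   # dollars per input token
OUTPUT_RATE = 3.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single API request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a chat turn with a 2,000-token prompt and a 500-token reply.
print(f"${request_cost(2_000, 500):.4f}")  # $0.0025
```

At these rates, even a million such chat turns would cost on the order of a few thousand dollars, which is the cost-predictability argument Google is making to developers.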
Speed, however, is not the only selling point. Google claims Gemini 3 Flash maintains strong reasoning capabilities, citing benchmark results such as 90.4 percent on GPQA Diamond and 33.7 percent on Humanity’s Last Exam without external tools. While benchmarks are an imperfect proxy for real-world usefulness, these figures suggest the model is designed to handle complex queries without fully sacrificing accuracy in favor of responsiveness.
Gemini 3 Flash also introduces what Google describes as adaptive thinking. The model can scale its reasoning effort based on task complexity, spending more time on demanding problems while remaining efficient for simpler requests. According to Google, this approach allows Gemini 3 Flash to use roughly 30 percent fewer tokens than Gemini 2.5 Pro, even when tackling more involved tasks. In multimodal testing, the model reportedly scored 81.2 percent on the MMMU Pro benchmark, indicating solid performance across text, image, and video inputs.
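The claimed 30 percent token reduction compounds with the per-token pricing. A rough sketch of the combined effect, assuming the reduction applies to output tokens (the article does not specify which token counts shrink, so this is an illustrative simplification):

```python
TOKEN_REDUCTION = 0.30          # Google's claimed reduction vs. Gemini 2.5 Pro
OUTPUT_RATE = 3.00 / 1_000_000  # Gemini 3 Flash, dollars per output token

def flash_output_tokens(pro_output_tokens: int) -> int:
    """Expected Gemini 3 Flash output tokens for a workload that would
    consume `pro_output_tokens` on Gemini 2.5 Pro, per Google's claim."""
    return round(pro_output_tokens * (1 - TOKEN_REDUCTION))

# A workload that would emit 1M output tokens on Gemini 2.5 Pro:
tokens = flash_output_tokens(1_000_000)
print(tokens, f"${tokens * OUTPUT_RATE:.2f}")  # 700000 $2.10
```

The point of the adaptive-thinking design is that this saving comes from spending fewer reasoning tokens on easy requests, not from truncating answers.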
For developers, Gemini 3 Flash is positioned as a practical option for real-time applications. Google highlights its performance on SWE-bench Verified, where it scored 78 percent, attributing the result to a balance of reasoning, tool use, and multimodal processing. The company suggests use cases such as in-game assistants, rapid experimentation, A/B testing, and video analysis, where fast iteration is often more valuable than exhaustive reasoning.
Multimodal capabilities are another area of emphasis. Gemini 3 Flash can interpret images, audio, and video, then generate outputs based on that understanding. Google frames this as enabling faster content creation, lightweight app building, and small interactive projects that do not require extensive coding expertise.
Gemini 3 Flash is available globally as of December 17 through the Gemini app and has been set as the default model for AI Mode in Google Search. Developers can access it through Google AI Studio, the Gemini API, Gemini CLI, and Google Antigravity, while enterprise customers will find it integrated into Vertex AI and Gemini Enterprise.
Overall, Gemini 3 Flash reflects Google’s effort to balance speed, cost, and capability as AI tools move deeper into everyday products and workflows. Rather than replacing higher-end models, it appears designed to handle the high-volume, low-latency tasks that increasingly define practical AI usage.
