Google has unveiled Gemini 2.5 Computer Use, a new AI model designed to interact with digital interfaces in ways that closely mimic human web behavior. Available now in public preview through the Gemini API on Google AI Studio and Vertex AI, the model marks a notable step toward more autonomous, task-oriented agents that can operate across websites and web apps.
Built on the foundation of Gemini 2.5 Pro, the Computer Use model extends its predecessor's visual understanding and reasoning capabilities into interactive environments. It can perform common browser-based actions such as clicking, scrolling, typing, selecting options from dropdowns, and navigating between pages. In early benchmarks, Google reports that Gemini 2.5 Computer Use surpassed competing systems on tests like Online-Mind2Web, WebVoyager, and AndroidWorld, while operating at lower latency.
Unlike conventional AI models that rely solely on API calls, Gemini 2.5 Computer Use interprets screenshots of a digital interface and generates precise UI actions in response. The process is iterative: the agent receives a screenshot and a task description, decides the next logical action (for example, clicking a button or entering text), executes it, and then analyzes the resulting interface through a new screenshot. This loop continues until the task is completed.
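The shape of that loop is easy to sketch. The snippet below is an illustrative skeleton, not Google's client code: it drives a real browser with Playwright, while propose_action() is a hypothetical stand-in for the Gemini API call that would return the next UI action.

```python
# Simplified sketch of the screenshot -> action -> screenshot loop.
# propose_action() is a hypothetical stand-in for the Gemini API call;
# the real client returns structured UI actions, not this dictionary.
from playwright.sync_api import sync_playwright

def propose_action(screenshot: bytes, task: str, history: list) -> dict:
    """Hypothetical model call. Would return e.g.
    {"type": "click", "x": 412, "y": 230} or {"type": "done"}."""
    return {"type": "done"}  # replace with a real Gemini API request

def run_agent(task: str, start_url: str, max_steps: int = 25) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        history: list[dict] = []
        for _ in range(max_steps):
            shot = page.screenshot()                      # 1. capture current UI state
            action = propose_action(shot, task, history)  # 2. model picks next action
            if action["type"] == "done":                  # model judges task complete
                break
            if action["type"] == "click":                 # 3. execute the action
                page.mouse.click(action["x"], action["y"])
            elif action["type"] == "type_text":
                page.keyboard.type(action["text"])
            elif action["type"] == "scroll":
                page.mouse.wheel(0, action["delta_y"])
            history.append(action)                        # 4. loop with fresh screenshot
        browser.close()
```

The key design point is that the model never touches the page directly: it only sees pixels and emits action descriptions, which the developer's harness executes.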
In demonstration videos, Google showcased the model sorting sticky notes on a virtual whiteboard and transferring data, such as pet details, from one website into a customer relationship management system. The footage was sped up for presentation, but it highlights the model's ability to handle multi-step workflows that would otherwise require sustained human attention.
Currently, Gemini 2.5 Computer Use supports 13 types of UI actions and performs best in browser environments. It is not yet optimized for desktop OS-level control, though it has shown promising results on mobile-oriented benchmarks. Google has also built safety checks into each step: every action the model proposes is screened by a per-step safety service, and developers can block specific actions or require end-user confirmation for sensitive operations such as financial transactions.
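On the developer side, a confirmation hook for sensitive operations could be wired in roughly as follows. This is a minimal sketch: the action names, the SENSITIVE set, and the confirm callback are illustrative assumptions, not Gemini's actual safety interface.

```python
# Sketch of developer-side gating for sensitive actions. The action names
# and the SENSITIVE set are illustrative assumptions, not Gemini's schema.
SENSITIVE = {"submit_payment", "accept_terms", "delete_record"}

def perform(action: dict) -> None:
    """Stand-in executor; in a real agent this would drive the browser."""
    print(f"executing {action['type']}")

def execute_with_guard(action: dict, confirm=input) -> bool:
    """Execute an action, pausing for human confirmation when sensitive."""
    if action["type"] in SENSITIVE:
        answer = confirm(f"Allow '{action['type']}'? [y/N] ")
        if answer.strip().lower() != "y":
            return False  # vetoed: the action is skipped
    perform(action)
    return True

# A benign click runs immediately; a payment waits for human approval.
execute_with_guard({"type": "click", "x": 10, "y": 20})
execute_with_guard({"type": "submit_payment"})
```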
Within Google, several teams are already using the model internally across products such as Search and Firebase, primarily for testing and automation. Early external partners are building workflow automation tools and interactive AI assistants with it.
Developers can access Gemini 2.5 Computer Use through Google AI Studio or Vertex AI, and a dedicated demo environment is available via Browserbase for experimentation. The release underscores Google's growing focus on bridging perception and action in AI: moving from understanding web content to actively engaging with it in a controlled, human-like manner.
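For orientation, a first request might look like the following sketch. The google-genai client calls follow the pattern of Google's Python SDK, but the computer-use tool configuration and the preview model name are assumptions to verify against the current documentation.

```python
# Minimal first request. Field names follow the google-genai Python client
# as of the public preview and may have changed; verify against the docs.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER))])

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",  # assumed preview name
    contents=["Open example.com and find the pricing page."],
    config=config,
)
print(response.candidates[0].content)  # expected: a proposed UI action
```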