Breaking the Barrier of Multimodal Switching! Google Brings Native Computer Operations into Gemini 3.5 Flash

Google DeepMind team announced a major technological breakthrough, integrating native computer usage capabilities directly into the Gemini 3.5 Flash model. This means developers can now build AI agents that autonomously view and perform actions on screens across browsers, mobile devices, and desktop computers using a single model.

Previously, this capability was only available as a separate model, requiring developers to perform complex switching and context transfer between different models. With the native integration, AI no longer needs to manually pass information when executing cross-platform long tasks, greatly simplifying the development process.

Say goodbye to context loss, directly addressing agent reliability issues

Google's team believes that the core bottleneck of AI agents is not the limit of individual tools, but the loss of context information when switching between multiple tools. By unifying search, maps, and computer operations within a single model architecture, context flows continuously, significantly reducing the probability of failure during complex tasks.

This "multi-tool integration" design is like building a comprehensive building with internal connectivity, eliminating the long and error-prone communication processes between multiple independent buildings. This architectural adjustment has the potential to bring substantial improvements in the reliability and response latency of agent-based tasks.

Focus on three core scenarios, establish multi-layered security defenses

This native capability will primarily be applied to three core scenarios, including automated tasks requiring continuous operation for hours or even days, continuous software testing for automatic user interface consistency verification, and knowledge-intensive work across applications. These scenarios highly rely on the continuity of context between multiple tasks and can effectively replace humans in performing repetitive and high-energy-consuming operations.

In terms of security design, Google has adopted a multi-layered defense strategy, including targeted adversarial training, enterprise security safeguards for sensitive operations, and indirect prompt injection detection. These mechanisms will collectively help enterprise users establish a relatively complete security boundary in open and uncontrollable computer environments.

Google DeepMind Invests $75 Million in A24: AI Enters the Hollywood Independent Film Industry

Google DeepMind invests $75M to partner with indie studio A24, co-developing AI filmmaking tools from project inception. This pioneering collaboration between a tech giant and top creators aims to build new AI capabilities for filmmakers. A24 is known for hits like 'Everything Everywhere All at Once.'....

Breaking the Barrier of Multimodal Switching! Google Brings Native Computer Operations into Gemini 3.5 Flash

Say goodbye to context loss, directly addressing agent reliability issues

Focus on three core scenarios, establish multi-layered security defenses

Related Recommendations

AI Bounding Box Query! Google Chrome 149 Collaborates with Gemini 3.5 Flash to Upgrade Screenshot Interaction

Google Releases Gemini 3.5 Flash with Native Integration of Computer Usage Tools, Replacing the 2.5 Framework

New Turning Point in the Computational Power Battle: OpenAI Teams Up with Broadcom to Launch the First Self-Developed Inference Chip, Jalapeño

Google DeepMind Invests $75 Million in A24: AI Enters the Hollywood Independent Film Industry

Samsung Electronics Globally Promotes ChatGPT and Codex to Enhance Employee Work Efficiency