Google has officially launched the newest member of its Gemini 3 series: Gemini 3.1 Flash-Lite. As the fastest and most cost-effective lightweight model in the lineup, its release marks Google's continued push into cost-effective AI, aiming to give developers a highly responsive real-time interaction experience.


In terms of performance, Gemini 3.1 Flash-Lite shows marked progress. According to data from independent evaluation platforms, the new model's time to first token (TTFT) is 2.5 times faster than its predecessor, 2.5 Flash, and its overall output speed has risen by 45%. This low latency makes it well suited to chatbots and other real-time processing scenarios that demand immediate feedback.


Beyond raw speed, the model also offers strong value for money. Google has set highly competitive pricing: just $0.25 per million input tokens. In several core capability benchmarks, 3.1 Flash-Lite even punches above its weight, leading same-tier peers on multimodal understanding and logical reasoning metrics, with some scores surpassing those of larger predecessor models.
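To put the quoted rate in concrete terms, a quick back-of-the-envelope calculation shows what large-scale input traffic costs at $0.25 per million input tokens. This sketch uses only the input price stated in the article; output-token pricing is not given here, so it is deliberately left out.

```python
# Estimate input-side cost at the article's quoted rate of
# $0.25 per 1M input tokens (output pricing is not covered here).

INPUT_PRICE_PER_MILLION_USD = 0.25  # rate quoted in the article


def input_cost_usd(input_tokens: int) -> float:
    """Return the USD cost of processing `input_tokens` input tokens."""
    return input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION_USD


# Example: a workload of 40 million input tokens per day.
print(round(input_cost_usd(40_000_000), 2))  # → 10.0
```

At this rate, even tens of millions of daily input tokens stay in the single-digit-dollar range, which is what makes large-scale deployment economical.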


Additionally, Google has equipped the model with an innovative "thinking level" control in AI Studio and Vertex AI. Developers can adjust the model's depth of reasoning to match the task: simple translation or content moderation can run at maximum efficiency, while complex logic simulation or data-dashboard generation can trigger deeper reasoning. The model is currently available via API to preview users and on enterprise platforms, giving developers worldwide a new tool for building low-latency AI applications.
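The thinking-level workflow described above might be wired up roughly as follows. This is a hedged sketch: the model id `gemini-3.1-flash-lite`, the `thinking_level` config key, and the task categories are assumptions based on the article, not confirmed API details; only the task-to-level selection helper is exercised here.

```python
# Sketch: picking a "thinking level" per task category, as the
# article describes (efficiency for simple tasks, deeper reasoning
# for complex ones). The level names "low"/"high" are assumptions.

def pick_thinking_level(task: str) -> str:
    """Map a task category to a thinking level (assumed values)."""
    fast_tasks = {"translation", "content_review"}
    return "low" if task in fast_tasks else "high"


# A request might then look like this (not executed here; the model
# id and config key are hypothetical):
#
# from google import genai
# client = genai.Client()
# response = client.models.generate_content(
#     model="gemini-3.1-flash-lite",          # hypothetical model id
#     contents="Translate 'hello' to French.",
#     config={"thinking_level": pick_thinking_level("translation")},
# )

print(pick_thinking_level("translation"))     # → low
print(pick_thinking_level("data_dashboard"))  # → high
```

Keeping the level decision in a small helper like this makes it easy to tune which task categories get the faster, shallower setting as pricing and latency requirements evolve.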

Key points:

  • ⚡ Significantly faster responses: time to first token improved 2.5× and overall output speed by 45%, targeting real-time interaction scenarios.

  • 💰 Extreme cost control: Input price as low as $0.25 per million tokens, greatly reducing the barrier to large-scale AI deployment.

  • 🧠 Controllable depth of thinking: New "thinking level" adjustment function, supporting flexible switching between efficiency and deep reasoning.