Artificial intelligence startup Inception Labs has announced Mercury2, a high-performance inference model whose underlying architecture represents a bold "paradigm shift."

The model breaks with the autoregressive decoding scheme used by today's mainstream Transformer-based large models, instead generating text with a diffusion process, aiming to break through the performance bottlenecks of traditional large models.

Unlike traditional models, which emit tokens one at a time, Mercury2 works more like an experienced editor: it drafts whole passages and then refines multiple text blocks in parallel, optimizing and rewriting them globally. This parallel decoding gives Mercury2 a remarkable speed advantage on complex logical reasoning tasks.
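The contrast between the two decoding styles can be sketched as a toy simulation. This is a minimal illustration, not Mercury2's actual algorithm: the vocabulary, masking scheme, and step counts below are invented for demonstration. The key point is that autoregressive decoding needs one model call per token, while diffusion-style decoding refines every position of a draft in a small, fixed number of passes.

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary for illustration

def autoregressive_generate(n_tokens):
    """Traditional decoding: one token per step, n_tokens sequential steps."""
    seq = []
    for _ in range(n_tokens):           # each step waits on the previous one
        seq.append(random.choice(VOCAB))
    return seq, n_tokens                # n_tokens sequential model calls

def diffusion_generate(n_tokens, n_steps=3):
    """Diffusion-style decoding: start from an all-masked draft and refine
    every position in parallel for a fixed, small number of steps."""
    seq = ["<mask>"] * n_tokens
    for _ in range(n_steps):            # each step rewrites the whole draft at once
        seq = [random.choice(VOCAB)
               if tok == "<mask>" or random.random() < 0.5
               else tok
               for tok in seq]          # "editor" pass: revise many positions together
    return seq, n_steps                 # n_steps passes, independent of length

_, ar_calls = autoregressive_generate(64)
_, diff_calls = diffusion_generate(64)
print(ar_calls, diff_calls)             # 64 sequential calls vs. 3 parallel passes
```

Because the refinement passes touch all positions at once, their cost maps naturally onto GPU parallelism, which is the intuition behind the speed numbers reported below.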

According to test data obtained by AIbase, running on NVIDIA Blackwell GPUs, Mercury2 reaches an astonishing generation speed of 1,009 tokens per second. In end-to-end latency tests, the model responds in just 1.7 seconds, more than eight times faster than Google's Gemini 3 Flash and far ahead of Anthropic's Claude Haiku 4.5. Despite this extreme speed, its quality remains competitive with today's top lightweight reasoning models on authoritative reasoning benchmarks such as GPQA Diamond and AIME.
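A quick back-of-envelope check shows how the reported throughput and latency relate. This is illustrative arithmetic only, assuming the quoted tokens-per-second rate applies uniformly to a streamed response:

```python
# Illustrative arithmetic only, not a benchmark: at the reported throughput,
# how long does a response of a given length take to stream?
THROUGHPUT_TPS = 1009        # tokens/second reported for Mercury2 on Blackwell

def streaming_time(n_tokens, tps=THROUGHPUT_TPS):
    """Seconds to stream n_tokens at a constant tokens-per-second rate."""
    return n_tokens / tps

# A roughly 1,700-token answer would stream in about the 1.7 s end-to-end
# latency the article reports.
print(round(streaming_time(1700), 2))   # → 1.68
```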

On the commercial side, Inception Labs has adopted an aggressive pricing plan, with input and output costs only a quarter of those of comparable competitors. Mercury2's API is now officially open, with support for an ultra-long 128,000-token context and tool calling. For voice assistants, search systems, and programming tools that demand extreme response speed, this "unconventional" diffusion reasoning model offers an attractive new option.
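The article does not document Mercury2's API schema, so the sketch below only builds a request body in the OpenAI-style chat-completions format that many model providers adopt. The model identifier, field names, and tool definition are all illustrative assumptions, not documented values:

```python
import json

# Hypothetical request body illustrating tool calling, assuming an
# OpenAI-style chat-completions schema. The model name and every field
# below are illustrative assumptions, not documented Mercury2 values.
request = {
    "model": "mercury-2",                        # assumed model identifier
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [                                   # tool-calling support
        {
            "type": "function",
            "function": {
                "name": "get_weather",           # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "max_tokens": 1024,   # well under the 128,000-token context window
}

body = json.dumps(request)   # this JSON string would be POSTed to the API
print(len(body) > 0)
```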

Summary:

  • 🌀 A revolution in the underlying architecture: it abandons traditional token-by-token generation in favor of diffusion-model technology, optimizing multiple text blocks globally and in parallel, a qualitative change in reasoning logic.

  • ⚡ Outstanding performance: responding in under two seconds on the latest hardware and generating over 1,000 tokens per second, with latency significantly better than Gemini 3 Flash and Claude Haiku 4.5.

  • 💰 High commercial cost-effectiveness: challenging the existing market landscape with extremely low costs, supporting long contexts and API access, and targeting latency-sensitive enterprise AI applications.