Embodied AI took a notable step forward today: Xiaomi has officially open-sourced Xiaomi-Robotics-0, its first-generation robot model. With 4.7 billion parameters, the model targets the sluggish robot motion caused by inference latency in existing VLA (Vision-Language-Action) models, aiming for real-time inference and efficient generalization on consumer-grade GPUs.

Core Architecture: Collaboration between Brain and Cerebellum

To balance general understanding with high-frequency control, Xiaomi-Robotics-0 adopts a hybrid Mixture-of-Transformers (MoT) architecture:

  • Vision-Language Brain (VLM): Serving as the backbone, it interprets ambiguous human instructions and captures spatial relationships from high-resolution visual input.

  • Action Execution Cerebellum (Action Expert): Built from stacked Diffusion Transformer (DiT) layers, it generates precise "action chunks" via flow matching, ensuring smooth, flexible physical execution (see the sketch after this list).
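
To make the brain-cerebellum split concrete, here is a minimal PyTorch sketch of the idea: a DiT-style action expert that cross-attends to features from the frozen vision-language brain and turns Gaussian noise into an action chunk by integrating a learned flow-matching velocity field. All dimensions (`D_MODEL`, `CHUNK_LEN`, `ACTION_DIM`) and layer counts are illustrative assumptions, not the published Xiaomi-Robotics-0 configuration.

```python
import torch
import torch.nn as nn

# Illustrative dimensions only -- the real Xiaomi-Robotics-0 configuration is
# not reproduced here. ACTION_DIM=14 assumes a dual-arm setup (7 DoF per arm).
D_MODEL, CHUNK_LEN, ACTION_DIM = 512, 16, 14

class ActionExpert(nn.Module):
    """DiT-style 'cerebellum': predicts a flow-matching velocity field for an
    action chunk, conditioned on features from the frozen VLM 'brain'."""
    def __init__(self):
        super().__init__()
        self.time_embed = nn.Sequential(
            nn.Linear(1, D_MODEL), nn.SiLU(), nn.Linear(D_MODEL, D_MODEL))
        self.proj_in = nn.Linear(ACTION_DIM, D_MODEL)
        layer = nn.TransformerDecoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.blocks = nn.TransformerDecoder(layer, num_layers=6)
        self.proj_out = nn.Linear(D_MODEL, ACTION_DIM)

    def forward(self, noisy_actions, t, vlm_features):
        h = self.proj_in(noisy_actions) + self.time_embed(t[:, None])[:, None, :]
        h = self.blocks(h, memory=vlm_features)  # cross-attend to the brain's features
        return self.proj_out(h)                  # predicted velocity, same shape as actions

@torch.no_grad()
def sample_action_chunk(expert, vlm_features, steps=10):
    """Flow-matching sampling: Euler-integrate the velocity field from
    Gaussian noise (t=0) to a clean action chunk (t=1)."""
    x = torch.randn(vlm_features.size(0), CHUNK_LEN, ACTION_DIM)
    for i in range(steps):
        t = torch.full((x.size(0),), i / steps)
        x = x + expert(x, t, vlm_features) / steps
    return x
```

Generating a whole chunk per call, rather than one action per forward pass, is what lets the fast control loop run ahead of the slower vision-language backbone.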

Training Recipe: A Two-Stage Evolution

The Xiaomi R&D team balances the model's common-sense understanding against its physical manipulation skill through a carefully designed two-stage recipe:

  1. Cross-modal Pre-training: An Action Proposal mechanism lets the VLM retain its logical reasoning ability while aligning the feature space with the action space. The VLM is then frozen, and the DiT alone is trained to generate smooth action sequences (a training-step sketch follows this list).

  2. Post-training: To eliminate "action discontinuity" during real-robot operation, the model runs in an asynchronous inference mode. Combining a Clean Action Prefix (which keeps the trajectory continuous across chunks) with a Λ-shape Attention Mask (which forces attention onto the current visual feedback) gives the robot agile responses to sudden environmental changes (see the mask sketch below).
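
For stage 1, here is a minimal sketch of the flow-matching objective that the frozen-VLM / trainable-DiT setup would optimize, reusing the hypothetical `ActionExpert` from the architecture sketch above. The linear noise-to-action path and constant-velocity target follow standard rectified-flow practice and are an assumption about the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(expert, vlm_features, actions):
    """One training step for the action expert while the VLM stays frozen.
    `actions` is a ground-truth chunk of shape (B, CHUNK_LEN, ACTION_DIM);
    the target is the constant velocity along a linear noise-to-action path."""
    noise = torch.randn_like(actions)
    t = torch.rand(actions.size(0))                                    # t ~ U[0, 1]
    x_t = (1 - t)[:, None, None] * noise + t[:, None, None] * actions  # interpolate
    target_velocity = actions - noise
    return F.mse_loss(expert(x_t, t, vlm_features), target_velocity)
```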
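
For stage 2, here is a sketch of what a Λ-shape attention mask could look like: every token attends to the current observation tokens (the Λ's wide base), while action tokens, including the clean prefix carried over from the previous chunk, attend causally among themselves. The token layout and mask construction are assumptions for illustration, not the paper's exact design.

```python
import torch

def lambda_attention_mask(n_obs, n_prefix, n_new):
    """Boolean attention mask (True = may attend) over a token sequence laid out
    as [current observation | clean action prefix | newly generated actions].
    Every token sees the fresh visual feedback; action tokens attend causally,
    so new actions stay consistent with the prefix already being executed."""
    n_act = n_prefix + n_new
    mask = torch.zeros(n_obs + n_act, n_obs + n_act, dtype=torch.bool)
    mask[:, :n_obs] = True                                   # the wide base of the mask
    mask[n_obs:, n_obs:] = torch.tril(
        torch.ones(n_act, n_act, dtype=torch.bool))          # causal over actions
    return mask
```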

Practical Performance: Breaking Multiple SOTA Records

In evaluation, Xiaomi-Robotics-0 posted strong results both in simulation and on real hardware:

  • Simulation Benchmarks: Across three major simulation suites, LIBERO, CALVIN, and SimplerEnv, it outperformed 30 comparison models and set new state-of-the-art (SOTA) results.

  • Real-Robot Generalization: On a dual-arm robot platform, whether disassembling blocks or folding deformable towels, the model showed strong hand-eye coordination and physical generalization.

Open Source Ecosystem

Xiaomi has opened up the full set of technical resources, including a technical homepage, the source code, and model weights on Hugging Face, aiming to push the boundaries of embodied intelligence together with the community:

  • Technical Homepage: https://xiaomi-robotics-0.github.io
  • Open Source Code: https://github.com/XiaomiRobotics/Xiaomi-Robotics-0
  • Model Weights: https://huggingface.co/XiaomiRobotics
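
As a quick-start sketch, the released weights can presumably be fetched with the standard `huggingface_hub` client. The exact repository name under the XiaomiRobotics organization is an assumption here (inferred from the GitHub URL), so check the org page first.

```python
from huggingface_hub import snapshot_download

# Fetch the released weights locally. The repository name is an assumption --
# verify it on the XiaomiRobotics organization page before running.
local_dir = snapshot_download(repo_id="XiaomiRobotics/Xiaomi-Robotics-0")
print(f"Weights downloaded to: {local_dir}")
```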