The "mysterious model" that has had the embodied intelligence field guessing for three weeks has finally been identified. A model named MotuBrain had quietly topped two major international benchmarks, one for physical world understanding and one for action execution, sparking widespread speculation in the industry. Shengshu Technology, best known for its video generation model Vidu, has now officially announced that MotuBrain is its latest commercial product in embodied intelligence.

This "cross-border" effort is no mere experiment. MotuBrain set new records on WorldArena, which assesses physical world understanding, and on RoboTwin 2.0, which assesses action execution. In complex environments with simulated random disturbances, it was the only model to achieve an average score above 95, demonstrating strong generalization.

"See and Act": Breaking the Boundaries Between Perception and Action

Unlike traditional "imagine first, then act" models, MotuBrain adopts a "World Action Model" approach. This "see and act" design lets a robot simulate outcomes while it makes decisions, so that prediction errors and execution errors do not amplify each other, and it greatly improves response speed.

In practical demonstrations, a robot running the system adjusted its behavior to what it saw. In a hot pot scenario, the robot visually checked whether its spoon had come up empty and decided on its own whether to scoop again, rather than rigidly repeating preset actions. This "reading the room" ability marks a shift from simple mechanical execution toward genuine decision-making.
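
The hot pot behavior can be read as a simple closed loop: act, look, and act again only if the observation calls for it. The sketch below is purely illustrative and is not MotuBrain's implementation. The function `scoop_until_full` and its `scoop` and `spoon_is_empty` callbacks are hypothetical names standing in for a robot's action and vision routines.

```python
from typing import Callable


def scoop_until_full(
    scoop: Callable[[], None],
    spoon_is_empty: Callable[[], bool],
    max_attempts: int = 3,
) -> int:
    """Scoop, check the spoon, and retry only when it came up empty.

    Returns the number of attempts used, or -1 if every attempt failed.
    Both callbacks are hypothetical placeholders for real robot routines.
    """
    for attempt in range(1, max_attempts + 1):
        scoop()                   # act
        if not spoon_is_empty():  # observe the result of the action
            return attempt        # success: stop instead of repeating blindly
    return -1                     # give up after max_attempts


# Simulated run: the first scoop comes up empty, the second succeeds.
outcomes = iter([True, False])
attempts = scoop_until_full(lambda: None, lambda: next(outcomes))
print(attempts)  # prints 2
```

A fixed script would scoop a preset number of times regardless of the result. Here the visual check after each action decides whether another attempt is needed.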

One Brain, Multiple Forms: Smooth Execution of Long-Horizon Tasks

MotuBrain's core advantage is its versatility. It supports "one brain, multiple forms," adapting to robot bodies with different degrees of freedom and different sensor configurations, and it can also handle long-horizon tasks. In demonstrations such as arranging flowers, mixing cocktails, and tidying a sofa, the robot completes more than 10 atomic actions in a row, smoothly and without human intervention.

Data shows that MotuBrain's learning success rate tends to rise as the variety of tasks increases: the more diverse the tasks, the better it performs. This suggests that the model has learned general underlying regularities of the physical world rather than memorizing action templates.

Entering the Physical World: A Dual-Track Strategy Across Digital and Physical Realms

Shengshu Technology's showing rests on a deep technical foundation. Through its world-first U-ViT architecture, the company has unified digital world generation (VGM) and physical world execution (WAM): Vidu generates virtual worlds, while MotuBrain drives physical interaction. This dual-track strategy gives it a significant advantage in data acquisition cost and model iteration speed.

Shengshu Technology has already formed strategic partnerships with several companies, including WuJie Dynamics and XingChen Intelligence. As the focus of competition in embodied intelligence shifts, model developers with a general-purpose "brain" are becoming key forces in reshaping the industry landscape.