Unifollm-VLA-0, a large model developed by Zhiyu, has been officially open-sourced. As the vision-language-action (VLA) model in the Unifollm series designed specifically for general-purpose humanoid robot manipulation, it marks a crucial step in the evolution of robotic brains from mere "text and image understanding" toward embodied intelligence with "physical common sense."

Technological Breakthrough: Deep Integration from Perception to Action

Unifollm-VLA-0 aims to break through the limitations of traditional vision-language models (VLMs) in physical interaction:

Embodied Brain Evolution: Through continued pre-training on robot operation data, the model learns the interaction rules of the physical world rather than stopping at the level of semantics.

Spatial Detail Alignment: The model aligns text instructions with 2D/3D spatial details, significantly improving its spatial perception and positional reasoning in complex environments.

Dynamics Constraints: It incorporates action-chunk prediction together with forward/backward dynamics constraints, achieving unified modeling of long-horizon action sequences (a minimal sketch follows this list).
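To make the last point concrete, here is a minimal PyTorch sketch of action-chunk prediction trained with forward and backward (inverse) dynamics consistency terms. All module names, dimensions, and loss weights are illustrative assumptions, not the released Unifollm-VLA-0 architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT, ACT, H = 1024, 14, 8  # feature dim, action dim, chunk horizon (assumed)

class ChunkPolicy(nn.Module):
    """Predict a chunk of H future actions from one fused VLM feature."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(FEAT, 512), nn.GELU(),
                                  nn.Linear(512, H * ACT))

    def forward(self, feat):                      # feat: (B, FEAT)
        return self.head(feat).view(-1, H, ACT)   # (B, H, ACT)

class ForwardDynamics(nn.Module):
    """(s_t, a_t) -> s_{t+1}: does the predicted action move the world correctly?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT + ACT, 512), nn.GELU(),
                                 nn.Linear(512, FEAT))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class InverseDynamics(nn.Module):
    """(s_t, s_{t+1}) -> a_t: which action explains the observed transition?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * FEAT, 512), nn.GELU(),
                                 nn.Linear(512, ACT))

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

def training_loss(policy, fwd, inv, s_t, s_t1, a_chunk_gt):
    """Behavior cloning on the chunk plus dynamics consistency on step one."""
    a_chunk = policy(s_t)                                  # (B, H, ACT)
    bc = F.mse_loss(a_chunk, a_chunk_gt)
    fwd_c = F.mse_loss(fwd(s_t, a_chunk[:, 0]), s_t1)      # forward constraint
    inv_c = F.mse_loss(inv(s_t, s_t1), a_chunk_gt[:, 0])   # backward constraint
    return bc + 0.1 * (fwd_c + inv_c)                      # 0.1 weight assumed
```

The forward term checks that the first predicted action would actually produce the observed next state, while the backward term checks that the observed transition is explained by the ground-truth action; together they push the policy toward physically consistent behavior.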

R&D Architecture: A Second-Stage Evolution Built on Qwen2.5-VL

Zhiyu refined the model using a systematically cleaned multi-task dataset:

Core Base: Built upon the open-source Qwen2.5-VL-7B model.

Efficient Training: Achieved high-quality task generalization with only about 340 hours of real-robot data used for discrete action prediction training (see the tokenization sketch after this list).

Performance Evaluation: On spatial-understanding benchmarks, its performance not only far exceeds the base model's but rivals Gemini-Robotics-ER1.5 in specific scenarios.
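On the "discrete action prediction" point above: a common recipe is to quantize each continuous action dimension into a fixed number of bins so the VLM can decode actions as ordinary tokens. The sketch below illustrates that recipe; the bin count, action range, and function names are assumptions rather than the project's documented interface.

```python
import numpy as np

N_BINS = 256           # tokens per action dimension (assumed)
LOW, HIGH = -1.0, 1.0  # normalized action range (assumed)

def actions_to_tokens(actions: np.ndarray) -> np.ndarray:
    """Map continuous actions in [LOW, HIGH] to integer bin indices."""
    clipped = np.clip(actions, LOW, HIGH)
    bins = (clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)
    return np.round(bins).astype(np.int64)

def tokens_to_actions(tokens: np.ndarray) -> np.ndarray:
    """Invert the binning to recover (quantized) continuous actions."""
    return tokens / (N_BINS - 1) * (HIGH - LOW) + LOW

# Round-trip example: a 7-DoF arm action survives quantization
# within one bin width (2/255 here).
a = np.array([0.12, -0.53, 0.98, 0.0, -1.0, 1.0, 0.25])
assert np.allclose(tokens_to_actions(actions_to_tokens(a)), a, atol=2 / 255)
```

With 256 bins per dimension, quantization error stays below half a bin width (about 0.004 in a [-1, 1] range), typically negligible relative to real-robot actuation noise.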

Practical Performance: A Single Policy Handles 12 Complex Tasks

The validation results on Zhiyu's G1 humanoid robot platform are impressive:

Multi-task Generality: Under a single policy network (checkpoint), the model stably completes 12 complex manipulation tasks, including object grasping and placement (see the sketch after this list).

Strong Robustness: Real-robot experiments show that the robot maintains stable execution and strong anti-interference capability even when facing external disturbances.
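To illustrate what "one policy network (checkpoint) for all tasks" means operationally, the hedged sketch below queries the same weights for every task and switches tasks only through the language instruction. The class and observation layout are hypothetical placeholders, not the released interface.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    rgb: np.ndarray      # camera frame, e.g. (224, 224, 3)
    proprio: np.ndarray  # joint state vector

class SingleCheckpointPolicy:
    """Stand-in for one set of weights shared across all tasks."""
    def act(self, obs: Observation, instruction: str) -> np.ndarray:
        # A real VLA would encode the image + instruction and decode an
        # action chunk; a zero action keeps this sketch self-contained.
        return np.zeros(14)

policy = SingleCheckpointPolicy()  # loaded once; never swapped per task
for instruction in ["pick up the cup", "place the block in the box"]:
    obs = Observation(rgb=np.zeros((224, 224, 3), dtype=np.uint8),
                      proprio=np.zeros(14))
    action = policy.act(obs, instruction)  # task chosen by language alone
```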

Zhiyu has now fully released the model code and related materials on GitHub and on the project homepage, aiming to help developers worldwide jointly advance the commercialization of general-purpose humanoid robots.