UnifoLM-VLA-0, a large model developed by Zhiyu, has been officially open-sourced. As the vision-language-action (VLA) model in the UnifoLM series designed specifically for general-purpose humanoid robot manipulation, it marks a crucial step in the evolution of robotic brains from mere "text and image understanding" toward embodied intelligence with "physical common sense."

Technological Breakthrough: Deep Integration from Perception to Action
UnifoLM-VLA-0 aims to overcome the limitations of traditional vision-language models (VLMs) in physical interaction:
Embodied Brain Evolution: Through continued pre-training on robot manipulation data, the model learns the interaction rules of the physical world rather than remaining at the purely semantic level.
Spatial Detail Alignment: The model aligns text instructions with 2D/3D spatial details, significantly enhancing its spatial perception and positional reasoning in complex environments.
Dynamics Constraints: It incorporates action-chunk prediction together with forward/inverse dynamics constraints, achieving unified modeling of long-horizon action sequences (see the sketch after this list).
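To make the dynamics-constraint idea concrete, here is a minimal PyTorch sketch of action-chunk prediction with forward and inverse dynamics heads used as auxiliary losses. This is not the released UnifoLM-VLA-0 code: all module names, dimensions, and loss weights are illustrative assumptions.

```python
# Minimal sketch (not the released UnifoLM-VLA-0 code) of action-chunk
# prediction with forward/inverse dynamics auxiliary losses. All module
# names, dimensions, and loss weights here are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, CHUNK, HID = 32, 14, 16, 256  # assumed sizes

class ChunkPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Predicts a chunk of future actions from a fused VLM feature.
        self.action_head = nn.Sequential(
            nn.Linear(HID, HID), nn.GELU(), nn.Linear(HID, CHUNK * ACT_DIM))
        # Forward dynamics: (state, action) -> next state.
        self.fwd_dyn = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM, HID), nn.GELU(),
            nn.Linear(HID, STATE_DIM))
        # Inverse dynamics: (state, next state) -> action linking them.
        self.inv_dyn = nn.Sequential(
            nn.Linear(2 * STATE_DIM, HID), nn.GELU(),
            nn.Linear(HID, ACT_DIM))

    def forward(self, vlm_feat):
        b = vlm_feat.shape[0]
        return self.action_head(vlm_feat).view(b, CHUNK, ACT_DIM)

def training_loss(policy, vlm_feat, states, actions):
    """states: (B, CHUNK+1, STATE_DIM); actions: (B, CHUNK, ACT_DIM)."""
    pred = policy(vlm_feat)
    bc = nn.functional.mse_loss(pred, actions)            # behavior cloning
    s_t, s_next = states[:, :-1], states[:, 1:]
    fwd = nn.functional.mse_loss(                         # forward dynamics
        policy.fwd_dyn(torch.cat([s_t, actions], -1)), s_next)
    inv = nn.functional.mse_loss(                         # inverse dynamics
        policy.inv_dyn(torch.cat([s_t, s_next], -1)), actions)
    return bc + 0.1 * fwd + 0.1 * inv                     # assumed weights

policy = ChunkPolicy()
loss = training_loss(policy, torch.randn(4, HID),
                     torch.randn(4, CHUNK + 1, STATE_DIM),
                     torch.randn(4, CHUNK, ACT_DIM))
loss.backward()
```

The auxiliary heads force the learned representation to respect state transitions, which is one common way to impose dynamics constraints on top of a behavior-cloning objective.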

R&D Architecture: Continued Evolution of Qwen2.5-VL
Zhiyu refined the model using a systematically cleaned multi-task dataset:
Core Base: Built upon the open-source Qwen2.5-VL-7B model (see the loading sketch after this list).
Efficient Training: Achieved strong task generalization with only about 340 hours of real-robot data, trained via discrete action prediction (see the tokenization sketch after this list).
Performance Evaluation: In spatial-understanding benchmarks, its performance not only far exceeded that of the base model but even rivaled Gemini Robotics-ER 1.5 in specific scenarios.
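For reference, the Qwen2.5-VL-7B backbone can be loaded with Hugging Face transformers as a starting point for this kind of continued training. The article does not name an exact checkpoint, so the public Qwen/Qwen2.5-VL-7B-Instruct weights below are an assumption.

```python
# Minimal sketch of loading the Qwen2.5-VL-7B backbone with Hugging Face
# transformers as a starting point for continued VLA training. The exact
# checkpoint UnifoLM-VLA-0 starts from is not specified in the article;
# "Qwen/Qwen2.5-VL-7B-Instruct" is used here as an assumption.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard across available GPUs
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
```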
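And a minimal sketch of what "discrete action prediction" typically means in practice: continuous joint commands are quantized into a small token vocabulary so the language-model head can predict them like text. The bin count and action range below are illustrative assumptions, not UnifoLM-VLA-0's released configuration.

```python
# Minimal sketch of discrete action prediction: continuous joint commands
# are binned into a small token vocabulary so the VLM backbone can predict
# them like text. Bin count and ranges are illustrative assumptions.
import numpy as np

N_BINS = 256              # assumed vocabulary size per action dimension
A_MIN, A_MAX = -1.0, 1.0  # assumed normalized action range

def actions_to_tokens(actions):
    """Map continuous actions in [A_MIN, A_MAX] to integer bin ids."""
    clipped = np.clip(actions, A_MIN, A_MAX)
    scaled = (clipped - A_MIN) / (A_MAX - A_MIN)        # -> [0, 1]
    return np.minimum((scaled * N_BINS).astype(np.int64), N_BINS - 1)

def tokens_to_actions(tokens):
    """Decode bin ids back to bin-center continuous values."""
    return A_MIN + (tokens + 0.5) / N_BINS * (A_MAX - A_MIN)

chunk = np.random.uniform(-1, 1, size=(16, 14))  # 16 steps x 14 joints
tokens = actions_to_tokens(chunk)
recovered = tokens_to_actions(tokens)
# Round-trip error is bounded by the bin width.
assert np.max(np.abs(recovered - chunk)) <= (A_MAX - A_MIN) / N_BINS
```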

Practical Performance: A Single Policy Handles 12 Complex Tasks
The validation results on Zhiyu's G1 humanoid robot platform are impressive:
Multi-task Generality: The model stably completes 12 complex manipulation tasks, including object grasping and placement, with a single policy network (checkpoint).
Strong Robustness: Real-robot experiments show that even under external disturbances, the robot maintains stable execution and resists interference.
Zhiyu has released the full model code and related materials on GitHub and the project homepage, inviting developers worldwide to jointly advance the commercialization of general-purpose humanoid robots.
