UnifoLM-VLA-0, a large model developed by Zhiyu, has been officially open-sourced. As the vision-language-action (VLA) model in the UnifoLM series designed specifically for general-purpose humanoid robot manipulation, it marks a crucial step in the evolution of robotic brains from mere "text and image understanding" toward embodied intelligence with "physical common sense."

Technological Breakthrough: Deep Integration from Perception to Action
UnifoLM-VLA-0 aims to overcome the limitations of traditional vision-language models (VLMs) in physical interaction:
Embodied Brain Evolution: Through continued pre-training on robot manipulation data, the model learns the interaction rules of the physical world rather than remaining at the purely semantic level.
Spatial Detail Alignment: The model aligns text instructions with 2D/3D spatial details, significantly enhancing its spatial perception and positional reasoning in complex environments.
Dynamics Constraints: It incorporates action-chunk prediction together with forward/inverse dynamics constraints, achieving unified modeling of long-horizon action sequences (see the sketch after this list).
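To make the dynamics-constraint idea concrete, here is a minimal PyTorch sketch of action-chunk prediction with forward and inverse dynamics heads used as auxiliary losses. This is not the released UnifoLM-VLA-0 code: all module names, dimensions, and loss weights are illustrative assumptions.

```python
# Minimal sketch (not the released UnifoLM-VLA-0 code) of action-chunk
# prediction with forward/inverse dynamics auxiliary losses. All module
# names, dimensions, and loss weights here are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, CHUNK, HID = 32, 14, 16, 256  # assumed sizes

class ChunkPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Predicts a chunk of future actions from a fused VLM feature.
        self.action_head = nn.Sequential(
            nn.Linear(HID, HID), nn.GELU(), nn.Linear(HID, CHUNK * ACT_DIM))
        # Forward dynamics: (state, action) -> next state.
        self.fwd_dyn = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM, HID), nn.GELU(),
            nn.Linear(HID, STATE_DIM))
        # Inverse dynamics: (state, next state) -> action linking them.
        self.inv_dyn = nn.Sequential(
            nn.Linear(2 * STATE_DIM, HID), nn.GELU(),
            nn.Linear(HID, ACT_DIM))

    def forward(self, vlm_feat):
        b = vlm_feat.shape[0]
        return self.action_head(vlm_feat).view(b, CHUNK, ACT_DIM)

def training_loss(policy, vlm_feat, states, actions):
    """states: (B, CHUNK+1, STATE_DIM); actions: (B, CHUNK, ACT_DIM)."""
    pred = policy(vlm_feat)
    bc = nn.functional.mse_loss(pred, actions)            # behavior cloning
    s_t, s_next = states[:, :-1], states[:, 1:]
    fwd = nn.functional.mse_loss(                         # forward dynamics
        policy.fwd_dyn(torch.cat([s_t, actions], -1)), s_next)
    inv = nn.functional.mse_loss(                         # inverse dynamics
        policy.inv_dyn(torch.cat([s_t, s_next], -1)), actions)
    return bc + 0.1 * fwd + 0.1 * inv                     # assumed weights

policy = ChunkPolicy()
loss = training_loss(policy, torch.randn(4, HID),
                     torch.randn(4, CHUNK + 1, STATE_DIM),
                     torch.randn(4, CHUNK, ACT_DIM))
loss.backward()
```

The auxiliary heads force the learned representation to respect state transitions, which is one common way to impose dynamics constraints on top of a behavior-cloning objective.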

R&D Architecture: Continued Evolution of Qwen2.5-VL
Zhiyu refined the model using a systematically cleaned multi-task dataset:
Core Base: Built upon the open-source Qwen2.5-VL-7B model (see the loading sketch after this list).
Efficient Training: Achieved strong task generalization with only about 340 hours of real-robot data, trained via discrete action prediction (see the tokenization sketch after this list).
Performance Evaluation: In spatial-understanding benchmarks, its performance not only far exceeded that of the base model but even rivaled Gemini Robotics-ER 1.5 in specific scenarios.
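For reference, the Qwen2.5-VL-7B backbone can be loaded with Hugging Face transformers as a starting point for this kind of continued training. The article does not name an exact checkpoint, so the public Qwen/Qwen2.5-VL-7B-Instruct weights below are an assumption.

```python
# Minimal sketch of loading the Qwen2.5-VL-7B backbone with Hugging Face
# transformers as a starting point for continued VLA training. The exact
# checkpoint UnifoLM-VLA-0 starts from is not specified in the article;
# "Qwen/Qwen2.5-VL-7B-Instruct" is used here as an assumption.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard across available GPUs
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
```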
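And a minimal sketch of what "discrete action prediction" typically means in practice: continuous joint commands are quantized into a small token vocabulary so the language-model head can predict them like text. The bin count and action range below are illustrative assumptions, not UnifoLM-VLA-0's released configuration.

```python
# Minimal sketch of discrete action prediction: continuous joint commands
# are binned into a small token vocabulary so the VLM backbone can predict
# them like text. Bin count and ranges are illustrative assumptions.
import numpy as np

N_BINS = 256              # assumed vocabulary size per action dimension
A_MIN, A_MAX = -1.0, 1.0  # assumed normalized action range

def actions_to_tokens(actions):
    """Map continuous actions in [A_MIN, A_MAX] to integer bin ids."""
    clipped = np.clip(actions, A_MIN, A_MAX)
    scaled = (clipped - A_MIN) / (A_MAX - A_MIN)        # -> [0, 1]
    return np.minimum((scaled * N_BINS).astype(np.int64), N_BINS - 1)

def tokens_to_actions(tokens):
    """Decode bin ids back to bin-center continuous values."""
    return A_MIN + (tokens + 0.5) / N_BINS * (A_MAX - A_MIN)

chunk = np.random.uniform(-1, 1, size=(16, 14))  # 16 steps x 14 joints
tokens = actions_to_tokens(chunk)
recovered = tokens_to_actions(tokens)
# Round-trip error is bounded by the bin width.
assert np.max(np.abs(recovered - chunk)) <= (A_MAX - A_MIN) / N_BINS
```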

Practical Performance: A Single Policy Handles 12 Complex Tasks
The validation results on Zhiyu's G1 humanoid robot platform are impressive:
Multi-task Generality: The model stably completes 12 complex manipulation tasks, including object grasping and placement, with a single policy network (checkpoint).
Strong Robustness: Real-robot experiments show that even under external disturbances, the robot maintains stable execution and resists interference.
Zhiyu has released the full model code and related materials on GitHub and the project homepage, inviting developers worldwide to jointly advance the commercialization of general-purpose humanoid robots.
