NVIDIA's research team officially launched the 3D scene generation system Lyra2.0 on April 16, 2026. The technology builds large-scale, highly coherent virtual environments from a single photo, addressing the image distortion that plagues long camera paths. Amid growing demand for embodied-intelligence training, the release of Lyra2.0 marks a major step forward in AI's 3D spatial understanding and real-time environment simulation.
On the technical side, Lyra2.0 can generate 3D environments extending up to 90 meters from a single photo. To counter the spatial distortion and error accumulation caused by "forgetting" in traditional video models, the researchers adopted two strategies: the system stores each frame's 3D geometry in real time, keeping the environment consistent when the camera returns to previously visited positions, and it deliberately introduces defective outputs during training so the model learns to self-correct. Benchmark results show Lyra2.0 outperforming six competing systems, including GEN3C and Yume-1.5, in both image quality and camera control, and its fast variant generates scenes 13 times faster.
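The article does not describe the memory mechanism in detail, but the revisit-consistency idea can be pictured as a pose-keyed geometry cache: geometry generated for a viewpoint is stored, and a later camera pose close enough to a stored one reuses the cached geometry instead of regenerating it. The sketch below is a hypothetical simplification, not Lyra2.0's actual implementation; GeometryMemory, render_frame, and the pose_tolerance parameter are invented names, and random points stand in for the generator's output.

```python
import numpy as np


class GeometryMemory:
    """Toy pose-keyed cache of per-frame 3D geometry (hypothetical sketch)."""

    def __init__(self, pose_tolerance: float = 0.5):
        self.pose_tolerance = pose_tolerance  # revisit radius in meters (assumed)
        self.entries: list[tuple[np.ndarray, np.ndarray]] = []  # (camera position, points)

    def lookup(self, cam_position: np.ndarray) -> np.ndarray | None:
        """Return cached geometry if a stored pose is within the tolerance."""
        for stored_pos, points in self.entries:
            if np.linalg.norm(stored_pos - cam_position) < self.pose_tolerance:
                return points
        return None

    def store(self, cam_position: np.ndarray, points: np.ndarray) -> None:
        """Cache the frame's point cloud under its camera position."""
        self.entries.append((cam_position.copy(), points.copy()))


def render_frame(memory: GeometryMemory, cam_position: np.ndarray) -> np.ndarray:
    """Reuse cached geometry on revisits; otherwise 'generate' new points."""
    cached = memory.lookup(cam_position)
    if cached is not None:
        return cached  # consistent with what was seen at this viewpoint before
    points = np.random.rand(1024, 3) + cam_position  # placeholder for the generator
    memory.store(cam_position, points)
    return points


memory = GeometryMemory()
first = render_frame(memory, np.array([0.0, 0.0, 0.0]))    # new viewpoint
revisit = render_frame(memory, np.array([0.1, 0.0, 0.0]))  # near-identical pose
assert np.array_equal(first, revisit)  # the revisit reuses the cached geometry
```

The tolerance is the key design choice in such a scheme: too loose, and distinct viewpoints collapse onto stale geometry; too tight, and revisits are never recognized, reintroducing the drift the cache was meant to prevent.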
Lyra2.0 now integrates seamlessly with physics simulators such as NVIDIA Isaac Sim, allowing generated 3D scenes to be exported directly as mesh models. This closed loop lets robots train efficiently in environments generated entirely by AI, sharply reducing reliance on large-scale real-world 3D data collection. Although the system is currently limited to static scenes, its gains in generation scale and stability already lay stronger infrastructure for the evolution of physical perception in autonomous driving and general-purpose robotics.
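The article does not specify the export path, but the mesh hand-off can be pictured as writing the generated geometry to a standard file format that Isaac Sim imports as an asset. Below is a minimal sketch using the open-source trimesh library, with a placeholder ground plane standing in for a real generated scene; the vertex and face arrays and the output filename are illustrative only.

```python
import numpy as np
import trimesh

# Hypothetical generator output: vertices and triangle faces for a flat
# ground plane; a real Lyra2.0 scene would contain far richer geometry.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [90.0, 0.0, 0.0],   # scenes extend up to 90 meters along the camera path
    [90.0, 0.0, 90.0],
    [0.0, 0.0, 90.0],
])
faces = np.array([[0, 1, 2], [0, 2, 3]])

# Build the mesh and export it to OBJ, a format Isaac Sim can import as an asset.
scene_mesh = trimesh.Trimesh(vertices=vertices, faces=faces)
scene_mesh.export("lyra_scene.obj")
```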
