Vidu, a leading Chinese video generation model, has announced the official launch of a "one-click MV generation" feature on its open platform. The company presents the feature as marking a shift in video creation from splicing together existing material to fully automatic, end-to-end generation. Users supply background music, reference images, and brief text instructions, and the system produces a music video (MV) with high image quality and a coherent storyline within minutes.

At the core of the feature is a multi-agent system. When a user submits instructions, several specialized AI agents divide the work between them. First, a director agent analyzes the structure and lyrics of the music and plans the overall narrative; next, a storyboard agent turns that plan into detailed shot descriptions that specify professional parameters such as shot size, camera movement, and duration.
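To make the storyboard stage more concrete, here is a minimal Python sketch of the kind of output a storyboard agent might produce. It is not Vidu's implementation or API: the `Section` and `Shot` classes, the `STYLE_BY_SECTION` rule table, the `storyboard` function, and the 4-second shot cap are all hypothetical. It assumes the director agent has already split the song into labeled sections with millisecond timestamps, and it cuts each section into shots that carry the three parameters the article names: shot size, camera movement, and duration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """A labeled span of the song (hypothetical director-agent output)."""
    label: str      # e.g. "verse" or "chorus"
    start_ms: int
    end_ms: int


@dataclass
class Shot:
    """One storyboard entry: shot size, camera movement, and timing."""
    section: str
    shot_size: str
    camera_move: str
    start_ms: int
    duration_ms: int


# Hypothetical rule table: framing and camera movement per section type.
STYLE_BY_SECTION = {
    "verse": ("medium", "slow pan"),
    "chorus": ("close-up", "push-in"),
}


def storyboard(sections: list[Section], max_shot_ms: int = 4000) -> list[Shot]:
    """Cut each section into consecutive shots no longer than max_shot_ms."""
    shots = []
    for sec in sections:
        # Fall back to a static wide shot for section types not in the table.
        size, move = STYLE_BY_SECTION.get(sec.label, ("wide", "static"))
        t = sec.start_ms
        while t < sec.end_ms:
            dur = min(max_shot_ms, sec.end_ms - t)
            shots.append(Shot(sec.label, size, move, t, dur))
            t += dur
    return shots


if __name__ == "__main__":
    song = [Section("verse", 0, 10_000), Section("chorus", 10_000, 16_000)]
    for shot in storyboard(song):
        print(shot)
```

Working in integer milliseconds rather than floating-point seconds keeps shot boundaries exact, so the shot durations always add up to the section length.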
In the visual generation stage, Vidu claims industrial-grade stability. With its "multi-image reference to video" capability, creators can upload up to 7 reference images as visual anchors, which the visual generation agent uses to keep characters, scenes, and aesthetic style consistent throughout a five-minute video. Finally, an editing and synthesis agent handles the transitions between shots and automatically generates dynamic subtitles synchronized with the lyrics frame by frame.
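The 7-image limit and the frame-by-frame subtitle synchronization can be illustrated with a short sketch. This is an assumption-based illustration, not Vidu's code: the `Subtitle` class, `check_references`, `lyrics_to_subtitles`, and the 24 fps default are hypothetical, and lyric timings are assumed to arrive as `(start_ms, end_ms, text)` tuples. Each lyric line is mapped to a half-open frame range `[first_frame, end_frame)`, rounding the start down and the end up so that consecutive lines meet without gaps or overlaps.

```python
from dataclasses import dataclass

MAX_REFERENCE_IMAGES = 7  # upper limit stated in the article


@dataclass
class Subtitle:
    text: str
    first_frame: int  # first frame on which the line is shown
    end_frame: int    # exclusive: first frame after the line


def check_references(paths: list[str]) -> None:
    """Reject requests with more reference images than the stated limit."""
    if len(paths) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images allowed, got {len(paths)}"
        )


def lyrics_to_subtitles(lines: list[tuple[int, int, str]], fps: int = 24) -> list[Subtitle]:
    """Map (start_ms, end_ms, text) lyric timings to frame ranges at the given fps."""
    subs = []
    for start_ms, end_ms, text in lines:
        first = start_ms * fps // 1000   # floor: frame containing the start time
        end = -(-end_ms * fps // 1000)   # ceil, via negated floor division
        subs.append(Subtitle(text, first, end))
    return subs
```

For example, at 24 fps a lyric line sung from 0 to 2.5 s occupies frames 0 to 59, and a line starting at 2.5 s begins on frame 60.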
Because this "fully automated production" model requires no human intervention, it substantially lowers the barrier to professional video creation. Vidu positions itself for creators not merely as a generation tool but as an integrated virtual production factory, making complex MV production as simple as sending an instruction.
Key points:
🎬 Fully automated multi-agent collaboration: four agents (director, storyboard, visual generation, and editing) automate the entire process, from music analysis to final output.
🖼️ Industrial-grade style consistency: up to 7 reference images serve as visual anchors, keeping character and scene styles from drifting throughout a five-minute video.
🎵 Precise audio-visual synchronization: the AI automatically detects the rhythm of the background music to place transitions and generates dynamic subtitles synchronized frame by frame, with the finished video delivered within minutes.
