Wan2.5-Preview was officially released today. Built on a new architecture, the model aims to reshape visual generation and brings significant advances in multimodal processing, video generation, and image editing.

Native Multimodal Architecture and Deep Alignment

Wan2.5-Preview adopts a new unified framework for understanding and generation, enabling flexible input and output across text, images, video, and audio. By training jointly on data from these modalities, the model achieves stronger cross-modal alignment, which is key to audio-visual synchronization and precise instruction following. The model is also optimized through **Reinforcement Learning from Human Feedback (RLHF)** so that generated image quality and video dynamics better match human aesthetic preferences.

Video Features: Audio-Visual Synchronization and Cinematic Aesthetics

In terms of video generation, Wan2.5-Preview brings multiple innovations:

  • Synchronized A/V Generation: Natively generates high-fidelity, highly consistent video together with synchronized audio, including multiple voices, sound effects, and background music (BGM).

  • Controllable Multimodal Input: Users can supply text, images, and audio as inputs and combine them freely to steer the result.

  • Cinematic Aesthetics: The model generates 10-second 1080p high-definition videos with strong dynamic and structural stability, and its upgraded cinematic control system supports works with a cinematic look.

Image Features: Creativity and Precise Control

Wan2.5-Preview also significantly improves image generation and editing:

  • Advanced Image Generation: The model follows instructions significantly better and can generate realistic images, diverse artistic styles, creative layouts, and professional charts.

  • Image Editing: Supports conversational, instruction-driven image editing with pixel-level precision for complex tasks such as multi-concept fusion, material conversion, and product color swaps.

Wan2.5-Preview marks a new stage in AI visual generation. Its multimodal capabilities and precise controls give developers and creators a more capable set of tools.