On June 5, at the 2026 AI Industry Application Conference, Tencent Cloud's audio and video services officially launched WAND, an AI-native capability foundation. Drawing on more than 20 years of technical experience, Tencent Cloud's audio and video services have been upgraded across the board, from the underlying models and media capabilities to the access method. Their AI capabilities for audio and video media will be opened to the industry in an Agent-Native mode, marking a strategic shift from providing single-point media processing capabilities to serving as a native media base for AI applications and Agents.

The WAND architecture consists of three layers: a model engine, a capability layer, and scenario solutions. It includes six self-developed, media-specific models covering encoding/decoding, enhancement, erasure, generation, understanding, and audio, which are intended to address the shortcomings of mainstream generative large models in the media production process.

WAND Capability Architecture Diagram
In real business scenarios, WAND demonstrates high adaptability and clear efficiency advantages.
In sports live streaming, which demands high concurrency and extremely low latency, WAND uses self-developed collaborative model scheduling to combine identification, generation, synthesis, and encoding into a fully automated process. This reduces bitrate by more than 50% compared with traditional solutions, and WAND has supported thousands of top-tier global events so far.
Having ranked first in market share both in China and overseas 11 consecutive times, Tencent Cloud's audio and video services are accelerating the transformation of audio and video capabilities into production-grade tools that Agents can schedule uniformly, fully supporting innovation in audiovisual applications in the AI Agent era.
