Today, the Speech Team of Alibaba's Tongyi Lab announced two new voice generation models: Fun-CosyVoice3.5 and Fun-AudioGen-VD. The headline feature of both is support for "FreeStyle" instructions: instead of tuning complex parameters, users can precisely control how a voice delivers its lines, or build an elaborate audio scene from scratch, simply by writing a natural-language description.

The two models focus on different capabilities:
Fun-CosyVoice3.5: Multilingual Replication and Fine-grained Control
This model is an upgraded version of the earlier CosyVoice; its core breakthrough is in understanding natural-language descriptions of speech expression.
Command-based Generation: Users can type instructions such as "speak more confidently" or "slow down and add some emotional fluctuation," and the model adjusts its output in real time (a minimal sketch of this workflow follows the list below).
Language Expansion: It adds Thai, Indonesian, Portuguese, and Vietnamese, and maintains industry-leading intelligibility (word error rate, WER) and voice similarity across its 13 supported languages.
Rare-Character Optimization: Targeted optimization cuts the pronunciation error rate on rare characters from 15.2% to 5.3%.
Performance Improvement: First-packet latency has been cut by 35%, making real-time interactive scenarios noticeably smoother.
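To make the instruction workflow concrete, here is a minimal sketch of what instruction-driven synthesis could look like from Python. The announcement does not publish an API, so the funaudio package, the FunCosyVoice class, the checkpoint name, and the inference_instruct() method below are hypothetical placeholders, not a documented interface.

```python
# Minimal sketch of "FreeStyle" instruction-driven synthesis.
# NOTE: the funaudio package, FunCosyVoice class, checkpoint id, and
# inference_instruct() method are hypothetical assumptions for illustration;
# the announcement does not document an actual API.
import torchaudio
from funaudio import FunCosyVoice  # hypothetical import

model = FunCosyVoice.from_pretrained("Fun-CosyVoice3.5")  # hypothetical id

# A plain natural-language instruction replaces manual parameter tuning.
result = model.inference_instruct(
    text="Welcome back. Let's pick up right where we left off.",
    instruct="Speak more confidently, slow the pace slightly, "
             "and add some gentle emotional fluctuation.",
)
torchaudio.save("confident.wav", result.speech, result.sample_rate)
```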
Fun-AudioGen-VD: Full-scene Sound Design
This model acts more like a "sound director," generating integrated audio that combines characters and scenes.
Voice Customization: It supports specifying gender, age, and accent, down to finer details such as a "hoarse, deep, or low-pitched" timbre.
Emotion and Role: It can play roles such as customer-service agents, announcers, and children, and even convey layered psychological states such as "outwardly calm but inwardly trembling."
Environmental Immersion: It supports layering in background sound (such as battlefield noise or café chatter) and spatial effects (such as cathedral echoes or a muffled underwater feel), producing a fully simulated sense of space (see the sketch after this list).
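Similarly, a "character + scene" request to Fun-AudioGen-VD might look like the sketch below. Again, every identifier here (the funaudio package, the FunAudioGenVD class, and the generate() signature) is an illustrative assumption rather than a published interface.

```python
# Minimal sketch of integrated "character + scene" audio generation.
# NOTE: the funaudio package, FunAudioGenVD class, checkpoint id, and
# generate() signature are hypothetical assumptions for illustration.
import torchaudio
from funaudio import FunAudioGenVD  # hypothetical import

model = FunAudioGenVD.from_pretrained("Fun-AudioGen-VD")  # hypothetical id

# One free-form description covers the speaker, the emotion, and the scene.
result = model.generate(
    description=(
        "A middle-aged announcer with a hoarse, low-pitched voice, "
        "outwardly calm but inwardly trembling, heard over distant "
        "battlefield noise with a cathedral-like echo."
    ),
    text="Stay where you are. Help is on the way.",
)
torchaudio.save("scene.wav", result.audio, result.sample_rate)
```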
Tongyi Lab said the two releases will further lower the barrier to high-quality voice creation, providing strong AI support for fields such as podcasting, game development, and film post-production.
