
Voice AI 'Step to Success': Step Audio Unveils a 130B Voice Model with Real-Time Dialogue, Emotion Control, and Voice Cloning

Published in Latest AI News

Time: Feb 18, 2025

Read: 4 minutes

A milestone in voice interaction: the Chinese AI company Step Audio recently open-sourced a voice model with 130 billion parameters, drawing significant attention from the industry. Hailed as "dominant," the model is described as the industry's first product-level open-source real-time voice dialogue system that integrates voice understanding and generation control. Its broad functionality and advanced technology suggest that voice AI may be about to reach new heights.

The core highlight of this open-source model lies in its integrated design and powerful control capabilities. It can not only accurately understand user voice commands but also flexibly control the voice generation process, creating an unprecedented personalized voice interaction experience.

In terms of language support, the model switches smoothly between Chinese, English, and Japanese, which suits cross-language communication scenarios. It also supports dialects, currently covering major ones such as Cantonese and Sichuanese, making voice interaction feel closer to everyday life and more relatable.

Beyond language, the model offers fine-grained control over vocal emotion: users can set the emotional tone of the voice, such as happy or sad, to make the AI's speech more expressive. Speech rate and prosody style can also be adjusted to suit different contexts. The model additionally supports rap and humming, which opens up new options for content creation.

The model also features voice cloning, which means users can create highly personalized voice assistants and even achieve the "replication" and "inheritance" of voices.

Step Audio's open-sourcing of such a powerful voice model is likely to promote technological progress and application innovation across the industry. It significantly lowers the barrier to applying voice AI technology and suggests that future voice interactions will become smarter, more natural, and more personalized, integrating more fully into people's daily lives.

Project address: https://github.com/stepfun-ai/Step-Audio/tree/main
