Artificial intelligence giant OpenAI has once again pushed the boundaries of voice interaction, officially launching three new real-time voice models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. All three are already available to developers through the Realtime API. They target long-standing pain points in voice interaction, such as high latency, the inability to interrupt naturally, and poor multilingual support.
GPT-Realtime-2 is the highlight of this release. OpenAI bills it as the most intelligent AI voice model currently available and the first voice model with GPT-5-level reasoning. Unlike traditional voice assistants, it keeps conversations natural and fluid while reasoning through complex problems in real time, calling external tools as needed, and accurately recognizing and handling user interruptions or corrections. The upshot is that voice assistants could move beyond executing simple commands and become real-time collaborators capable of handling complex, multi-step tasks.
On pricing, GPT-Realtime-2 costs $32 (about RMB 218) per million audio input tokens and $64 (about RMB 436) per million audio output tokens. Cached input is far cheaper, at just $0.40 per million tokens.
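To show how these three rates combine, here is a minimal cost-estimate sketch. It uses only the per-million-token prices quoted above and assumes that cached tokens are a subset of input tokens, so only the uncached remainder is billed at the full input rate. The function name and the token counts in the example are hypothetical, and actual invoices may include charges not covered here.

```python
from decimal import Decimal

# Per-million-token prices for GPT-Realtime-2 as quoted in the article (USD).
AUDIO_INPUT_PER_M = Decimal("32")
AUDIO_OUTPUT_PER_M = Decimal("64")
CACHED_INPUT_PER_M = Decimal("0.40")
MILLION = Decimal(1_000_000)


def estimate_session_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one session (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from cache,
    so only the remaining input tokens are billed at the full audio input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached = input_tokens - cached_tokens
    # Sum the three line items, then convert from per-million pricing.
    return (
        uncached * AUDIO_INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * AUDIO_OUTPUT_PER_M
    ) / MILLION


# Hypothetical session: 100,000 input tokens (40,000 cached), 50,000 output tokens.
print(estimate_session_cost(100_000, 40_000, 50_000))  # prints 5.136
```

Decimal is used instead of float so that small per-token rates such as $0.40 per million add up without rounding drift. In this example, output tokens account for most of the cost, while the 40,000 cached tokens contribute less than two cents.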
Beyond the core reasoning model, the other two models each serve a distinct purpose. GPT-Realtime-Translate delivers strong translation performance, converting speech from 70 input languages into 13 output languages in real time. Its output keeps pace with the speaker almost in sync, which suits demanding live-communication settings such as international meetings. GPT-Realtime-Whisper focuses on ultra-low-latency streaming transcription, producing text almost as soon as words are spoken. This greatly reduces the wait for meeting notes and live subtitles. Both models are billed per minute, at $0.034 and $0.017 per minute respectively.
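Per-minute billing makes cost estimates simpler than token billing. The sketch below applies the two quoted rates to a hypothetical one-hour meeting. It assumes that charges scale linearly with audio duration, because the article does not state how partial minutes are rounded. The dictionary keys are the model names as written in the article, used here as labels rather than confirmed API model identifiers.

```python
from decimal import Decimal

# Per-minute prices quoted in the article (USD).
# Assumption: cost scales linearly with audio minutes; rounding rules are not stated.
PER_MINUTE_USD = {
    "GPT-Realtime-Translate": Decimal("0.034"),
    "GPT-Realtime-Whisper": Decimal("0.017"),
}


def per_minute_cost(model: str, minutes: Decimal) -> Decimal:
    """Return the estimated USD cost of processing `minutes` of audio (hypothetical helper)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return PER_MINUTE_USD[model] * minutes


# Hypothetical one-hour meeting that is both translated and transcribed.
meeting = Decimal(60)
for name in PER_MINUTE_USD:
    print(name, per_minute_cost(name, meeting))
```

Under these assumptions, an hour of translation comes to about $2.04 and an hour of transcription to about $1.02.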
Industry analysts believe these launches mark a shift in AI voice interaction from "simple response" to "deep real-time understanding," and further strengthen OpenAI's technological lead in AI.
