OpenAI Releases Three Real-Time Speech Models with GPT-5 Level Reasoning Capabilities

OpenAI has officially launched three new real-time voice models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. All three are already available to developers through the Realtime API. They target long-standing pain points in voice interaction: high latency, the inability to interrupt naturally, and weak multilingual support.

The highlight of the release is GPT-Realtime-2, which OpenAI bills as the most intelligent AI voice model currently available and the first voice model with GPT-5 level reasoning capabilities. Unlike traditional voice assistants, it keeps a conversation natural and fluid while reasoning through complex logic in real time, calling external tools as needed, and correctly recognizing and handling user interruptions or corrections. The implication is that future voice assistants will no longer be simple command executors but real-time collaborators capable of handling complex, multi-step tasks.

On pricing, GPT-Realtime-2 audio input costs $32 per million tokens (approximately RMB 218), and audio output costs $64 per million tokens (approximately RMB 436). Cached input is far cheaper, at only $0.40 per million tokens.
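To put these rates in perspective, the per-session cost can be estimated with simple arithmetic. The Python sketch below is illustrative only: the function name `estimate_cost` and the usage categories are hypothetical, the cached-input price is assumed to be quoted per million tokens like the other two, and any additional charges (such as text tokens) are ignored.

```python
# Prices in USD per one million tokens, as quoted in the article.
# Assumption: the cached-input figure uses the same per-million unit.
PRICE_PER_MILLION = {
    "audio_input": 32.0,
    "cached_input": 0.4,
    "audio_output": 64.0,
}


def estimate_cost(audio_input_tokens, cached_input_tokens, audio_output_tokens):
    """Estimate the USD cost of one session (hypothetical helper)."""
    usage = {
        "audio_input": audio_input_tokens,
        "cached_input": cached_input_tokens,
        "audio_output": audio_output_tokens,
    }
    # cost = price per million tokens * tokens used / 1,000,000
    return sum(PRICE_PER_MILLION[kind] * count / 1_000_000
               for kind, count in usage.items())


if __name__ == "__main__":
    # Example: 500k fresh input tokens, 1M cached input tokens, 250k output tokens.
    print(f"${estimate_cost(500_000, 1_000_000, 250_000):.2f}")
```

Under these assumptions, output tokens dominate the bill for talkative assistants, while reusing cached input adds very little.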

The other two models are more specialized. GPT-Realtime-Translate handles real-time translation from 70 input languages into 13 output languages, keeping pace with the speaker closely enough to suit demanding live scenarios such as international meetings. GPT-Realtime-Whisper focuses on ultra-low-latency streaming transcription, producing text almost as the words are spoken, which greatly shortens the wait for meeting notes and real-time subtitles. Both are billed by the minute rather than by token: $0.034 per minute for GPT-Realtime-Translate and $0.017 per minute for GPT-Realtime-Whisper.
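For the two per-minute models, a rough budget follows directly from audio duration. The sketch below is a minimal illustration: the dictionary keys are the product names used in this article (not confirmed API model identifiers), `per_minute_cost` is a hypothetical helper, and pro-rating partial minutes by the second is an assumption, since the billing granularity is not stated.

```python
# USD per minute of audio, as quoted in the article.
# Keys are article product names, not confirmed API model identifiers.
RATE_PER_MINUTE = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}


def per_minute_cost(model, seconds):
    """Estimate USD cost for `seconds` of audio (hypothetical helper).

    Assumption: partial minutes are pro-rated; actual billing may round.
    """
    return RATE_PER_MINUTE[model] * seconds / 60


if __name__ == "__main__":
    # One hour of audio on each model.
    for name in RATE_PER_MINUTE:
        print(f"{name}: ${per_minute_cost(name, 3600):.2f}")
```

At these rates, an hour of translated audio comes to roughly $2 and an hour of transcription to roughly $1.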

Industry analysts say the release marks a shift in AI voice interaction from "simple response" to "deep real-time understanding," and further consolidates OpenAI's technological lead.

