Introducing gpt-realtime. Released a few days ago (August 28th), gpt-realtime
is OpenAI's new "most advanced speech-to-speech model". It looks like this is a replacement for the older gpt-4o-realtime-preview
model that was released last October.
This is a slightly confusing release. The previous realtime model was clearly described as a variant of GPT-4o, sharing the same October 2023 training cut-off date as that model.
I had expected that gpt-realtime
might be a GPT-5 relative, but its training date is still October 2023 whereas GPT-5 is September 2024.
gpt-realtime
also shares the relatively low 32,000 context token and 4,096 maximum output token limits of gpt-4o-realtime-preview
.
The only reference I found to GPT-5 in the documentation for the new model was a note saying "Ambiguity and conflicting instructions degrade performance, similar to GPT-5."
The usage tips for gpt-realtime
have a few surprises:
Iterate relentlessly. Small wording changes can make or break behavior.
Example: Swapping “inaudible” → “unintelligible” improved noisy input handling. [...]
Convert non-text rules to text: The model responds better to clearly written text.
Example: Instead of writing, "IF x > 3 THEN ESCALATE", write, "IF MORE THAN THREE FAILURES THEN ESCALATE."
There are a whole lot more prompting tips in the new Realtime Prompting Guide.
OpenAI list several key improvements to gpt-realtime
including the ability to configure it with a list of MCP servers, "better instruction following" and the ability to send it images.
My biggest confusion came from the pricing page, which lists separate pricing for using the Realtime API with gpt-realtime
and GPT-4o mini. This suggests to me that the old gpt-4o-mini-realtime-preview model is still available, despite it no longer being listed on the OpenAI models page.
gpt-4o-mini-realtime-preview
is a lot cheaper:
Model | Token Type | Input | Cached Input | Output |
---|---|---|---|---|
gpt-realtime | Text | $4.00 | $0.40 | $16.00 |
Audio | $32.00 | $0.40 | $64.00 | |
Image | $5.00 | $0.50 | - | |
gpt-4o-mini-realtime-preview | Text | $0.60 | $0.30 | $2.40 |
Audio | $10.00 | $0.30 | $20.00 |
The mini model also has a much longer 128,000 token context window.
Recent articles
- V&A East Storehouse and Operation Mincemeat in London - 27th August 2025
- The Summer of Johann: prompt injections as far as the eye can see - 15th August 2025
- Open weight LLMs exhibit inconsistent performance across providers - 15th August 2025