This is a launch from GPT-4o (8 previous launches).
OpenAI GPT-4o Audio Models

Build Powerful Voice Agents
Top Product
OpenAI GPT-4o Audio Models was ranked #3 of the day for March 21st, 2025
New OpenAI audio models for developers: GPT-4o-powered speech-to-text (more accurate than Whisper) and steerable text-to-speech. Build voice agents, transcription tools, and more.

Zac Zuo
Hunter

Hi everyone!


Voice is the future, and OpenAI's new audio models are accelerating that shift! They've just launched three new models in their API:

  • 🎤 gpt-4o-transcribe & gpt-4o-mini-transcribe (STT): Beating Whisper on accuracy, even in noisy environments. Great for call centers, meeting transcription, and more.

  • 🗣️ gpt-4o-mini-tts (TTS): This is the game-changer. Steerable voice output – you control the style and tone! Think truly personalized voice agents.

  • 🛠️ Easy Integration: Works with the OpenAI API and Agents SDK, supporting both speech-to-speech and chained development.
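To make the steerable TTS idea concrete, here is a minimal sketch of a speech request that separates what to say from how to say it. It assumes the REST endpoint https://api.openai.com/v1/audio/speech and the JSON fields model, voice, input, and instructions as documented around this launch; double-check them against the current API reference. The helper name build_speech_request is hypothetical, and it only builds the request without sending it.

```python
import json
import urllib.request

# Assumed endpoint for OpenAI's text-to-speech API (verify against the docs).
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, style: str, api_key: str,
                         voice: str = "alloy") -> urllib.request.Request:
    """Build (but do not send) a steerable TTS request for gpt-4o-mini-tts.

    Hypothetical helper: 'text' is what gets spoken, 'style' is a free-form
    instruction describing how it should be spoken.
    """
    payload = {
        "model": "gpt-4o-mini-tts",
        "voice": voice,          # e.g. "alloy" or "shimmer"
        "input": text,           # what to say
        "instructions": style,   # how to say it: tone, pacing, emotion
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) should return the synthesized audio bytes, which you can write to a file. Changing only the style string (say, "warm and upbeat support agent" vs. "calm, slow bedtime narrator") while keeping the same text is what makes the output steerable.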

Experience the steerable TTS for yourself: OpenAI.fm

Kirill Belov

Can it translate voice in a real-time stream?

Zac Zuo
Hunter

@kirill_a_belov I think it's still not a single API call; you'll need to chain together a few different APIs to do real-time translation: STT → LLM (for text translation) → TTS.
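The chained approach above can be sketched as a generator that pushes each incoming audio chunk through three stages in order. This is a minimal offline sketch: the ChainedTranslator class and the three stage functions are hypothetical placeholders, not part of any OpenAI SDK. In a real app each stage would wrap an API call (for example gpt-4o-transcribe for STT, a chat model for the translation, and gpt-4o-mini-tts for TTS).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Hypothetical stage signatures: plain functions so the chain can be tested
# offline; in practice each one would wrap a network call.
Transcribe = Callable[[bytes], str]     # audio chunk -> source-language text
Translate = Callable[[str, str], str]   # (text, target language) -> translated text
Synthesize = Callable[[str], bytes]     # translated text -> audio bytes


@dataclass
class ChainedTranslator:
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize
    target_language: str = "English"

    def run(self, audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
        """Push each incoming audio chunk through STT -> LLM -> TTS, in order."""
        for chunk in audio_chunks:
            text = self.transcribe(chunk)
            if not text.strip():
                continue  # silence: nothing to translate or speak
            translated = self.translate(text, self.target_language)
            yield self.synthesize(translated)


if __name__ == "__main__":
    # Fake stages that treat "audio" as UTF-8 text, just to show the data flow.
    demo = ChainedTranslator(
        transcribe=lambda audio: audio.decode("utf-8"),
        translate=lambda text, lang: f"[{lang}] {text.upper()}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    for out in demo.run([b"hola", b"   ", b"adios"]):
        print(out)
```

Because every chunk waits on three sequential round trips, latency adds up per chunk; that is the trade-off of chaining compared with a single speech-to-speech model.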

Kirill Belov

@zaczuo Yep. I've heard Apple will have such a feature in the new AirPods.

André J

The Alloy and Shimmer voices have always sounded 10x better than the others. And tbh, having tried ElevenLabs a lot, Alloy and Shimmer are the bar to beat. Love the testing UX on openai.fm, though. We used to only be able to test these voices in OpenAI's internal playground dashboard.

Zac Zuo
Hunter

@sentry_co I’ve been working on voice-based AI apps, so I always keep a close eye on AI capabilities in audio. Clearly, current LLMs still have some way to go in achieving native end-to-end audio processing: handling input without first converting it to text via ASR, and producing output without generating text before TTS.

After all, humans can listen and speak before learning to read, and even illiterate people communicate just fine in society, right? Speech carries the core of communication, with emotions, tones, and nuances that can’t be fully conveyed when flattened into text.

We may be seeing significant progress through more emotional audio output, which in turn helps us refine our understanding of audio input.

André J

Yeah, we're 90% there. It's just that the last 10% will take 90% of the effort 😅. Maybe. I do feel I get the best results with AI voices when I run the same text a few times over, since the output changes slightly on each iteration. Then I cherry-pick the segments I like most from each iteration and put it all together. I think this process could be done by an AI, though. Maybe the cherry-picking part is hard for an AI, because it doesn't understand which take is better.
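The manual workflow above (render the same text several times, keep the best segment from each take, splice them together) amounts to a per-segment best-of-N selection. A minimal sketch, assuming each take has already been split into the same number of aligned segments (e.g. one per sentence) and that some score function exists. The name cherry_pick and the scorer are hypothetical, and the scorer is exactly the hard part being described.

```python
from typing import Callable, List, Sequence, TypeVar

Segment = TypeVar("Segment")  # e.g. an audio clip for one sentence


def cherry_pick(takes: Sequence[Sequence[Segment]],
                score: Callable[[Segment], float]) -> List[Segment]:
    """For each segment position, keep the highest-scoring version across takes.

    Returns the chosen segments in order, ready to be concatenated.
    Ties go to the earliest take.
    """
    if not takes:
        return []
    n = len(takes[0])
    if any(len(take) != n for take in takes):
        raise ValueError("all takes must be split into the same number of segments")
    return [max((take[i] for take in takes), key=score) for i in range(n)]
```

Swapping score for a model that rates naturalness would automate the cherry-picking; the selection logic itself stays this simple.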

Zac Zuo
Hunter

@sentry_co True. And the remaining 10% might bring 90% of the total impact ;)

Ilya
Excuse me, but how can I try this?
About this launch
GPT-4o: Fast, intelligent, flexible GPT model (4.72 out of 5.0)
421 points · 19 comments · #3 day rank · #17 week rank
OpenAI GPT-4o Audio Models was hunted in Artificial Intelligence, Audio, and Development, and was featured on March 21st, 2025. GPT-4o is rated 4.7/5 by 1,011 users. It first launched on March 14th, 2023.