Inworld builds the infrastructure for production voice AI. One platform with speech-to-text, an LLM router, and the top-ranked text-to-speech, all connected on a single API so context flows between every layer. Used by developers building voice agents, AI companions, and conversational apps.
This is the 5th launch from Inworld.

Realtime TTS-2
Launching today
Realtime TTS 1.5 is #1 on Artificial Analysis, voted best in blind tests by thousands of real users. TTS-2 builds on that with six major upgrades, including natural language voice direction for tone, emotion, speed, and pitch; text-based voice design, where you describe a voice in words and generate it; cross-lingual synthesis across 100+ languages that preserves speaker identity; IPA phonetic control for brand names and rare words; and improved alphanumeric pronunciation. Try it free at inworld.ai/tts.








Inworld
Hi Product Hunt! We're back! I'm Kylan, CEO and co-founder of @Inworld.
Some of you might remember when we launched Inworld TTS here. It went on to become the #1 ranked voice AI on Artificial Analysis, voted best in blind listening tests by thousands of real users. That meant a lot to us, so we went back and rebuilt the model from the ground up.
Today we're launching Realtime TTS 2.0. Try the live speech-to-speech experience at realtime.ai.
Here's the thing we kept hearing from builders: voice AI was built for audiobooks and voiceovers. It sounds good, but it sounds like a human reading from a script. If you've ever talked to a voice agent and thought "something feels off," that's why. Realtime conversation is a completely different problem, and we decided to solve it.
What can you build with it?
Companion apps that adapt to your user's mood and tone in real time through natural language voice direction
Language tutors that switch languages mid-session with the same voice, no re-recording
Characters that sound exactly how you describe them with text-based voice design
Support agents that get every code, name, and number right with improved alphanumeric handling and International Phonetic Alphabet (IPA) support
So what actually changed?
Natural conversationality. We trained the model on conversational speech instead of narration. You get natural rhythm, breath, micro-pauses, the cadence humans actually use when they talk to each other. Every voice you build on TTS 2.0 sounds like a person in conversation, not a narrator.
Conversational awareness. TTS 2.0 is informed by the full audio context of the multi-turn exchange. Not just the current sentence, the whole conversation. How it speaks adapts to how it was spoken to. A line delivered after a joke lands differently than the same line after bad news. The model knows the difference because it heard what came before.
Full voice direction. You steer the model with natural language the way you'd direct a voice actor. Not preset emotion tags, full descriptions: "act like you just got home from a long day, tired but warm." Combined with inline controls for specific moments ([whispering], [sigh], [excited]), the voice is as controllable as it is expressive.
Text-based voice design. Describe a voice in plain text, then generate it: "A posh British man, aged 30-40, speaking deliberately." Iterate on the prompt until it fits, save it, deploy it. No casting calls, no recording booth.
Crosslingual fluency. One voice across 100+ languages with on-the-fly switching inside a single generation. Your voice identity is preserved across every language. No re-recording, no managing separate voices per locale.
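For builders who want to sanity-check scripts before sending them, here is a minimal sketch of how the inline controls mentioned above could be separated from the spoken text. It is not the Inworld API: the function names, the `Segment` type, and the tag pattern are all hypothetical, and the only tags treated as known are the three quoted in this post ([whispering], [sigh], [excited]).

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: only the three tags quoted in the post are treated as known.
KNOWN_TAGS = {"whispering", "sigh", "excited"}

# Assumption: an inline control is a lowercase word or phrase in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z_ ]+)\]")


@dataclass
class Segment:
    kind: str   # "tag" for an inline control, "text" for spoken words
    value: str


def split_inline_controls(script: str) -> List[Segment]:
    """Split a script into alternating inline-control and spoken-text segments."""
    segments: List[Segment] = []
    pos = 0
    for match in TAG_PATTERN.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append(Segment("text", text))
        segments.append(Segment("tag", match.group(1)))
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append(Segment("text", tail))
    return segments


def unknown_tags(segments: List[Segment]) -> List[str]:
    """Return tags that are not in the (hypothetical) known set."""
    return [s.value for s in segments if s.kind == "tag" and s.value not in KNOWN_TAGS]


if __name__ == "__main__":
    line = "[sigh] Long day. [whispering] But I'm glad you're here."
    for seg in split_inline_controls(line):
        print(seg.kind, "->", seg.value)
```

A check like this can catch a mistyped tag in a script before it reaches the model; which tags the model actually accepts is something to confirm in the product documentation.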
Realtime TTS 1.5 is still #1 on the leaderboard. TTS 2.0 takes that quality and adds what was missing for realtime conversation.
Learn more at inworld.ai/tts. Happy to answer any questions in the comments.
– Kylan
Inworld
Hey everyone, Andreas from the Inworld team! I've been pumped about this launch for weeks and I'm so excited that we finally get to share TTS-2 with you all. If you want to hear what it can do, jump into the playground at inworld.ai/tts and try voice design or steering for yourself, or play with our realtime demo at realtime.ai. Would love to hear your reactions!
Inworld
@andreasassad My parents are using realtime.ai to practice foreign languages!
The voice control seems to be crazy good, you can just describe the tone and it gets really close without all the tweaking. Feels more usable than most TTS tools I’ve tested. I am gonna test it!
Inworld
#1 TTS just got better!