
OpenAI launches AI models for voice generation, transcription


OpenAI is introducing new AI models for transcription and voice generation, promising significant improvements over previous releases.

These models align with OpenAI’s broader vision of developing autonomous AI agents capable of handling tasks independently.

OpenAI’s Head of Product, Olivier Godement, described one possible application as AI-powered chatbots capable of engaging with customers in natural conversation. “We’re going to see more and more agents pop up in the coming months,” Godement said. “The goal is to help customers and developers leverage AI agents that are useful, available, and accurate.”

The new text-to-speech model, “gpt-4o-mini-tts,” enhances speech synthesis, offering greater nuance and realism. Developers can adjust the model’s speech style with natural language prompts, instructing it to sound, for example, like a “mad scientist” or a “serene mindfulness teacher.”
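The steering described above happens through an instructions-style prompt passed alongside the text. The following is a minimal sketch using the OpenAI Python SDK; the voice name, output filename, and exact request shape are illustrative assumptions, not confirmed details from the article.

```python
# Sketch: steering gpt-4o-mini-tts with a natural-language style prompt.
# Assumes the OpenAI Python SDK (`pip install openai`) and an OPENAI_API_KEY
# environment variable; the voice "coral" and filename are placeholders.
import os

def build_tts_request(text: str, style: str) -> dict:
    """Assemble the parameters for a text-to-speech call."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": "coral",        # one of the SDK's built-in voices
        "input": text,
        "instructions": style,   # natural-language description of delivery
    }

params = build_tts_request(
    "Thank you for calling. How can I help you today?",
    "Speak like a serene mindfulness teacher: slow, warm, and calm.",
)

# Only hit the API when a key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(**params) as resp:
        resp.stream_to_file("greeting.mp3")
```

The same request with a different `instructions` string ("mad scientist", "sympathetic support agent") changes only the delivery, not the spoken text.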

Jeff Harris, an OpenAI product staff member, emphasized the model’s flexibility. “In different contexts, you don’t just want a flat, monotonous voice,” Harris explained. “For customer support, if an apology is needed, the voice can convey that emotion.”

On the transcription side, OpenAI is introducing “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” replacing its older Whisper model. Trained on diverse, high-quality datasets, these new models improve speech recognition, especially in noisy environments, and reduce hallucination errors, an issue that plagued Whisper.
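For developers already using Whisper through the API, the new models slot into the same transcription endpoint. Below is a hedged sketch using the OpenAI Python SDK; the audio filename is a placeholder, and the request shape is an assumption based on the existing transcriptions interface.

```python
# Sketch: transcribing audio with gpt-4o-transcribe via the OpenAI SDK.
# Assumes `pip install openai`, an OPENAI_API_KEY environment variable,
# and a local audio file; "meeting.wav" is a placeholder.
import os

def build_transcription_request(path: str,
                                model: str = "gpt-4o-transcribe") -> dict:
    """Assemble the parameters for a speech-to-text call."""
    return {"model": model, "file": path}

params = build_transcription_request("meeting.wav")

# Only hit the API when a key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    with open(params["file"], "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model=params["model"], file=audio
        )
    print(transcript.text)
```

Swapping `model` to `"gpt-4o-mini-transcribe"` (or back to `"whisper-1"`) requires no other code changes.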

However, performance varies by language. OpenAI’s internal benchmarks indicate that gpt-4o-transcribe has a word error rate of nearly 30% for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada, meaning roughly three out of every ten words are transcribed incorrectly.
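Word error rate, the metric behind that 30% figure, counts the substitutions, deletions, and insertions needed to turn the model’s transcript into the reference, divided by the number of reference words. A minimal implementation using word-level Levenshtein distance:

```python
# Word error rate (WER): edit distance over words divided by the
# number of words in the reference transcript.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A 30% WER means about 3 of every 10 reference words come out wrong:
print(wer("one two three four five six seven eight nine ten",
          "one two tree four five six seven ate nine tin"))  # 0.3
```

Note that WER can exceed 1.0 when the hypothesis inserts many extra words, which is one reason a ~0.3 score is considered poor for production transcription.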

Unlike its previous transcription models, OpenAI does not plan to open-source these new ones, citing their larger size and higher computational requirements. Harris noted, “We want to ensure that open-source releases are thoughtful and tailored to specific needs.”