
OpenAI launches AI models for voice generation, transcription


OpenAI is introducing new AI models for transcription and voice generation, promising significant improvements over previous releases.

These models align with OpenAI’s broader vision of developing autonomous AI agents capable of handling tasks independently.

OpenAI’s Head of Product, Olivier Godement, described one possible application as AI-powered chatbots capable of engaging with customers in natural conversation. “We’re going to see more and more agents pop up in the coming months,” Godement said. “The goal is to help customers and developers leverage AI agents that are useful, available, and accurate.”

The new text-to-speech model, “gpt-4o-mini-tts,” enhances speech synthesis, offering greater nuance and realism. Developers can adjust the model’s speech style with natural language prompts, instructing it to sound, for example, like a “mad scientist” or a “serene mindfulness teacher.”
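The steering described above happens through an instructions-style prompt passed alongside the text. The following is a minimal sketch using the OpenAI Python SDK; the voice name, output filename, and exact request shape are illustrative assumptions, not confirmed details from the article.

```python
# Sketch: steering gpt-4o-mini-tts with a natural-language style prompt.
# Assumes the OpenAI Python SDK (`pip install openai`) and an OPENAI_API_KEY
# environment variable; the voice "coral" and filename are placeholders.
import os

def build_tts_request(text: str, style: str) -> dict:
    """Assemble the parameters for a text-to-speech call."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": "coral",        # one of the SDK's built-in voices
        "input": text,
        "instructions": style,   # natural-language description of delivery
    }

params = build_tts_request(
    "Thank you for calling. How can I help you today?",
    "Speak like a serene mindfulness teacher: slow, warm, and calm.",
)

# Only hit the API when a key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    with client.audio.speech.with_streaming_response.create(**params) as resp:
        resp.stream_to_file("greeting.mp3")
```

The same request with a different `instructions` string ("mad scientist", "sympathetic support agent") changes only the delivery, not the spoken text.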

Jeff Harris, an OpenAI product staff member, emphasized the model’s flexibility. “In different contexts, you don’t just want a flat, monotonous voice,” Harris explained. “For customer support, if an apology is needed, the voice can convey that emotion.”

On the transcription side, OpenAI is introducing “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” replacing its older Whisper model. Trained on diverse, high-quality datasets, these new models improve speech recognition, especially in noisy environments, and reduce hallucination errors, an issue that plagued Whisper.
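For developers already using Whisper through the API, the new models slot into the same transcription endpoint. Below is a hedged sketch using the OpenAI Python SDK; the audio filename is a placeholder, and the request shape is an assumption based on the existing transcriptions interface.

```python
# Sketch: transcribing audio with gpt-4o-transcribe via the OpenAI SDK.
# Assumes `pip install openai`, an OPENAI_API_KEY environment variable,
# and a local audio file; "meeting.wav" is a placeholder.
import os

def build_transcription_request(path: str,
                                model: str = "gpt-4o-transcribe") -> dict:
    """Assemble the parameters for a speech-to-text call."""
    return {"model": model, "file": path}

params = build_transcription_request("meeting.wav")

# Only hit the API when a key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    with open(params["file"], "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model=params["model"], file=audio
        )
    print(transcript.text)
```

Swapping `model` to `"gpt-4o-mini-transcribe"` (or back to `"whisper-1"`) requires no other code changes.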

However, performance varies by language. OpenAI’s internal benchmarks indicate that gpt-4o-transcribe has a word error rate of nearly 30% for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada, meaning roughly three out of every ten words are transcribed incorrectly.
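Word error rate, the metric behind that 30% figure, counts the substitutions, deletions, and insertions needed to turn the model’s transcript into the reference, divided by the number of reference words. A minimal implementation using word-level Levenshtein distance:

```python
# Word error rate (WER): edit distance over words divided by the
# number of words in the reference transcript.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A 30% WER means about 3 of every 10 reference words come out wrong:
print(wer("one two three four five six seven eight nine ten",
          "one two tree four five six seven ate nine tin"))  # 0.3
```

Note that WER can exceed 1.0 when the hypothesis inserts many extra words, which is one reason a ~0.3 score is considered poor for production transcription.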

Unlike its previous transcription models, OpenAI does not plan to open-source these new ones, citing their larger size and higher computational requirements. Harris noted, “We want to ensure that open-source releases are thoughtful and tailored to specific needs.”