The artificial intelligence research company OpenAI has provided an in-depth look at its new synthetic voice generator, Voice Engine. The technology lets users create natural-sounding speech that closely mimics a real person's voice from a single 15-second audio sample.
OpenAI first developed Voice Engine in late 2022 to power the preset voice options in its text-to-speech API. The company recognized both the immense potential and the risks of such realistic synthetic voices. As a result, it has taken a cautious approach, testing Voice Engine privately with a small group of trusted partners rather than releasing it widely to the public.
OpenAI outlines several compelling use cases that these early partners have explored with Voice Engine:
- Education companies like Age of Learning generate expressive voices that support children's reading and make interactions with AI tutors more natural.
- Translation platforms like HeyGen use Voice Engine to preserve a speaker's original accent and speech patterns when translating videos into other languages.
- Healthcare organizations like Dimagi create synthetic voices in local languages and dialects to improve access to vital medical information and services.
- Accessibility apps like Livox offer more natural-sounding synthetic voice options for non-verbal individuals using augmentative communication devices.
- Medical institutions explore ways to restore patients' voices after conditions like strokes or brain tumours, using just a short sample of their previous speech.
While the technology is promising, OpenAI directly acknowledges the "serious risks" of synthetic voices, especially deception, privacy violations, and other misuse. It has implemented safeguards such as watermarking, voice authentication, and explicit disclosure requirements for its partners.
However, OpenAI states that wider societal preparation is still needed. It suggests phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information, developing policies to protect voice privacy, educating the public about AI capabilities, and improving audiovisual forensics to detect synthetic media.
The company says it is choosing not to release Voice Engine widely yet, favouring an open dialogue with "policymakers, researchers, developers and creatives" on how to navigate the challenges and opportunities of this rapidly advancing technology.
OpenAI's blog post provides a candid look at one of the most sophisticated text-to-speech systems yet developed. While synthetic voices could revolutionize accessibility, education, and communications, OpenAI is taking a measured approach as major ethical quandaries remain around voice privacy, impersonation, and deception.