Meta has another new AI model on the docket, and this one seems perfectly engineered for the land of tomorrow, if that utopian future is filled with nothing but deepfakes and modified audio. Like AI image generators, Voicebox works from a simple text prompt, in this case to generate synthetic voices. It appears to do so from scratch, but in actuality it draws on sound from thousands of audiobooks.
On Friday, Meta announced its new Voicebox AI, which can create voice clips using simple text prompts. In a video shared on his Facebook and Instagram accounts, CEO Mark Zuckerberg said the Voicebox AI model can take a text prompt and read it in a variety of human, though somewhat digital-sounding, voices. Voicebox can also modify audio to remove unwanted noises from voice clips, like a dog barking in the background. Unlike many other AI voice synthesis models, Meta’s AI can create audio in languages other than English, including French, Spanish, German, Polish, and Portuguese. The company said the AI can effectively translate any passage from one language to another while keeping the same voice style.
According to Meta, Voicebox can take an audio sample as short as two seconds and then match that audio style in its text-to-speech output. If true, it’s more sophisticated than other voice synthesis tools like Speechify or ElevenLabs, which normally require a fair bit more data before they can generate a quality synthetic voice.
In Meta’s promotional clip, one of the voices being modified does sound uncannily like Zuckerberg himself. Depending on how capable the model truly is, hearing Zuck does bring to mind some of the deepfakes modelled after the Meta CEO.
Unlike many of the company’s other recent AI releases, Voicebox isn’t going open source upon its debut, which suggests Meta could be restricting its latest model because of the potential harms that could result. While some folks online have used similar programs to craft synthesised voice clips of their favourite characters in media for fun, others have used them in harassment campaigns against the voice actors themselves. So Meta could be trying to prevent harm, or it could be saving this potentially lucrative model for some future enterprise.
According to the Voicebox research paper, the system was trained on more than 50,000 hours of unfiltered, unenhanced speech from English audiobooks and another 60,000 hours of speech from multilingual audiobooks. That’s why the synthetic speech in Meta’s video sounds less conversational and more like somebody reading a child a bedtime story. The researchers said they would eventually scale the model to include more casual speech.
The model is also limited in that users cannot independently control separate attributes of the output, such as having the AI ape the voice from one speech sample while taking the emotionality from a different one.
But what is most concerning is that Meta doesn’t seem to address the elephant in the room in its latest paper. The researchers did not say which audiobooks were used to train the AI or where they came from. It’s unclear whether those tens of thousands of hours of audio amount to many thousands of individual audiobooks.
Gizmodo reached out to Meta for more information about which audiobooks were used in the training data. A Meta spokesperson said they were “public domain” audiobooks, though the company declined to say where it downloaded them.
Voice actors have not been happy with the proliferation of AI, and are especially concerned about contracts that allow companies to synthesise their voices without compensation. Apple has already taken heat for quietly launching a series of books narrated by AI-generated voices. The tech giant has reportedly approached several major audiobook publishers to create these new AI-narrated stories.
Considering that audiobook market revenue has been growing by double digits year after year, and that creative industries are salivating at the prospect of reducing labour costs, this latest model could prove yet another headache for voice professionals.