Meta took a step towards a universal language translator on Tuesday with the release of its new SeamlessM4T AI model, which the company says can quickly and efficiently understand language from speech or text in up to 100 languages and generate translations in either mode of communication. Multiple tech companies have released similar advanced AI translation models in recent months.
In a blog post, Meta describes its new translation system as “the first all-in-one multimodal and multilingual AI translation model” capable of speech recognition and speech-to-text translation for nearly 100 different languages. The model can also interpret speech and text and spit back out translated spoken words for 36 and 35 languages respectively. SeamlessM4T can also reportedly understand when users change languages mid-sentence, a practice language researchers refer to as code-switching. That could help when translating people who mix parts of languages together as they speak.
“SeamlessM4T is a unified multilingual model, meaning that it doesn’t rely on intermediate models to produce results,” Meta Research Scientist Manager Paco Guzmán told Gizmodo. “Other cascaded systems for spoken translation often do: speech recognition, text translation, text-to-speech generation. SeamlessM4T does it in a single go.”
In a video demo, Guzmán spoke the sentence “our goal is to create a more connected world.” The model quickly recognized the language spoken was English and then translated that into Russian. A computerized Russian voice spat the sentence back out with a more or less human timbre.
Unlike past translation models, SeamlessM4T uses a single system, which Meta believes will ultimately reduce errors and delays and improve quality. Meta compared this all-in-one translator approach to the Babel fish universal translator in The Hitchhiker’s Guide to the Galaxy. For now, you won’t have to shove this one in your ear.
Meta is releasing SeamlessM4T under a Creative Commons license so other translators and AI researchers can build off of it. The company is also releasing the metadata of SeamlessAlign, which contains over 270,000 hours of mined speech and text. Meta claims it’s the largest dataset of its kind.
Though much of the AI news in recent months has pointed out the unreliability of large language models for delivering accurate factual information, language translation is something these models are actually well-suited for. SeamlessM4T builds on Meta’s previous translation models. One of those broke new ground by successfully translating the primarily spoken language Hokkien into spoken words, a first for an AI model. More recently, the company released its Massively Multilingual Speech system, which Meta claims can provide automatic speech recognition and language identification for more than 1,100 languages.