Meta recently introduced SeamlessM4T, a single AI model capable of speech and text translations in multiple languages.
What is SeamlessM4T?
Meta describes SeamlessM4T as the first all-in-one multilingual, multimodal AI translation and transcription model. It covers a wide range of translation and transcription needs:
- Speech recognition for almost 100 languages
- Translating from speech to text in nearly 100 input and output languages
- Converting speech-to-speech in nearly 100 input languages and 36 (including English) output languages
- Text-to-text translation for nearly 100 languages
- Translating text to speech in almost 100 input languages and 35 (including English) output languages
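The five capabilities above differ mainly in the modality of their input and output, and in whether the language changes. A minimal Python sketch of that idea follows. The names `Modality`, `Task`, `TASKS` and `tasks_producing` are hypothetical labels chosen for illustration and are not part of Meta's code or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


@dataclass(frozen=True)
class Task:
    name: str
    source: Modality
    target: Modality
    translates: bool  # False means the output stays in the input language


# Illustrative labels for the five capabilities listed in the article.
TASKS = [
    Task("speech recognition", Modality.SPEECH, Modality.TEXT, translates=False),
    Task("speech-to-text translation", Modality.SPEECH, Modality.TEXT, translates=True),
    Task("speech-to-speech translation", Modality.SPEECH, Modality.SPEECH, translates=True),
    Task("text-to-text translation", Modality.TEXT, Modality.TEXT, translates=True),
    Task("text-to-speech translation", Modality.TEXT, Modality.SPEECH, translates=True),
]


def tasks_producing(target: Modality) -> list[str]:
    """Return the names of the tasks whose output has the given modality."""
    return [task.name for task in TASKS if task.target is target]


if __name__ == "__main__":
    print(tasks_producing(Modality.SPEECH))
```

Seen this way, the two tasks that produce speech are also the ones with the smaller set of output languages in the list above.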
Meta’s aim is to help people communicate effortlessly through speech and text across different languages. The inspiration for SeamlessM4T comes from literature, and it might surprise you.
From Babel Fish to Reality
SeamlessM4T is inspired by fiction: the Babel Fish from “The Hitchhiker’s Guide to the Galaxy”, a creature that lets its host understand any language. In reality, Meta notes, “existing speech-to-speech and speech-to-text systems only cover a small fraction of the world’s languages.”
With this model, Meta aims to address this gap. What sets SeamlessM4T apart is its single-system approach, which Meta says “reduces errors and delays, increasing the efficiency and quality of the translation process.” The goal is to improve the way people from different language backgrounds communicate.
This isn’t Meta’s first venture into the world of translation and linguistic technology.
A Legacy of Advancements
Meta has long worked towards a universal translator. Previously, it launched ‘No Language Left Behind’ (NLLB), a text-to-text machine translation model for 200 languages, which is now integrated into Wikipedia.
It has also released Universal Speech Translator, the first direct speech-to-speech translation system for Hokkien, a language without a widely used writing system. Plus, it introduced Massively Multilingual Speech, which offers speech recognition and synthesis across more than 1,100 languages.
How Does it Work?
Understanding Speech:
Meta’s self-supervised speech encoder, known as w2v-BERT 2.0, learns by analysing millions of hours of multilingual speech. It breaks the audio signal down into smaller parts and builds an internal representation of what is being said.
Processing Text:
The NLLB model, a previous release, forms the basis for the text encoder. It’s trained to understand nearly 100 languages and produce meaningful representations for translation.
Producing Text and Speech:
The team trained the text decoder to take encoded speech or text representations and generate translated text. For speech output, that text is turned into discrete acoustic units, and a multilingual HiFi-GAN unit vocoder then converts these units into audio.
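The flow described in this section can be pictured as a short pipeline. The Python sketch below only shows how the pieces connect. Every function in it is a hypothetical placeholder that returns dummy values, not Meta's implementation. In particular, `text_to_units` stands in for whatever step produces the units the vocoder consumes, which this article does not describe in detail.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Audio:
    samples: List[float]


@dataclass
class Representation:
    """Stand-in for the encoder output shared by both input paths."""
    source: str   # "speech" or "text"
    content: str


def encode_speech(audio: Audio) -> Representation:
    # Placeholder for the w2v-BERT 2.0 speech encoder.
    return Representation("speech", f"<{len(audio.samples)} samples>")


def encode_text(text: str) -> Representation:
    # Placeholder for the NLLB-based text encoder.
    return Representation("text", text)


def decode_text(rep: Representation, target_lang: str) -> str:
    # Placeholder for the text decoder: tags the content with the target language.
    return f"[{target_lang}] {rep.content}"


def text_to_units(text: str) -> List[int]:
    # Assumed intermediate step: map text to discrete units (dummy values here).
    return [ord(ch) % 100 for ch in text]


def vocode(units: List[int]) -> Audio:
    # Placeholder for the multilingual HiFi-GAN unit vocoder.
    return Audio([unit / 100.0 for unit in units])


def translate(source: Union[Audio, str], target_lang: str,
              want_speech: bool = False) -> Union[Audio, str]:
    """Route speech or text through one shared pipeline."""
    if isinstance(source, Audio):
        rep = encode_speech(source)
    else:
        rep = encode_text(source)
    text = decode_text(rep, target_lang)
    if not want_speech:
        return text
    return vocode(text_to_units(text))
```

For example, `translate("hello", "fra")` takes the text-to-text path, while `translate(Audio([0.0, 0.1]), "eng", want_speech=True)` takes the speech-to-speech path. The point of the sketch is that both inputs pass through the same decoder, which is the single-system design the article describes.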
Data Scaling and Results
SeamlessM4T depends on large amounts of training data. To scale up that data, Meta created SeamlessAlign, which it describes as the largest open speech/speech and speech/text parallel corpus. SeamlessAlign contains more than 443,000 hours of speech in total.
When it comes to performance, Meta says SeamlessM4T sets a new standard. It reports top-tier results for almost 100 languages and supports multiple tasks, all with a single model.
Safety First
Meta prioritises the accuracy of translation systems. They understand the risks of mistranscription or generating outputs that could be toxic or incorrect.
In their words, “we conducted research on toxicity and bias to help us understand which areas of the model might be sensitive.” They have also applied a toxicity classifier to help identify and filter out harmful content, with the aim of making it a safer tool for users.
Sharing the Technology
True to its commitment to open science, Meta is sharing this model with the public. This fits its stated vision of bringing the world closer together, a quest that could benefit everyone.
In the words of the Meta team: “This is only the latest step in our ongoing effort to build AI-powered technology that helps connect people across languages.” This is an innovation that will help people understand and be understood, irrespective of the language they speak.