Meta’s open-source speech AI recognizes over 4,000 spoken languages

Meta has created an AI language model that (in a refreshing change of pace) isn’t a ChatGPT clone. The company’s Massively Multilingual Speech (MMS) project can recognize over 4,000 spoken languages and produce speech (text-to-speech) in over 1,100. Like most of its other publicly announced AI projects, Meta is open-sourcing MMS today to help preserve language diversity and encourage researchers to build on its foundation. “Today, we are publicly sharing our models and code so that others in the research community can build upon our work,” the company wrote. “Through this work, we hope to make a small contribution to preserve the incredible language diversity of the world.”

Speech recognition and text-to-speech models typically require training on thousands of hours of audio with accompanying transcription labels. (Labels are crucial to machine learning, allowing the algorithms to correctly categorize and “understand” the data.) But for languages that aren’t widely used in industrialized nations — many of which are in danger of disappearing in the coming decades — “this data simply does not exist,” as Meta puts it.

Meta used an unconventional approach to collecting audio data: tapping into audio recordings of translated religious texts. “We turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research,” the company said. “These translations have publicly available audio recordings of people reading these texts in different languages.” Incorporating the unlabeled recordings of the Bible and similar texts, Meta’s researchers increased the model’s available languages to over 4,000.

Blog