---
tags:
- mms
- xlsr

license: cc-by-nc-4.0
datasets:
- google/fleurs
- mozilla-foundation/common_voice_8_0
metrics:
- wer
- cer
---

# Massively Multilingual Speech (MMS) - Finetuned ASR - ALL

This is a checkpoint from the [MMS Zero-shot project](https://arxiv.org/abs/2407.17852), a model that transcribes speech in almost any language using only a small amount of unlabeled text in the new language.
The approach is based on a multilingual acoustic model trained on data in 1,150 languages (leveraging the data of [MMS](https://ai.meta.com/blog/multilingual-model-speech-recognition/)) which outputs transcriptions in an intermediate representation ([uroman](https://github.com/isi-nlp/uroman) tokens).
A small amount of text in the new, unseen language is then mapped to this intermediate representation. At inference time, this mapping, together with an optional language model, enables transcription of the new language.
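
The mapping step described above can be illustrated with a minimal sketch. It builds a small lexicon that maps each word of the new-language text to a sequence of romanized character tokens. The `romanize` function here is a hypothetical stand-in that only strips diacritics, not the actual uroman tool, and the lexicon layout (space-separated characters ending in a `|` word-boundary token) and the `max_words` parameter are assumptions made for illustration, not the project's confirmed format.

```python
import unicodedata
from collections import Counter


def romanize(word: str) -> str:
    """Hypothetical stand-in for uroman: lowercase, strip diacritics, keep a-z only."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")


def build_lexicon(text: str, max_words: int = 1000) -> dict[str, str]:
    """Map the most frequent words to romanized character tokens.

    Assumed layout: characters separated by spaces, followed by "|"
    as a word-boundary token.
    """
    counts = Counter(text.split())
    lexicon = {}
    for word, _ in counts.most_common(max_words):
        tokens = romanize(word)
        if tokens:  # skip words that romanize to nothing
            lexicon[word] = " ".join(tokens) + " |"
    return lexicon


# Toy "unlabeled text" in the new language.
lexicon = build_lexicon("café olé café")
for word, spelling in lexicon.items():
    print(f"{word}\t{spelling}")
```

In a real setup, a decoder would use such a lexicon to turn the acoustic model's romanized output back into words in the original script; a real romanizer such as uroman would replace the stand-in function.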

## Table of Contents

- [Example](#example)
- [Model details](#model-details)
- [Additional links](#additional-links)

## Example

Please have a look at [the official space](https://huggingface.co/spaces/mms-meta/mms-zeroshot/tree/main) for an example of how to use the model.

## Model details

- **Developed by:** Jinming Zhao et al.
- **Model type:** Scaling A Simple Approach to Zero-Shot Speech Recognition
- **License:** CC-BY-NC 4.0
- **Num parameters:** 300 million
- **Cite as:**

      @article{zhao2024scaling,
        title={Scaling A Simple Approach to Zero-Shot Speech Recognition},
        author={Zhao, Jinming and Pratap, Vineel and Auli, Michael},
        journal={arXiv preprint arXiv:2407.17852},
        year={2024}
      }

## Additional Links

- [Paper](https://arxiv.org/abs/2407.17852)
- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms/zero_shot)
- [Official Space](https://huggingface.co/spaces/mms-meta/mms-zeroshot)