metadata
tags:
  - mms
  - xlsr
license: cc-by-nc-4.0
datasets:
  - google/fleurs
  - mozilla-foundation/common_voice_8_0
metrics:
  - wer
  - cer

Massively Multilingual Speech (MMS) - Finetuned ASR - ALL

This is a checkpoint from the MMS Zero-shot project: a model that transcribes speech in almost any language using only a small amount of unlabeled text in the new language. The approach is based on a multilingual acoustic model trained on data in 1,150 languages (leveraging the MMS data), which outputs transcriptions in an intermediate representation (uroman tokens). A small amount of text in the new, unseen language is then mapped to the same intermediate representation. At inference time, this mapping, optionally combined with a language model, enables transcription of the new language.
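The text-to-intermediate-representation mapping described above can be sketched as a small lexicon builder. This is an illustrative sketch only, not the project's actual pipeline: `toy_romanize` is a hypothetical stand-in for uroman that merely lowercases and strips diacritics from Latin-script words, and the function names, the `max_words` cutoff, and the one-entry-per-line output format are all assumptions made for the example.

```python
import unicodedata
from collections import Counter


def toy_romanize(word: str) -> str:
    """Hypothetical stand-in for uroman: lowercase, strip diacritics, keep a-z and apostrophes."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z" or ch == "'")


def build_lexicon(text: str, max_words: int = 5000) -> dict[str, list[str]]:
    """Map the most frequent whitespace-separated words to romanized character tokens."""
    counts = Counter(text.split())
    lexicon: dict[str, list[str]] = {}
    for word, _ in counts.most_common(max_words):
        tokens = list(toy_romanize(word))
        if tokens:  # skip words the toy romanizer cannot represent
            lexicon[word] = tokens
    return lexicon


def to_lexicon_lines(lexicon: dict[str, list[str]]) -> list[str]:
    """Render one 'word t o k e n s' line per entry (assumed format, for illustration)."""
    return [f"{word} {' '.join(tokens)}" for word, tokens in lexicon.items()]


if __name__ == "__main__":
    sample_text = "café está café"  # stands in for unlabeled text in the new language
    for line in to_lexicon_lines(build_lexicon(sample_text)):
        print(line)
```

A decoder could then use such a lexicon to turn the acoustic model's romanized output back into words in the original orthography. Note that the toy romanizer drops words written in non-Latin scripts, which a real romanizer such as uroman is designed to handle.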

Table of Contents

  • Example

  • Model details

  • Additional Links

Example

See the official Space for an example of how to use the model.

Model details

  • Developed by: Jinming Zhao et al.

  • Model type: wav2vec2-based zero-shot speech recognition model, described in the paper "Scaling A Simple Approach to Zero-Shot Speech Recognition"

  • License: CC-BY-NC 4.0

  • Num parameters: 300 million

  • Cite as:

    @article{zhao2024scaling,
      title={Scaling A Simple Approach to Zero-Shot Speech Recognition},
      author={Zhao, Jinming and Pratap, Vineel and Auli, Michael},
      journal={arXiv preprint arXiv:2407.17852},
      year={2024}
    }
    

Additional Links