--- tags: - mms language: - ab - af - ak - am - ar - as - av - ay - az - ba - bm - be - bn - bi - bo - sh - br - bg - ca - cs - ce - cv - ku - cy - da - de - dv - dz - el - en - eo - et - eu - ee - fo - fa - fj - fi - fr - fy - ff - ga - gl - gn - gu - zh - ht - ha - he - hi - sh - hu - hy - ig - ia - ms - is - it - jv - ja - kn - ka - kk - kr - km - ki - rw - ky - ko - kv - lo - la - lv - ln - lt - lb - lg - mh - ml - mr - ms - mk - mg - mt - mn - mi - my - zh - nl - 'no' - 'no' - ne - ny - oc - om - or - os - pa - pl - pt - ms - ps - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - qu - ro - rn - ru - sg - sk - sl - sm - sn - sd - so - es - sq - su - sv - sw - ta - tt - te - tg - tl - th - ti - ts - tr - uk - ms - vi - wo - xh - ms - yo - ms - zu - za license: cc-by-nc-4.0 datasets: - google/fleurs metrics: - wer --- # Massively Multilingual Speech (MMS) - Finetuned ASR - ALL This checkpoint is a model fine-tuned for multi-lingual ASR and part of Facebook's [Massive Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/). This checkpoint is based on the [Wav2Vec2 architecture](https://huggingface.co/docs/transformers/model_doc/wav2vec2) and makes use of adapter models to transcribe 1000+ languages. The checkpoint consists of **1 billion parameters** and has been fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1162 languages. ## Table Of Content - [Example](#example) - [Supported Languages](#supported-languages) - [Model details](#model-details) - [Additional links](#additional-links) ## Example This MMS checkpoint can be used with [Transformers](https://github.com/huggingface/transformers) to transcribe audio of 1107 different languages. Let's look at a simple example. First, we install transformers and some other libraries ``` pip install torch accelerate torchaudio datasets pip install --upgrade transformers ```` **Note**: In order to use MMS you need to have at least `transformers >= 4.30` installed. If the `4.30` version is not yet available [on PyPI](https://pypi.org/project/transformers/) make sure to install `transformers` from source: ``` pip install git+https://github.com/huggingface/transformers.git ``` Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled to 16000 kHz. ```py from datasets import load_dataset, Audio # English stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True) stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000)) en_sample = next(iter(stream_data))["audio"]["array"] # French stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True) stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000)) fr_sample = next(iter(stream_data))["audio"]["array"] ``` Next, we load the model and processor ```py from transformers import Wav2Vec2ForCTC, AutoProcessor import torch model_id = "facebook/mms-1b-all" processor = AutoProcessor.from_pretrained(model_id) model = Wav2Vec2ForCTC.from_pretrained(model_id) ``` Now we process the audio data, pass the processed audio data to the model and transcribe the model output, just like we usually do for Wav2Vec2 models such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) ```py inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt") with torch.no_grad(): outputs = model(**inputs).logits ids = torch.argmax(outputs, dim=-1)[0] transcription = processor.decode(ids) # 'joe keton disapproved of films and buster also had reservations about the media' ``` We can now keep the same model in memory and simply switch out the language adapters by calling the convenient [`load_adapter()`]() function for the model and [`set_target_lang()`]() for the tokenizer. We pass the target language as an input - "fra" for French. ```py processor.tokenizer.set_target_lang("fra") model.load_adapter("fra") inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt") with torch.no_grad(): outputs = model(**inputs).logits ids = torch.argmax(outputs, dim=-1)[0] transcription = processor.decode(ids) # "ce dernier est volé tout au long de l'histoire romaine" ``` In the same way the language can be switched out for all other supported languages. Please have a look at: ```py processor.tokenizer.vocab.keys() ``` *Beam Search Decoding with Language Model* To run decoding with n-gram language model, download the decoding config which consits of the paths to language model, lexicon, token files and also the best decoding hyperparameters. Language models are only avaialble for the 102 languages of the FLEURS dataset. ```py import json lm_decoding_config = {} lm_decoding_configfile = hf_hub_download( repo_id="facebook/mms-cclms", filename="decoding_config.json", subfolder="mms-1b-all", ) with open(lm_decoding_configfile) as f: lm_decoding_config = json.loads(f.read()) ``` Now, download all the files needed for decoding. ```py # modify the ISO language code if using a different language. decoding_config = lm_decoding_config["eng"] lm_file = hf_hub_download( repo_id="facebook/mms-cclms", filename=decoding_config["lmfile"].rsplit("/", 1)[1], subfolder=decoding_config["lmfile"].rsplit("/", 1)[0], ) token_file = hf_hub_download( repo_id="facebook/mms-cclms", filename=decoding_config["tokensfile"].rsplit("/", 1)[1], subfolder=decoding_config["tokensfile"].rsplit("/", 1)[0], ) lexicon_file = None if decoding_config["lexiconfile"] is not None: lexicon_file = hf_hub_download( repo_id="facebook/mms-cclms", filename=decoding_config["lexiconfile"].rsplit("/", 1)[1], subfolder=decoding_config["lexiconfile"].rsplit("/", 1)[0], ) ``` Create the `torchaudio.models.decoder.CTCDecoder` object ```py from torchaudio.models.decoder import ctc_decoder beam_search_decoder = ctc_decoder( lexicon=lexicon_file, tokens=token_file, lm=lm_file, nbest=1, beam_size=500, beam_size_token=50, lm_weight=float(decoding_config["lmweight"]), word_score=float(decoding_config["wordscore"]), sil_score=float(decoding_config["silweight"]), blank_token="", ) ``` Passing the model output to the ctc decoder will return the transcription. ```py beam_search_result = beam_search_decoder(outputs.to("cpu")) transcription = " ".join(beam_search_result[0][0].words).strip() ``` For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms). ## Supported Languages This model supports 1162 languages. Unclick the following to toogle all supported languages of this checkpoint in [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3). You can find more details about the languages and their ISO 649-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
Click to toggle - abi - abk - abp - aca - acd - ace - acf - ach - acn - acr - acu - ade - adh - adj - adx - aeu - afr - agd - agg - agn - agr - agu - agx - aha - ahk - aia - aka - akb - ake - akp - alj - alp - alt - alz - ame - amf - amh - ami - amk - ann - any - aoz - apb - apr - ara - arl - asa - asg - asm - ast - ata - atb - atg - ati - atq - ava - avn - avu - awa - awb - ayo - ayr - ayz - azb - azg - azj-script_cyrillic - azj-script_latin - azz - bak - bam - ban - bao - bas - bav - bba - bbb - bbc - bbo - bcc-script_arabic - bcc-script_latin - bcl - bcw - bdg - bdh - bdq - bdu - bdv - beh - bel - bem - ben - bep - bex - bfa - bfo - bfy - bfz - bgc - bgq - bgr - bgt - bgw - bha - bht - bhz - bib - bim - bis - biv - bjr - bjv - bjw - bjz - bkd - bkv - blh - blt - blx - blz - bmq - bmr - bmu - bmv - bng - bno - bnp - boa - bod - boj - bom - bor - bos - bov - box - bpr - bps - bqc - bqi - bqj - bqp - bre - bru - bsc - bsq - bss - btd - bts - btt - btx - bud - bul - bus - bvc - bvz - bwq - bwu - byr - bzh - bzi - bzj - caa - cab - cac-dialect_sanmateoixtatan - cac-dialect_sansebastiancoatan - cak-dialect_central - cak-dialect_santamariadejesus - cak-dialect_santodomingoxenacoj - cak-dialect_southcentral - cak-dialect_western - cak-dialect_yepocapa - cap - car - cas - cat - cax - cbc - cbi - cbr - cbs - cbt - cbu - cbv - cce - cco - cdj - ceb - ceg - cek - ces - cfm - cgc - che - chf - chv - chz - cjo - cjp - cjs - ckb - cko - ckt - cla - cle - cly - cme - cmn-script_simplified - cmo-script_khmer - cmo-script_latin - cmr - cnh - cni - cnl - cnt - coe - cof - cok - con - cot - cou - cpa - cpb - cpu - crh - crk-script_latin - crk-script_syllabics - crn - crq - crs - crt - csk - cso - ctd - ctg - cto - ctu - cuc - cui - cuk - cul - cwa - cwe - cwt - cya - cym - daa - dah - dan - dar - dbj - dbq - ddn - ded - des - deu - dga - dgi - dgk - dgo - dgr - dhi - did - dig - dik - dip - div - djk - dnj-dialect_blowowest - dnj-dialect_gweetaawueast - dnt - dnw - dop - dos - dsh - dso - dtp - dts - dug - dwr - dyi - dyo - dyu - dzo - eip - eka - ell - emp - enb - eng - enx - epo - ese - ess - est - eus - evn - ewe - eza - fal - fao - far - fas - fij - fin - flr - fmu - fon - fra - frd - fry - ful - gag-script_cyrillic - gag-script_latin - gai - gam - gau - gbi - gbk - gbm - gbo - gde - geb - gej - gil - gjn - gkn - gld - gle - glg - glk - gmv - gna - gnd - gng - gof-script_latin - gog - gor - gqr - grc - gri - grn - grt - gso - gub - guc - gud - guh - guj - guk - gum - guo - guq - guu - gux - gvc - gvl - gwi - gwr - gym - gyr - had - hag - hak - hap - hat - hau - hay - heb - heh - hif - hig - hil - hin - hlb - hlt - hne - hnn - hns - hoc - hoy - hrv - hsb - hto - hub - hui - hun - hus-dialect_centralveracruz - hus-dialect_westernpotosino - huu - huv - hvn - hwc - hye - hyw - iba - ibo - icr - idd - ifa - ifb - ife - ifk - ifu - ify - ign - ikk - ilb - ilo - imo - ina - inb - ind - iou - ipi - iqw - iri - irk - isl - ita - itl - itv - ixl-dialect_sangasparchajul - ixl-dialect_sanjuancotzal - ixl-dialect_santamarianebaj - izr - izz - jac - jam - jav - jbu - jen - jic - jiv - jmc - jmd - jpn - jun - juy - jvn - kaa - kab - kac - kak - kam - kan - kao - kaq - kat - kay - kaz - kbo - kbp - kbq - kbr - kby - kca - kcg - kdc - kde - kdh - kdi - kdj - kdl - kdn - kdt - kea - kek - ken - keo - ker - key - kez - kfb - kff-script_telugu - kfw - kfx - khg - khm - khq - kia - kij - kik - kin - kir - kjb - kje - kjg - kjh - kki - kkj - kle - klu - klv - klw - kma - kmd - kml - kmr-script_arabic - kmr-script_cyrillic - kmr-script_latin - kmu - knb - kne - knf - knj - knk - kno - kog - kor - kpq - kps - kpv - kpy - kpz - kqe - kqp - kqr - kqy - krc - kri - krj - krl - krr - krs - kru - ksb - ksr - kss - ktb - ktj - kub - kue - kum - kus - kvn - kvw - kwd - kwf - kwi - kxc - kxf - kxm - kxv - kyb - kyc - kyf - kyg - kyo - kyq - kyu - kyz - kzf - lac - laj - lam - lao - las - lat - lav - law - lbj - lbw - lcp - lee - lef - lem - lew - lex - lgg - lgl - lhu - lia - lid - lif - lin - lip - lis - lit - lje - ljp - llg - lln - lme - lnd - lns - lob - lok - lom - lon - loq - lsi - lsm - ltz - luc - lug - luo - lwo - lww - lzz - maa-dialect_sanantonio - maa-dialect_sanjeronimo - mad - mag - mah - mai - maj - mak - mal - mam-dialect_central - mam-dialect_northern - mam-dialect_southern - mam-dialect_western - maq - mar - maw - maz - mbb - mbc - mbh - mbj - mbt - mbu - mbz - mca - mcb - mcd - mco - mcp - mcq - mcu - mda - mdf - mdv - mdy - med - mee - mej - men - meq - met - mev - mfe - mfh - mfi - mfk - mfq - mfy - mfz - mgd - mge - mgh - mgo - mhi - mhr - mhu - mhx - mhy - mib - mie - mif - mih - mil - mim - min - mio - mip - miq - mit - miy - miz - mjl - mjv - mkd - mkl - mkn - mlg - mlt - mmg - mnb - mnf - mnk - mnw - mnx - moa - mog - mon - mop - mor - mos - mox - moz - mpg - mpm - mpp - mpx - mqb - mqf - mqj - mqn - mri - mrw - msy - mtd - mtj - mto - muh - mup - mur - muv - muy - mvp - mwq - mwv - mxb - mxq - mxt - mxv - mya - myb - myk - myl - myv - myx - myy - mza - mzi - mzj - mzk - mzm - mzw - nab - nag - nan - nas - naw - nca - nch - ncj - ncl - ncu - ndj - ndp - ndv - ndy - ndz - neb - new - nfa - nfr - nga - ngl - ngp - ngu - nhe - nhi - nhu - nhw - nhx - nhy - nia - nij - nim - nin - nko - nlc - nld - nlg - nlk - nmz - nnb - nno - nnq - nnw - noa - nob - nod - nog - not - npi - npl - npy - nso - nst - nsu - ntm - ntr - nuj - nus - nuz - nwb - nxq - nya - nyf - nyn - nyo - nyy - nzi - obo - oci - ojb-script_latin - ojb-script_syllabics - oku - old - omw - onb - ood - orm - ory - oss - ote - otq - ozm - pab - pad - pag - pam - pan - pao - pap - pau - pbb - pbc - pbi - pce - pcm - peg - pez - pib - pil - pir - pis - pjt - pkb - pls - plw - pmf - pny - poh-dialect_eastern - poh-dialect_western - poi - pol - por - poy - ppk - pps - prf - prk - prt - pse - pss - ptu - pui - pus - pwg - pww - pxm - qub - quc-dialect_central - quc-dialect_east - quc-dialect_north - quf - quh - qul - quw - quy - quz - qvc - qve - qvh - qvm - qvn - qvo - qvs - qvw - qvz - qwh - qxh - qxl - qxn - qxo - qxr - rah - rai - rap - rav - raw - rej - rel - rgu - rhg - rif-script_arabic - rif-script_latin - ril - rim - rjs - rkt - rmc-script_cyrillic - rmc-script_latin - rmo - rmy-script_cyrillic - rmy-script_latin - rng - rnl - roh-dialect_sursilv - roh-dialect_vallader - rol - ron - rop - rro - rub - ruf - rug - run - rus - sab - sag - sah - saj - saq - sas - sat - sba - sbd - sbl - sbp - sch - sck - sda - sea - seh - ses - sey - sgb - sgj - sgw - shi - shk - shn - sho - shp - sid - sig - sil - sja - sjm - sld - slk - slu - slv - sml - smo - sna - snd - sne - snn - snp - snw - som - soy - spa - spp - spy - sqi - sri - srm - srn - srp-script_cyrillic - srp-script_latin - srx - stn - stp - suc - suk - sun - sur - sus - suv - suz - swe - swh - sxb - sxn - sya - syl - sza - tac - taj - tam - tao - tap - taq - tat - tav - tbc - tbg - tbk - tbl - tby - tbz - tca - tcc - tcs - tcz - tdj - ted - tee - tel - tem - teo - ter - tes - tew - tex - tfr - tgj - tgk - tgl - tgo - tgp - tha - thk - thl - tih - tik - tir - tkr - tlb - tlj - tly - tmc - tmf - tna - tng - tnk - tnn - tnp - tnr - tnt - tob - toc - toh - tom - tos - tpi - tpm - tpp - tpt - trc - tri - trn - trs - tso - tsz - ttc - tte - ttq-script_tifinagh - tue - tuf - tuk-script_arabic - tuk-script_latin - tuo - tur - tvw - twb - twe - twu - txa - txq - txu - tye - tzh-dialect_bachajon - tzh-dialect_tenejapa - tzj-dialect_eastern - tzj-dialect_western - tzo-dialect_chamula - tzo-dialect_chenalho - ubl - ubu - udm - udu - uig-script_arabic - uig-script_cyrillic - ukr - umb - unr - upv - ura - urb - urd-script_arabic - urd-script_devanagari - urd-script_latin - urk - urt - ury - usp - uzb-script_cyrillic - uzb-script_latin - vag - vid - vie - vif - vmw - vmy - vot - vun - vut - wal-script_ethiopic - wal-script_latin - wap - war - waw - way - wba - wlo - wlx - wmw - wob - wol - wsg - wwa - xal - xdy - xed - xer - xho - xmm - xnj - xnr - xog - xon - xrb - xsb - xsm - xsr - xsu - xta - xtd - xte - xtm - xtn - xua - xuo - yaa - yad - yal - yam - yao - yas - yat - yaz - yba - ybb - ycl - ycn - yea - yka - yli - yor - yre - yua - yue-script_traditional - yuz - yva - zaa - zab - zac - zad - zae - zai - zam - zao - zaq - zar - zas - zav - zaw - zca - zga - zim - ziw - zlm - zmz - zne - zos - zpc - zpg - zpi - zpl - zpm - zpo - zpt - zpu - zpz - ztq - zty - zul - zyb - zyp - zza
## Model details - **Developed by:** Vineel Pratap et al. - **Model type:** Multi-Lingual Automatic Speech Recognition model - **Language(s):** 1000+ languages, see [supported languages](#supported-languages) - **License:** CC-BY-NC 4.0 license - **Num parameters**: 1 billion - **Audio sampling rate**: 16,000 kHz - **Cite as:** @article{pratap2023mms, title={Scaling Speech Technology to 1,000+ Languages}, author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli}, journal={arXiv}, year={2023} } ## Additional Links - [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/) - [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms). - [Paper](https://arxiv.org/abs/2305.13516) - [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr) - [Other **MMS** checkpoints](https://huggingface.co/models?other=mms) - MMS base checkpoints: - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m) - [Official Space](https://huggingface.co/spaces/facebook/MMS)