---
license: afl-3.0
datasets:
- WillHeld/hinglish_top
language:
- en
- hi
metrics:
- accuracy
library_name: transformers
pipeline_tag: fill-mask
widget:
- text: please [MASK] ko cancel kardo
  example_title: Example 1
- text: New York me kesa he?
  example_title: Example 2
- text: Thoda bajao
  example_title: Example 3
tags:
- Hinglish
- MaskedLM
---

### HingMaskedLM

Masked language modeling (MLM) is a pre-training technique used in natural language processing (NLP) for deep-learning models such as Transformers. It is a variant of language modeling in which a portion of the input tokens is masked and the model is trained to predict the masked tokens from the context provided by the unmasked tokens. This model is trained with masked language modeling on `Hinglish` (code-mixed Hindi-English) data.

### Dataset

The model is trained on the Hinglish-TOP [dataset](https://huggingface.co/datasets/WillHeld/hinglish_top), which has the following columns:

- en_query
- cs_query
- en_parse
- cs_parse
- domain

### Training

Loss per epoch:

|Epoch|Loss|
|:--:|:--:|
|1|0.0465|
|2|0.0262|
|3|0.0116|
|4|0.00385|
|5|0.0103|
|6|0.00738|
|7|0.00892|
|8|0.00379|
|9|0.00126|
|10|0.000684|

The training script itself is not part of this card; an illustrative fine-tuning sketch appears at the end of this card.

### Inference

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the HingMaskedLM checkpoint and build a fill-mask pipeline
tokenizer = AutoTokenizer.from_pretrained("SRDdev/HingMaskedLM")
model = AutoModelForMaskedLM.from_pretrained("SRDdev/HingMaskedLM")

fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```

```python
fill(f'please {fill.tokenizer.mask_token} ko cancel kardo')
```

The shape of the output this call returns is illustrated at the end of this card.

### Citation

Author: @[SRDdev](https://huggingface.co/SRDdev)

```
Name: Shreyas Dixit
Framework: PyTorch
Year: Jan 2023
Pipeline: fill-mask
GitHub: https://github.com/SRDdev
LinkedIn: https://www.linkedin.com/in/srddev/
```
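
### Training sketch

The card reports per-epoch losses but not the training code. As a rough guide, here is a minimal sketch of masked-LM fine-tuning with the Hugging Face `Trainer`. The base checkpoint (`bert-base-multilingual-cased`), the use of the `cs_query` column, and all hyperparameters other than the ten epochs are assumptions for illustration, not the author's published recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed base checkpoint -- the actual starting model is not stated in the card
base = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

dataset = load_dataset("WillHeld/hinglish_top")

def tokenize(batch):
    # cs_query holds the code-switched (Hinglish) queries
    return tokenizer(batch["cs_query"], truncation=True, max_length=64)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset["train"].column_names)

# Randomly masks 15% of input tokens and sets their labels for the MLM loss
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="hing-masked-lm",
                         num_train_epochs=10,  # matches the ten epochs reported above
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  data_collator=collator)
trainer.train()
```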
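
### Example output

The `fill-mask` pipeline returns a ranked list of candidate completions for the masked position. The keys below are the standard output format of the `transformers` fill-mask pipeline; the candidates themselves depend on the model.

```python
predictions = fill(f"please {fill.tokenizer.mask_token} ko cancel kardo")

# Each candidate is a dict with the standard fill-mask keys:
# 'sequence'  - the sentence with the mask filled in
# 'token'     - the vocabulary id of the predicted token
# 'token_str' - the predicted token as text
# 'score'     - the softmax probability of the prediction
for p in predictions:
    print(f"{p['token_str']!r} -> {p['score']:.4f}")
```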