---
language: "ar"
tags:
- translation
- nllb
- fine-tuning
- darija
- moroccan
- transformers
datasets:
- json
library_name: transformers
model_name: tachicart/nllb-ft-darija
---

# NLLB Fine-tuned for Darija to Modern Standard Arabic Translation

This model is a fine-tuned version of `facebook/nllb-200-distilled-600M` for translating Moroccan Darija (ary) into Modern Standard Arabic (ar). It was fine-tuned on a custom dataset using the Hugging Face `transformers` library.

The model was developed by Tachicart Ridouane and Bouzoubaa Karim.

## Model Details

- **Base Model**: `facebook/nllb-200-distilled-600M`
- **Fine-tuning Library**: Hugging Face `transformers`
- **Languages Supported**: Moroccan Darija (ary), Modern Standard Arabic (ar)
- **Training Dataset**: A custom dataset of Moroccan Darija / Modern Standard Arabic sentence pairs in JSON format.

## Performance

The model was evaluated on a held-out validation set to check translation quality. It captures colloquial Moroccan Arabic well, but additional data and continued fine-tuning could further improve its performance.

## Limitations

- **Dataset Size**: The custom dataset contains 21,000 sentence pairs, which may limit coverage of diverse expressions and rare terms.
- **Colloquial Variations**: Moroccan Arabic has many dialectal variations, which may not all be covered equally.

## How to Use

You can use the model with the `transformers` library as follows:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned model and tokenizer.
# "ary_Arab" is the NLLB / FLORES-200 code for Moroccan Darija (the source language).
tokenizer = AutoTokenizer.from_pretrained("tachicart/nllb-ft-darija", src_lang="ary_Arab")
model = AutoModelForSeq2SeqLM.from_pretrained("tachicart/nllb-ft-darija")

# Example translation: a Darija sentence (roughly, "How can I make a lot of money quickly?").
inputs = tokenizer("كيفاش نقدر نربح بزاف ديال الفلوس بالزربة", return_tensors="pt")

# Force the decoder to produce Modern Standard Arabic ("arb_Arab").
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("arb_Arab"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
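
Alternatively, the high-level `pipeline` API handles tokenization, generation, and decoding in one call. The sketch below is illustrative rather than part of the original release; it assumes the NLLB / FLORES-200 language codes `ary_Arab` (Moroccan Darija) and `arb_Arab` (Modern Standard Arabic), which this checkpoint should inherit from the base model's tokenizer.

```python
from transformers import pipeline

# Translation pipeline sketch: src_lang / tgt_lang select the NLLB language codes
# (assumed to be ary_Arab -> arb_Arab for this fine-tuned checkpoint).
translator = pipeline(
    "translation",
    model="tachicart/nllb-ft-darija",
    src_lang="ary_Arab",
    tgt_lang="arb_Arab",
)

result = translator("كيفاش نقدر نربح بزاف ديال الفلوس بالزربة", max_length=128)
print(result[0]["translation_text"])
```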