somosnlp-hackathon-2022/t5-small-spanish-nahuatl

milmor committed on Apr 4, 2022

Commit 30ddd82 • 1 Parent(s): 8b332f6

Update README.md

Files changed (1)

README.md +1 -1

@@ -53,7 +53,7 @@ Since the Axolotl corpus contains misaligments, we just select the best samples
 Also, to increase the amount of data we collected 3,000 extra samples from the web.
 ### Model and training
-We employ two training-stages using a multilingual T5-small. This model was chosen because it can handle different vocabularies and suffixes. The model is pretrained on different tasks and languages (French, Romanian, English, German).
+We employ two training-stages using a multilingual T5-small. This model was chosen because it can handle different vocabularies and suffixes. T5-small is pretrained on different tasks and languages (French, Romanian, English, German).
 ### Training-stage 1 (learning Spanish)
 In training stage 1 we first introduce Spanish to the model. The objective is to learn a new language rich in data (Spanish) and not lose the previous knowledge acquired. We use the English-Spanish [Anki](https://www.manythings.org/anki/) dataset, which consists of 118.964 text pairs. We train the model till convergence adding the suffix "Translate Spanish to English: ".
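
Reviewer note: the stage-1 data preparation described in the README could be sketched roughly as follows. This is only an illustration, not the authors' training code. It assumes the quoted task string is prepended to the Spanish side of each pair (the usual T5 convention, although the README calls it a suffix), and `build_examples`, `TASK_PREFIX`, and the sample pairs are hypothetical names and data introduced here.

```python
from typing import Dict, List, Tuple

# Task string quoted from the README. Prepending it to the source text is an
# assumption based on the usual T5 task-prefix convention.
TASK_PREFIX = "Translate Spanish to English: "


def build_examples(
    pairs: List[Tuple[str, str]], prefix: str = TASK_PREFIX
) -> List[Dict[str, str]]:
    """Turn (spanish, english) pairs into text-to-text training examples.

    Hypothetical helper: pairs with an empty side are skipped and
    surrounding whitespace is stripped before the prefix is added.
    """
    examples = []
    for spanish, english in pairs:
        spanish, english = spanish.strip(), english.strip()
        if not spanish or not english:
            continue  # drop incomplete pairs
        examples.append({"input": prefix + spanish, "target": english})
    return examples


# Made-up sample pairs standing in for the Anki English-Spanish data.
sample_pairs = [
    ("Hola.", "Hello."),
    ("  Gracias.  ", "Thank you."),
    ("", "Empty source is skipped."),
]
examples = build_examples(sample_pairs)
for example in examples:
    print(example)
```

The resulting `input`/`target` strings would then be tokenized and fed to whatever seq2seq training loop is in use; that part is left out because the README does not describe it.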