Update README.md

README.md CHANGED

@@ -31,7 +31,7 @@ outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
 
 ## Approach
 ### Dataset
-Since the Axolotl corpus contains misaligments, we just select the best samples (
+Since the Axolotl corpus contains misalignments, we select only the best samples (12,207 samples). We also use the [bible-corpus](https://github.com/christos-c/bible-corpus) (7,821 samples).
 
 | Axolotl best aligned books |
 |:-----------------------------------------------------:|
@@ -73,7 +73,7 @@ For a fair comparison, the models are evaluated on the same 505 validation Nahuatl
 | False | 1.34 | 6.17 | 26.96 |
 | True | 1.31 | 6.18 | 28.21 |
 
-The English-Spanish
+The English-Spanish pretraining improves both BLEU and chrF and leads to faster convergence.
 
 ## References
 - Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer.
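The updated README reports BLEU and chrF scores. As a rough illustration of what chrF measures, here is a simplified pure-Python sketch of a character n-gram F-score; this is not the implementation used for the table above (that would typically be sacrebleu), and the function names are ours:

```python
from collections import Counter


def char_ngrams(text, n):
    # Character n-grams, ignoring spaces (chrF removes whitespace by default).
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average character n-gram precision and recall over n = 1..max_n,
    # combined as an F_beta score (chrF uses beta = 2, weighting recall).
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, and strings sharing no characters score 0.0; real chrF additionally averages over a corpus and reports the score scaled to 0–100.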