Update README.md

README.md CHANGED

@@ -31,7 +31,7 @@ outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
 
 ## Approach
 ### Dataset
-Since the Axolotl corpus contains misaligments, we just select the best samples (
+Since the Axolotl corpus contains misalignments, we select only the best samples (12,207 samples). We also use the [bible-corpus](https://github.com/christos-c/bible-corpus) (7,821 samples).
 
 | Axolotl best aligned books |
 |:-----------------------------------------------------:|
@@ -73,7 +73,7 @@ For a fair comparison, the models are evaluated on the same 505 validation Nahuatl
 | False | 1.34 | 6.17 | 26.96 |
 | True | 1.31 | 6.18 | 28.21 |
 
-The English-Spanish
+The English-Spanish pretraining improves both BLEU and chrF and leads to faster convergence.
 
 ## References
 - Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer.
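The updated README reports BLEU and chrF scores. As a rough illustration of what chrF measures, here is a simplified pure-Python sketch of a character n-gram F-score; this is not the implementation used for the table above (that would typically be sacrebleu), and the function names are ours:

```python
from collections import Counter


def char_ngrams(text, n):
    # Character n-grams, ignoring spaces (chrF removes whitespace by default).
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average character n-gram precision and recall over n = 1..max_n,
    # combined as an F_beta score (chrF uses beta = 2, weighting recall).
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

An identical hypothesis and reference score 1.0, and strings sharing no characters score 0.0; real chrF additionally averages over a corpus and reports the score scaled to 0–100.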