milmor committed on
Commit
cd50eb4
1 Parent(s): 30ddd82

Update README.md

Files changed (1): README.md +2 -2
README.md CHANGED

@@ -31,7 +31,7 @@ outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
 
 ## Approach
 ### Dataset
-Since the Axolotl corpus contains misaligments, we just select the best samples (~8,000 samples). We also use the [bible-corpus](https://github.com/christos-c/bible-corpus) (7,821 samples).
+Since the Axolotl corpus contains misaligments, we just select the best samples (12,207 samples). We also use the [bible-corpus](https://github.com/christos-c/bible-corpus) (7,821 samples).
 
 | Axolotl best aligned books |
 |:-----------------------------------------------------:|
@@ -73,7 +73,7 @@ For a fair comparison, the models are evaluated on the same 505 validation Nahu
 | False | 1.34 | 6.17 | 26.96 |
 | True | 1.31 | 6.18 | 28.21 |
 
-The English-Spanish pretrained model improves BLEU and Chrf, and leads to faster convergence.
+The English-Spanish pretraining improves BLEU and Chrf, and leads to faster convergence.
 
 ## References
 - Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits