1-800-BAD-CODE committed on
Commit
3c7b25f
1 Parent(s): 0dc2ad3

Update README.md

  1. README.md +2 -1
README.md CHANGED
@@ -178,7 +178,8 @@ This model was trained on news data, and may not perform well on conversational
  Further, this model is unlikely to be of production quality.
  It was trained with "only" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data.
 
- This model over-predicts the inverted Spanish question mark, `¿` (see metrics below). Since `¿` is a rare token, especially in the
+ This model over-predicts Spanish question marks, especially the inverted question mark `¿` (see metrics below).
+ Since `¿` is a rare token, especially in the
  context of a 47-language model, Spanish questions were over-sampled by selecting more of these sentences from
  additional training data that was not used. However, this seems to have "over-corrected" the problem and a lot
  of Spanish question marks are predicted. This can be fixed by exposing prior probabilities, but I'll fine-tune