1-800-BAD-CODE committed on
Commit
5548a75
1 Parent(s): cad4273

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -178,6 +178,10 @@ This model was trained on news data, and may not perform well on conversational
  Further, this model is unlikely to be of production quality.
  It was trained with "only" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data.
 
+ This model over-predicts the inverted Spanish question mark, `¿`. Because `¿` is a rare token, especially in a
+ 47-language model, Spanish questions were over-sampled by drawing more of them from additional training data
+ that was otherwise unused. However, this seems to have over-corrected the problem, and the model now predicts
+ `¿` more often than it should.
 
 
  # Evaluation