1-800-BAD-CODE committed on
Commit
affeb69
1 Parent(s): d44eb1a

Update README.md

Files changed (1)
  1. README.md +2 -12
README.md CHANGED
@@ -289,21 +289,11 @@ Languages were chosen based on whether the News Crawl corpus contained enough re
 
 # Limitations
 
-## Sentence Boundaries / Fullstops
-Fullstop (sentence boundary) detection is near-perfect with news data, but misses obvious sentence boundaries
-when several short sentences appear contiguously.
-
-With News crawl, SBD F1 is > 99.5%. With OpenSubtitles, SBD F1 drops unacceptably to < 90%.
-
-When I figure out why this is, I'll fine-tune the SBD head. It's likely due to pre-processing and domain mis-match.
-
 ## Domain
-This model was trained on news data, and may not perform well on conversational or informal data. Notably,
-when presented with many short sentences, the model misses obvious sentence boundaries since the model was
-trained on relatively-long sentences.
+This model was trained on news data, and may not perform well on conversational or informal data.
 
 ## Quality
-Further, this model is unlikely to be of production quality.
+This model is unlikely to be of production quality.
 It was trained with "only" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data.
 
 ## Excessive Predictions