1-800-BAD-CODE committed
Commit dc7df76
1 Parent(s): 0ad55a4

Update README.md

Files changed (1):
  1. README.md +197 -134

README.md CHANGED
@@ -89,7 +89,7 @@ Applying all these predictions to the input text, we can punctuate, true-case, a
 
  ## Tokenizer
 
- Instead of the hacky wrapper used by FairSeq and strangely ported (not fixed) by HuggingFace, the xlm-roberta SentencePiece model was adjusted to correctly encode
  the text. Per HF's comments,
 
  ```python
@@ -128,6 +128,8 @@ with open("/path/to/new/sp.model", "wb") as f:
  f.write(m.SerializeToString())
  ```
 
 
  ## Post-Punctuation Tokens
  This model predicts the following set of punctuation tokens after each subtoken:
@@ -165,8 +167,7 @@ This model predicts the following set of punctuation tokens before each subword:
  # Training Details
  This model was trained in the NeMo framework.
 
- This model was trained on an A100 for slightly longer than 7 hours.
- For validation and train metrics, see the [Tensorboard Logs](https://tensorboard.dev/experiment/xxnULI1aTeK37vUDL4ejiw/).
 
  ## Training Data
  This model was trained with News Crawl data from WMT.
@@ -176,17 +177,19 @@ This model was trained with News Crawl data from WMT.
  Languages were chosen based on whether the News Crawl corpus contained enough reliable-quality data as judged by the author.
 
  # Limitations
- This model was trained on news data, and may not perform well on conversational or informal data.
 
  Further, this model is unlikely to be of production quality.
  It was trained with "only" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data.
 
  This model over-predicts Spanish question marks, especially the inverted question mark `¿` (see metrics below).
- Since `¿` is a rare token, especially in the
- context of a 47-language model, Spanish questions were over-sampled by selecting more of these sentences from
- additional training data that was not used. However, this seems to have "over-corrected" the problem and a lot
- of Spanish question marks are predicted. This can be fixed by exposing prior probabilities, but I'll fine-tune
- it later to fix this the right way.
 
 
  # Evaluation
@@ -211,11 +214,17 @@ Each test example was generated using the following procedure:
  2. Lower-case the concatenated sentence
  3. Remove all punctuation
 
  The data is a held-out portion of News Crawl, which has been deduplicated.
  3,000 lines of data per language were used, generating 3,000 unique examples of 11 sentences each.
  We generate 3,000 examples, where example `i` begins with sentence `i` and is followed by 10 random
  sentences selected from the 3,000-sentence test set.
 
  ## Selected Language Evaluation Reports
  For now, metrics for a few selected languages are shown below.
  Given the amount of work required to collect and pretty-print metrics in 47 languages, I'll add more eventually.
@@ -229,11 +238,11 @@ Expand any of the following tabs to see metrics for that language.
  ```text
  punct_post test report:
  label precision recall f1 support
- <NULL> (label_id: 0) 99.18 98.47 98.82 538769
- <ACRONYM> (label_id: 1) 66.03 78.63 71.78 571
- . (label_id: 2) 90.66 93.68 92.14 30581
- , (label_id: 3) 74.18 82.93 78.31 23230
- ? (label_id: 4) 78.10 80.08 79.07 1024
  ? (label_id: 5) 0.00 0.00 0.00 0
  , (label_id: 6) 0.00 0.00 0.00 0
  。 (label_id: 7) 0.00 0.00 0.00 0
@@ -247,31 +256,31 @@ punct_post test report:
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
- micro avg 97.56 97.56 97.56 594175
- macro avg 81.63 86.76 84.03 594175
- weighted avg 97.70 97.56 97.62 594175
  ```
 
- ```text
  cap test report:
  label precision recall f1 support
- LOWER (label_id: 0) 99.71 99.85 99.78 2036824
- UPPER (label_id: 1) 96.40 93.27 94.81 87747
  -------------------
- micro avg 99.58 99.58 99.58 2124571
- macro avg 98.06 96.56 97.30 2124571
- weighted avg 99.57 99.58 99.58 2124571
  ```
 
- ```text
  seg test report:
  label precision recall f1 support
- NOSTOP (label_id: 0) 99.97 99.98 99.98 564228
- FULLSTOP (label_id: 1) 99.73 99.54 99.64 32947
  -------------------
- micro avg 99.96 99.96 99.96 597175
- macro avg 99.85 99.76 99.81 597175
- weighted avg 99.96 99.96 99.96 597175
  ```
 
  </details>
@@ -284,22 +293,22 @@ seg test report:
  ```text
  punct_pre test report:
  label precision recall f1 support
- <NULL> (label_id: 0) 99.96 99.76 99.86 609200
- ¿ (label_id: 1) 39.66 77.89 52.56 1221
  -------------------
- micro avg 99.72 99.72 99.72 610421
- macro avg 69.81 88.82 76.21 610421
- weighted avg 99.83 99.72 99.76 610421
  ```
-
- ```text
  punct_post test report:
  label precision recall f1 support
- <NULL> (label_id: 0) 99.17 98.44 98.80 553100
- <ACRONYM> (label_id: 1) 23.33 43.75 30.43 48
- . (label_id: 2) 91.92 92.58 92.25 29623
- , (label_id: 3) 73.07 82.04 77.30 26432
- ? (label_id: 4) 49.44 71.84 58.57 1218
  ? (label_id: 5) 0.00 0.00 0.00 0
  , (label_id: 6) 0.00 0.00 0.00 0
  。 (label_id: 7) 0.00 0.00 0.00 0
@@ -313,31 +322,31 @@ punct_post test report:
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
- micro avg 97.39 97.39 97.39 610421
- macro avg 67.39 77.73 71.47 610421
- weighted avg 97.58 97.39 97.47 610421
  ```
 
- ```text
  cap test report:
  label precision recall f1 support
- LOWER (label_id: 0) 99.82 99.86 99.84 2222062
- UPPER (label_id: 1) 95.96 94.64 95.29 75940
  -------------------
- micro avg 99.69 99.69 99.69 2298002
- macro avg 97.89 97.25 97.57 2298002
- weighted avg 99.69 99.69 99.69 2298002
  ```
 
- ```text
  seg test report:
  label precision recall f1 support
- NOSTOP (label_id: 0) 99.99 99.97 99.98 580519
- FULLSTOP (label_id: 1) 99.52 99.81 99.66 32902
  -------------------
- micro avg 99.96 99.96 99.96 613421
- macro avg 99.75 99.89 99.82 613421
- weighted avg 99.96 99.96 99.96 613421
  ```
 
  </details>
@@ -349,7 +358,7 @@ seg test report:
  ```text
  punct_post test report:
  label precision recall f1 support
- <NULL> (label_id: 0) 99.81 99.40 99.60 729695
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
  , (label_id: 3) 0.00 0.00 0.00 0
@@ -363,35 +372,35 @@ punct_post test report:
  ؟ (label_id: 11) 0.00 0.00 0.00 0
  ، (label_id: 12) 0.00 0.00 0.00 0
  ; (label_id: 13) 0.00 0.00 0.00 0
- ። (label_id: 14) 91.44 97.78 94.50 25288
- ፣ (label_id: 15) 66.93 80.45 73.07 5774
- ፧ (label_id: 16) 72.14 77.01 74.49 1170
  -------------------
- micro avg 99.17 99.17 99.17 761927
- macro avg 82.58 88.66 85.42 761927
- weighted avg 99.24 99.17 99.19 761927
  ```
 
- ```text
  cap test report:
  label precision recall f1 support
- LOWER (label_id: 0) 98.50 97.22 97.86 1150
- UPPER (label_id: 1) 56.16 70.69 62.60 58
  -------------------
- micro avg 95.94 95.94 95.94 1208
- macro avg 77.33 83.95 80.23 1208
- weighted avg 96.47 95.94 96.16 1208
  ```
 
- ```text
  seg test report:
  label precision recall f1 support
- NOSTOP (label_id: 0) 99.97 99.91 99.94 743103
- FULLSTOP (label_id: 1) 97.16 99.04 98.09 21824
  -------------------
- micro avg 99.89 99.89 99.89 764927
- macro avg 98.57 99.48 99.02 764927
- weighted avg 99.89 99.89 99.89 764927
  ```
 
  </details>
@@ -403,14 +412,14 @@ seg test report:
  ```text
  punct_post test report:
  label precision recall f1 support
- <NULL> (label_id: 0) 99.47 97.46 98.45 414383
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
  , (label_id: 3) 0.00 0.00 0.00 0
  ? (label_id: 4) 0.00 0.00 0.00 0
- ? (label_id: 5) 81.41 85.80 83.55 1444
- , (label_id: 6) 74.93 92.79 82.91 34094
- 。 (label_id: 7) 96.35 96.86 96.60 30668
  、 (label_id: 8) 0.00 0.00 0.00 0
  ・ (label_id: 9) 0.00 0.00 0.00 0
  । (label_id: 10) 0.00 0.00 0.00 0
@@ -421,31 +430,31 @@ punct_post test report:
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
- micro avg 97.05 97.05 97.05 480589
- macro avg 88.04 93.23 90.38 480589
- weighted avg 97.47 97.05 97.19 480589
  ```
 
- ```text
  cap test report:
  label precision recall f1 support
- LOWER (label_id: 0) 94.82 93.97 94.39 2786
- UPPER (label_id: 1) 79.23 81.76 80.48 784
  -------------------
- micro avg 91.29 91.29 91.29 3570
- macro avg 87.03 87.87 87.44 3570
- weighted avg 91.40 91.29 91.34 3570
  ```
 
- ```text
  seg test report:
  label precision recall f1 support
- NOSTOP (label_id: 0) 99.99 99.98 99.98 450589
- FULLSTOP (label_id: 1) 99.75 99.81 99.78 33000
  -------------------
- micro avg 99.97 99.97 99.97 483589
- macro avg 99.87 99.89 99.88 483589
- weighted avg 99.97 99.97 99.97 483589
  ```
 
  </details>
@@ -457,16 +466,16 @@ seg test report:
  ```text
  punct_post test report:
  label precision recall f1 support
- <NULL> (label_id: 0) 99.32 95.84 97.55 387103
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
  , (label_id: 3) 0.00 0.00 0.00 0
  ? (label_id: 4) 0.00 0.00 0.00 0
- ? (label_id: 5) 75.12 68.14 71.46 1378
  , (label_id: 6) 0.00 0.00 0.00 0
- 。 (label_id: 7) 93.30 97.44 95.33 31110
- 、 (label_id: 8) 53.91 87.52 66.72 17710
- ・ (label_id: 9) 29.33 64.60 40.35 1048
  । (label_id: 10) 0.00 0.00 0.00 0
  ؟ (label_id: 11) 0.00 0.00 0.00 0
  ، (label_id: 12) 0.00 0.00 0.00 0
@@ -475,31 +484,31 @@ punct_post test report:
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
- micro avg 95.46 95.46 95.46 438349
- macro avg 70.20 82.71 74.28 438349
- weighted avg 96.81 95.46 95.93 438349
  ```
-
- ```text
  cap test report:
  label precision recall f1 support
- LOWER (label_id: 0) 92.64 92.67 92.65 4036
- UPPER (label_id: 1) 80.75 80.70 80.73 1539
  -------------------
- micro avg 89.36 89.36 89.36 5575
- macro avg 86.70 86.68 86.69 5575
- weighted avg 89.36 89.36 89.36 5575
  ```
 
- ```text
  seg test report:
  label precision recall f1 support
- NOSTOP (label_id: 0) 99.98 99.95 99.97 408349
- FULLSTOP (label_id: 1) 99.36 99.78 99.57 33000
  -------------------
- micro avg 99.94 99.94 99.94 441349
- macro avg 99.67 99.86 99.77 441349
- weighted avg 99.94 99.94 99.94 441349
  ```
 
  </details>
@@ -511,17 +520,17 @@ seg test report:
  ```text
  punct_post test report:
  label precision recall f1 support
- <NULL> (label_id: 0) 99.73 99.47 99.60 533761
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
- , (label_id: 3) 70.69 76.48 73.47 7713
- ? (label_id: 4) 65.41 74.75 69.77 301
  ? (label_id: 5) 0.00 0.00 0.00 0
  , (label_id: 6) 0.00 0.00 0.00 0
  。 (label_id: 7) 0.00 0.00 0.00 0
  、 (label_id: 8) 0.00 0.00 0.00 0
  ・ (label_id: 9) 0.00 0.00 0.00 0
- । (label_id: 10) 96.46 98.81 97.62 30641
  ؟ (label_id: 11) 0.00 0.00 0.00 0
  ، (label_id: 12) 0.00 0.00 0.00 0
  ; (label_id: 13) 0.00 0.00 0.00 0
@@ -529,31 +538,85 @@ punct_post test report:
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
- micro avg 99.11 99.11 99.11 572416
- macro avg 83.07 87.38 85.11 572416
- weighted avg 99.15 99.11 99.13 572416
  ```
-
- ```text
  cap test report:
  label precision recall f1 support
- LOWER (label_id: 0) 97.46 96.50 96.98 2346
- UPPER (label_id: 1) 89.01 91.84 90.40 723
  -------------------
- micro avg 95.41 95.41 95.41 3069
- macro avg 93.23 94.17 93.69 3069
- weighted avg 95.47 95.41 95.43 3069
  ```
 
  ```text
  seg test report:
  label precision recall f1 support
- NOSTOP (label_id: 0) 100.00 100.00 100.00 542437
- FULLSTOP (label_id: 1) 99.92 99.97 99.95 32979
  -------------------
- micro avg 99.99 99.99 99.99 575416
- macro avg 99.96 99.98 99.97 575416
- weighted avg 99.99 99.99 99.99 575416
  ```
 
  </details>
 
  ## Tokenizer
 
+ Instead of the hacky wrapper used by FairSeq and strangely ported (not fixed) by HuggingFace, the `xlm-roberta` SentencePiece model was adjusted to correctly encode
  the text. Per HF's comments,
 
  ```python
 
  f.write(m.SerializeToString())
  ```
 
+ Now we can use just the SP model without a wrapper.
+
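To make that concrete, here is a minimal usage sketch (assuming the adjusted model was saved to `/path/to/new/sp.model` as in the snippet above; the input sentence is an arbitrary example):

```python
import sentencepiece as spm

# A minimal sketch: load the adjusted SentencePiece model and encode text
# directly, with no fairseq/HF offset wrapper in between.
sp = spm.SentencePieceProcessor(model_file="/path/to/new/sp.model")

ids = sp.encode("hola mundo como estas", out_type=int)
pieces = [sp.id_to_piece(i) for i in ids]
print(ids)     # token IDs, now aligned with the model's embeddings
print(pieces)  # the corresponding subword pieces
```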
 
  ## Post-Punctuation Tokens
  This model predicts the following set of punctuation tokens after each subtoken:
 
  # Training Details
  This model was trained in the NeMo framework.
 
+ This model was trained on an A100 for approximately 9 hours.
 
  ## Training Data
  This model was trained with News Crawl data from WMT.
 
  Languages were chosen based on whether the News Crawl corpus contained enough reliable-quality data as judged by the author.
 
  # Limitations
+ This model was trained on news data, and may not perform well on conversational or informal data. Notably,
+ when presented with many short sentences, the model misses obvious sentence boundaries, since it was
+ trained on relatively long sentences.
 
  Further, this model is unlikely to be of production quality.
  It was trained with "only" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data.
 
  This model over-predicts Spanish question marks, especially the inverted question mark `¿` (see metrics below).
+ Since `¿` is a rare token, especially in the context of a 47-language model, Spanish questions were over-sampled
+ by selecting more of these sentences from additional training data that was not otherwise used. However, this
+ seems to have "over-corrected" the problem, and too many Spanish question marks are now predicted.
+
+ The model may also over-predict commas.
 
 
  # Evaluation
 
  2. Lower-case the concatenated sentence
  3. Remove all punctuation
 
+ Targets are generated as we lower-case the text and remove punctuation: the casing and punctuation that are
+ removed become the prediction targets.
+
  The data is a held-out portion of News Crawl, which has been deduplicated.
  3,000 lines of data per language were used, generating 3,000 unique examples of 11 sentences each.
  We generate 3,000 examples, where example `i` begins with sentence `i` and is followed by 10 random
  sentences selected from the 3,000-sentence test set.
 
+ For measuring true-casing and sentence boundary detection, reference punctuation tokens were used for
+ conditioning (see graph above). If we used predicted punctuation instead, incorrect punctuation predictions
+ would cause the true-casing and SBD targets to misalign, and these metrics would be artificially low.
+
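As a concrete sketch of this procedure (hypothetical code, not the actual evaluation script; the function name and the punctuation set, taken from the label set above, are illustrative):

```python
import random
import re

# Punctuation stripped in step 3; drawn from the model's label set above.
PUNCT_RE = re.compile(r"[.,?¿。、・।؟،;።፣፧]")

def make_examples(sentences: list[str], n_follow: int = 10, seed: int = 1234) -> list[str]:
    """Hypothetical sketch: build one test example per held-out sentence."""
    rng = random.Random(seed)
    examples = []
    for first in sentences:
        # Example i begins with sentence i, followed by 10 random sentences
        # drawn from the same 3,000-sentence test set.
        text = " ".join([first] + rng.sample(sentences, n_follow))
        # Lower-case and strip punctuation; what is removed becomes the target.
        examples.append(PUNCT_RE.sub("", text).lower())
    return examples
```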
  ## Selected Language Evaluation Reports
  For now, metrics for a few selected languages are shown below.
  Given the amount of work required to collect and pretty-print metrics in 47 languages, I'll add more eventually.
 
  ```text
  punct_post test report:
  label precision recall f1 support
+ <NULL> (label_id: 0) 99.25 98.43 98.84 564908
+ <ACRONYM> (label_id: 1) 63.14 84.67 72.33 613
+ . (label_id: 2) 90.97 93.91 92.42 32040
+ , (label_id: 3) 73.95 84.32 78.79 24271
+ ? (label_id: 4) 79.05 81.94 80.47 1041
  ? (label_id: 5) 0.00 0.00 0.00 0
  , (label_id: 6) 0.00 0.00 0.00 0
  。 (label_id: 7) 0.00 0.00 0.00 0
 
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
+ micro avg 97.60 97.60 97.60 622873
+ macro avg 81.27 88.65 84.57 622873
+ weighted avg 97.77 97.60 97.67 622873
  ```
 
+ ```
  cap test report:
  label precision recall f1 support
+ LOWER (label_id: 0) 99.72 99.85 99.78 2134956
+ UPPER (label_id: 1) 96.33 93.52 94.91 91996
  -------------------
+ micro avg 99.59 99.59 99.59 2226952
+ macro avg 98.03 96.68 97.34 2226952
+ weighted avg 99.58 99.59 99.58 2226952
  ```
 
+ ```
  seg test report:
  label precision recall f1 support
+ NOSTOP (label_id: 0) 99.99 99.98 99.99 591540
+ FULLSTOP (label_id: 1) 99.61 99.89 99.75 34333
  -------------------
+ micro avg 99.97 99.97 99.97 625873
+ macro avg 99.80 99.93 99.87 625873
+ weighted avg 99.97 99.97 99.97 625873
  ```
 
  </details>
 
  ```text
  punct_pre test report:
  label precision recall f1 support
+ <NULL> (label_id: 0) 99.94 99.89 99.92 636941
+ ¿ (label_id: 1) 56.73 71.35 63.20 1288
  -------------------
+ micro avg 99.83 99.83 99.83 638229
+ macro avg 78.34 85.62 81.56 638229
+ weighted avg 99.85 99.83 99.84 638229
+ ```
+
  ```
  punct_post test report:
  label precision recall f1 support
+ <NULL> (label_id: 0) 99.19 98.41 98.80 578271
+ <ACRONYM> (label_id: 1) 30.10 56.36 39.24 55
+ . (label_id: 2) 91.92 93.12 92.52 30856
+ , (label_id: 3) 72.98 82.44 77.42 27761
+ ? (label_id: 4) 52.77 71.85 60.85 1286
  ? (label_id: 5) 0.00 0.00 0.00 0
  , (label_id: 6) 0.00 0.00 0.00 0
  。 (label_id: 7) 0.00 0.00 0.00 0
 
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
+ micro avg 97.40 97.40 97.40 638229
+ macro avg 69.39 80.44 73.77 638229
+ weighted avg 97.60 97.40 97.48 638229
  ```
 
+ ```
  cap test report:
  label precision recall f1 support
+ LOWER (label_id: 0) 99.82 99.86 99.84 2324724
+ UPPER (label_id: 1) 95.92 94.70 95.30 79266
  -------------------
+ micro avg 99.69 99.69 99.69 2403990
+ macro avg 97.87 97.28 97.57 2403990
+ weighted avg 99.69 99.69 99.69 2403990
  ```
 
+ ```
  seg test report:
  label precision recall f1 support
+ NOSTOP (label_id: 0) 99.99 99.96 99.98 607057
+ FULLSTOP (label_id: 1) 99.31 99.88 99.60 34172
  -------------------
+ micro avg 99.96 99.96 99.96 641229
+ macro avg 99.65 99.92 99.79 641229
+ weighted avg 99.96 99.96 99.96 641229
  ```
 
  </details>
 
  ```text
  punct_post test report:
  label precision recall f1 support
+ <NULL> (label_id: 0) 99.83 99.28 99.56 729664
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
  , (label_id: 3) 0.00 0.00 0.00 0
 
  ؟ (label_id: 11) 0.00 0.00 0.00 0
  ، (label_id: 12) 0.00 0.00 0.00 0
  ; (label_id: 13) 0.00 0.00 0.00 0
+ ። (label_id: 14) 91.27 97.90 94.47 25341
+ ፣ (label_id: 15) 61.93 82.11 70.60 5818
+ ፧ (label_id: 16) 67.41 81.73 73.89 1177
  -------------------
+ micro avg 99.08 99.08 99.08 762000
+ macro avg 80.11 90.26 84.63 762000
+ weighted avg 99.21 99.08 99.13 762000
  ```
 
+ ```
  cap test report:
  label precision recall f1 support
+ LOWER (label_id: 0) 98.40 98.03 98.21 1064
+ UPPER (label_id: 1) 71.23 75.36 73.24 69
  -------------------
+ micro avg 96.65 96.65 96.65 1133
+ macro avg 84.81 86.69 85.73 1133
+ weighted avg 96.74 96.65 96.69 1133
  ```
 
+ ```
  seg test report:
  label precision recall f1 support
+ NOSTOP (label_id: 0) 99.99 99.85 99.92 743158
+ FULLSTOP (label_id: 1) 95.20 99.62 97.36 21842
  -------------------
+ micro avg 99.85 99.85 99.85 765000
+ macro avg 97.59 99.74 98.64 765000
+ weighted avg 99.85 99.85 99.85 765000
  ```
 
  </details>
 
  ```text
  punct_post test report:
  label precision recall f1 support
+ <NULL> (label_id: 0) 99.53 97.31 98.41 435611
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
  , (label_id: 3) 0.00 0.00 0.00 0
  ? (label_id: 4) 0.00 0.00 0.00 0
+ ? (label_id: 5) 81.85 87.31 84.49 1513
+ , (label_id: 6) 74.08 93.67 82.73 35921
+ 。 (label_id: 7) 96.51 96.93 96.72 32097
  、 (label_id: 8) 0.00 0.00 0.00 0
  ・ (label_id: 9) 0.00 0.00 0.00 0
  । (label_id: 10) 0.00 0.00 0.00 0
 
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
+ micro avg 97.00 97.00 97.00 505142
+ macro avg 87.99 93.81 90.59 505142
+ weighted avg 97.48 97.00 97.15 505142
  ```
 
+ ```
  cap test report:
  label precision recall f1 support
+ LOWER (label_id: 0) 94.89 94.98 94.94 2951
+ UPPER (label_id: 1) 81.34 81.03 81.18 796
  -------------------
+ micro avg 92.02 92.02 92.02 3747
+ macro avg 88.11 88.01 88.06 3747
+ weighted avg 92.01 92.02 92.01 3747
  ```
 
+ ```
  seg test report:
  label precision recall f1 support
+ NOSTOP (label_id: 0) 99.99 99.97 99.98 473642
+ FULLSTOP (label_id: 1) 99.55 99.90 99.72 34500
  -------------------
+ micro avg 99.96 99.96 99.96 508142
+ macro avg 99.77 99.93 99.85 508142
+ weighted avg 99.96 99.96 99.96 508142
  ```
 
  </details>
 
  ```text
  punct_post test report:
  label precision recall f1 support
+ <NULL> (label_id: 0) 99.34 95.90 97.59 406341
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
  , (label_id: 3) 0.00 0.00 0.00 0
  ? (label_id: 4) 0.00 0.00 0.00 0
+ ? (label_id: 5) 70.55 73.56 72.02 1456
  , (label_id: 6) 0.00 0.00 0.00 0
+ 。 (label_id: 7) 94.38 96.95 95.65 32537
+ 、 (label_id: 8) 54.28 87.62 67.03 18610
+ ・ (label_id: 9) 28.18 71.64 40.45 1100
  । (label_id: 10) 0.00 0.00 0.00 0
  ؟ (label_id: 11) 0.00 0.00 0.00 0
  ، (label_id: 12) 0.00 0.00 0.00 0
 
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
+ micro avg 95.51 95.51 95.51 460044
+ macro avg 69.35 85.13 74.55 460044
+ weighted avg 96.91 95.51 96.00 460044
+ ```
+
  ```
  cap test report:
  label precision recall f1 support
+ LOWER (label_id: 0) 92.33 94.03 93.18 4174
+ UPPER (label_id: 1) 83.51 79.46 81.43 1587
  -------------------
+ micro avg 90.02 90.02 90.02 5761
+ macro avg 87.92 86.75 87.30 5761
+ weighted avg 89.90 90.02 89.94 5761
  ```
 
+ ```
  seg test report:
  label precision recall f1 support
+ NOSTOP (label_id: 0) 99.99 99.92 99.96 428544
+ FULLSTOP (label_id: 1) 99.07 99.87 99.47 34500
  -------------------
+ micro avg 99.92 99.92 99.92 463044
+ macro avg 99.53 99.90 99.71 463044
+ weighted avg 99.92 99.92 99.92 463044
  ```
 
  </details>
 
  ```text
  punct_post test report:
  label precision recall f1 support
+ <NULL> (label_id: 0) 99.75 99.44 99.59 560358
  <ACRONYM> (label_id: 1) 0.00 0.00 0.00 0
  . (label_id: 2) 0.00 0.00 0.00 0
+ , (label_id: 3) 69.55 78.48 73.75 8084
+ ? (label_id: 4) 63.30 87.07 73.31 317
  ? (label_id: 5) 0.00 0.00 0.00 0
  , (label_id: 6) 0.00 0.00 0.00 0
  。 (label_id: 7) 0.00 0.00 0.00 0
  、 (label_id: 8) 0.00 0.00 0.00 0
  ・ (label_id: 9) 0.00 0.00 0.00 0
+ । (label_id: 10) 96.92 98.66 97.78 32118
  ؟ (label_id: 11) 0.00 0.00 0.00 0
  ، (label_id: 12) 0.00 0.00 0.00 0
  ; (label_id: 13) 0.00 0.00 0.00 0
 
  ፣ (label_id: 15) 0.00 0.00 0.00 0
  ፧ (label_id: 16) 0.00 0.00 0.00 0
  -------------------
+ micro avg 99.11 99.11 99.11 600877
+ macro avg 82.38 90.91 86.11 600877
+ weighted avg 99.17 99.11 99.13 600877
+ ```
+
  ```
  cap test report:
  label precision recall f1 support
+ LOWER (label_id: 0) 97.19 96.72 96.95 2466
+ UPPER (label_id: 1) 89.14 90.60 89.86 734
+ -------------------
+ micro avg 95.31 95.31 95.31 3200
+ macro avg 93.17 93.66 93.41 3200
+ weighted avg 95.34 95.31 95.33 3200
+ ```
+
+ ```
+ seg test report:
+ label precision recall f1 support
+ NOSTOP (label_id: 0) 100.00 99.99 99.99 569472
+ FULLSTOP (label_id: 1) 99.82 99.99 99.91 34405
  -------------------
+ micro avg 99.99 99.99 99.99 603877
+ macro avg 99.91 99.99 99.95 603877
+ weighted avg 99.99 99.99 99.99 603877
  ```
+
+ </details>
+
+
+ <details>
+ <summary>Arabic</summary>
 
  ```text
+ punct_post test report:
+ label precision recall f1 support
+ <NULL> (label_id: 0) 99.30 96.94 98.10 688043
+ <ACRONYM> (label_id: 1) 93.33 77.78 84.85 18
+ . (label_id: 2) 93.31 93.78 93.54 28175
+ , (label_id: 3) 0.00 0.00 0.00 0
+ ? (label_id: 4) 0.00 0.00 0.00 0
+ ? (label_id: 5) 0.00 0.00 0.00 0
+ , (label_id: 6) 0.00 0.00 0.00 0
+ 。 (label_id: 7) 0.00 0.00 0.00 0
+ 、 (label_id: 8) 0.00 0.00 0.00 0
+ ・ (label_id: 9) 0.00 0.00 0.00 0
+ । (label_id: 10) 0.00 0.00 0.00 0
+ ؟ (label_id: 11) 65.93 82.79 73.40 860
+ ، (label_id: 12) 44.89 79.20 57.30 20941
+ ; (label_id: 13) 0.00 0.00 0.00 0
+ ። (label_id: 14) 0.00 0.00 0.00 0
+ ፣ (label_id: 15) 0.00 0.00 0.00 0
+ ፧ (label_id: 16) 0.00 0.00 0.00 0
+ -------------------
+ micro avg 96.29 96.29 96.29 738037
+ macro avg 79.35 86.10 81.44 738037
+ weighted avg 97.49 96.29 96.74 738037
+ ```
+
+ ```
+ cap test report:
+ label precision recall f1 support
+ LOWER (label_id: 0) 97.10 99.49 98.28 4137
+ UPPER (label_id: 1) 98.71 92.89 95.71 1729
+ -------------------
+ micro avg 97.55 97.55 97.55 5866
+ macro avg 97.90 96.19 96.99 5866
+ weighted avg 97.57 97.55 97.52 5866
+ ```
+
+ ```
  seg test report:
  label precision recall f1 support
+ NOSTOP (label_id: 0) 99.99 99.97 99.98 710456
+ FULLSTOP (label_id: 1) 99.39 99.85 99.62 30581
  -------------------
+ micro avg 99.97 99.97 99.97 741037
+ macro avg 99.69 99.91 99.80 741037
+ weighted avg 99.97 99.97 99.97 741037
  ```
 
  </details>