magistermilitum committed on
Commit f91cfa4
1 parent: dd45bb1

Update README.md

  1. README.md +10 -10
README.md CHANGED
@@ -23,15 +23,15 @@ The train dataset entails 650M of tokens coming from texts on classical and medi
 
 Several big corpora were cleaned ans transformed to be used during the process training:
 
-| dataset | size | Lang |
-| ------------- |:-------------:| -----:|
-| CC100 | 3,2Gb | la |
-| Corpus Corporum | 3,0Gb | la |
-| CEMA | 320Mb | la+fro |
-| HOME | 38Mb | la+fro |
-| BFM | 34Mb | fro |
-| AND | 19Mb | fro |
-| CODEA | 13Mb | spa |
+| dataset | size | Lang | dates |
+| ------------- |:-------------:| -----:|-----:|
+| CC100 | 3,2Gb | la | 5th BC - 18th|
+| Corpus Corporum | 3,0Gb | la | 5th BC - 16th |
+| CEMA | 320Mb | la+fro |9th - 15th |
+| HOME | 38Mb | la+fro | 12th - 15th |
+| BFM | 34Mb | fro | 13th - 15th|
+| AND | 19Mb | fro | 13th - 15th|
+| CODEA | 13Mb | spa |12th - 16th |
 | | ~6,5Gb | |
-| | 650M tk (4,5Gb) | |
+| | 650M tk (4,5Gb) | | |
 