magistermilitum/bert_medieval_multilingual

magistermilitum committed on Mar 12

Commit f91cfa4 • 1 Parent(s): dd45bb1

Update README.md

Files changed (1)

README.md +10 -10

@@ -23,15 +23,15 @@ The train dataset entails 650M of tokens coming from texts on classical and medi
 Several large corpora were cleaned and transformed for use in training:
-| dataset        | size          | Lang  |
-| ------------- |:-------------:| -----:|
-| CC100      | 3,2Gb | la |
-| Corpus Corporum     | 3,0Gb      |   la |
-| CEMA | 320Mb      |  la+fro   |
-| HOME | 38Mb     |  la+fro   |
-| BFM | 34Mb      |  fro   |
-| AND | 19Mb      |  fro   |
-| CODEA | 13Mb      |  spa   |
 |  | ~6,5Gb      |    |
-|  | 650M tk (4,5Gb)     |   |

 Several large corpora were cleaned and transformed for use in training:
+| dataset        | size          | Lang  | dates  |
+| ------------- |:-------------:| -----:|-----:|
+| CC100      | 3,2Gb | la | 5th BC - 18th|
+| Corpus Corporum     | 3,0Gb      |   la | 5th BC - 16th |
+| CEMA | 320Mb      |  la+fro   |9th - 15th |
+| HOME | 38Mb     |  la+fro   | 12th - 15th |
+| BFM | 34Mb      |  fro   | 13th - 15th|
+| AND | 19Mb      |  fro   | 13th - 15th|
+| CODEA | 13Mb      |  spa   |12th - 16th |
 |  | ~6,5Gb      |    |
+|  | 650M tk (4,5Gb)     |   | |
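
As a rough sanity check on the `~6,5Gb` total row, the per-corpus sizes listed in the table can be summed. The following is a minimal sketch, not part of the README: the names `SIZES`, `to_megabytes`, and `total_gigabytes` are hypothetical, and it assumes that `Gb`/`Mb` are decimal units (1 Gb = 1000 Mb) and that the comma is a decimal separator.

```python
# Sizes copied from the table; the comma is read as a decimal separator.
SIZES = {
    "CC100": "3,2Gb",
    "Corpus Corporum": "3,0Gb",
    "CEMA": "320Mb",
    "HOME": "38Mb",
    "BFM": "34Mb",
    "AND": "19Mb",
    "CODEA": "13Mb",
}

# Assumption: decimal units, 1 Gb = 1000 Mb.
UNIT_TO_MB = {"Mb": 1.0, "Gb": 1000.0}


def to_megabytes(size: str) -> float:
    """Convert a size string such as '3,2Gb' or '320Mb' to megabytes."""
    value, unit = size[:-2], size[-2:]
    return float(value.replace(",", ".")) * UNIT_TO_MB[unit]


def total_gigabytes(sizes: dict[str, str]) -> float:
    """Sum all listed sizes and return the total in gigabytes."""
    return sum(to_megabytes(s) for s in sizes.values()) / 1000.0


if __name__ == "__main__":
    print(f"{total_gigabytes(SIZES):.2f} Gb")
```

Under these assumptions the listed sizes add up to about 6.6 Gb, which is in line with the approximate total given in the table.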