magistermilitum/bert_medieval_multilingual


magistermilitum committed on Mar 12

Commit dd45bb1 • 1 Parent(s): f17825a

Update README.md

Files changed (1)

README.md +16 -4


@@ -17,9 +17,21 @@ language:
 ## Model Details
-This is a fine-tuned version of the multilingual RoBERTa model on medieval charters. The model is intended to recognize locations and persons in medieval texts
-in a flat and nested manner. The training dataset comprises 8k annotated texts in medieval Latin, French and Spanish from the 11th to 15th centuries.
-### How to Get Started with the Model
-The model is intended to be used in a simple manner:

+This is a fine-tuned version of the multilingual BERT model on medieval texts. The model is intended to be used as a foundation for other ML tasks in NLP and HTR environments.
+The training dataset comprises 650M tokens from texts in classical and medieval Latin, Old French and Old Spanish, from a period ranging from the 5th century BC to the 16th century.
+Several large corpora were cleaned and transformed for use during training:
+| Dataset | Size | Language |
+| ------------- |:-------------:| -----:|
+| CC100 | 3.2 GB | la |
+| Corpus Corporum | 3.0 GB | la |
+| CEMA | 320 MB | la+fro |
+| HOME | 38 MB | la+fro |
+| BFM | 34 MB | fro |
+| AND | 19 MB | fro |
+| CODEA | 13 MB | spa |
+| Total | ~6.5 GB | |
+| | 650M tokens (4.5 GB) | |
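
The totals row of the table can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the commit: it assumes the sizes in the table are gigabytes and megabytes with 1 GB = 1000 MB, and the names `CORPUS_SIZES_MB`, `total_size_gb` and `share_by_corpus` are hypothetical helpers introduced only for illustration.

```python
# Corpus sizes as listed in the table, converted to megabytes.
# Assumption: the table's units are gigabytes/megabytes, with 1 GB = 1000 MB.
CORPUS_SIZES_MB = {
    "CC100": 3200,
    "Corpus Corporum": 3000,
    "CEMA": 320,
    "HOME": 38,
    "BFM": 34,
    "AND": 19,
    "CODEA": 13,
}


def total_size_gb(sizes_mb):
    """Sum the per-corpus sizes (in MB) and return the total in GB."""
    return sum(sizes_mb.values()) / 1000


def share_by_corpus(sizes_mb):
    """Return each corpus's share of the total raw size, as a fraction."""
    total = sum(sizes_mb.values())
    return {name: size / total for name, size in sizes_mb.items()}


total_gb = total_size_gb(CORPUS_SIZES_MB)
shares = share_by_corpus(CORPUS_SIZES_MB)

print(f"Total raw size: {total_gb:.2f} GB")
# List corpora from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {share:6.1%}")
```

Under these assumptions the listed corpora sum to roughly 6.6 GB, in line with the table's approximate total of ~6.5 GB, and the two Latin-only corpora (CC100 and Corpus Corporum) make up about 94% of the raw data.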