magistermilitum committed on
Commit
dd45bb1
1 Parent(s): f17825a

Update README.md

Files changed (1)
  1. README.md +16 -4
README.md CHANGED
@@ -17,9 +17,21 @@ language:
17
 
18
  ## Model Details
19
 
20
- This is a fine-tuned version of the multilingual RoBERTa model on medieval charters. The model is intended to recognize locations and persons in medieval texts
21
- in a flat and nested manner. The training dataset comprises 8k annotated texts in medieval Latin, French, and Spanish from the 11th to the 15th centuries.
22
 
23
 
24
- ### How to Get Started with the Model
25
- The model is intended to be used in a simple manner:
 
17
 
18
  ## Model Details
19
 
20
+ This is a fine-tuned version of the multilingual BERT model on medieval texts. The model is intended to be used as a foundation for other ML tasks in NLP and HTR environments.
 
21
 
22
+ The training dataset comprises 650M tokens drawn from texts in Classical and Medieval Latin, Old French, and Old Spanish, spanning the 5th century BC to the 16th century.
23
+
24
+ Several large corpora were cleaned and transformed for use during training:
25
+
26
+ | Dataset | Size | Lang |
27
+ | ------------- |:-------------:| -----:|
28
+ | CC100 | 3.2 GB | la |
29
+ | Corpus Corporum | 3.0 GB | la |
30
+ | CEMA | 320 MB | la+fro |
31
+ | HOME | 38 MB | la+fro |
32
+ | BFM | 34 MB | fro |
33
+ | AND | 19 MB | fro |
34
+ | CODEA | 13 MB | spa |
35
+ | Total | ~6.5 GB | |
36
+ | | 650M tokens (4.5 GB) | |
37