---
license: mit
widget:
- text: Universis presentes [MASK] inspecturis
- text: eandem [MASK] per omnia parati observare
- text: yo [MASK] rey de Galicia, de las Indias
- text: en avant contre les choses [MASK] contenues
datasets:
- cc100
- bigscience-historical-texts/Open_Medieval_French
- latinwikipedia
language:
- la
- fr
- es
---

## Model Details

This is a fine-tuned version of the multilingual BERT model, trained on medieval texts. The model is intended to serve as a foundation for downstream ML tasks in NLP and HTR settings. The training dataset comprises 650M tokens drawn from texts in Classical and Medieval Latin, Old French, and Old Spanish, spanning the 5th century BC to the 16th century AD. Several large corpora were cleaned and transformed for use in the training process:

| Dataset           | Size             | Language |
| ----------------- |:----------------:| --------:|
| CC100             | 3.2 GB           | la       |
| Corpus Corporum   | 3.0 GB           | la       |
| CEMA              | 320 MB           | la+fro   |
| HOME              | 38 MB            | la+fro   |
| BFM               | 34 MB            | fro      |
| AND               | 19 MB            | fro      |
| CODEA             | 13 MB            | spa      |
| Total (raw)       | ~6.5 GB          |          |
| Total (cleaned)   | 650M tokens (4.5 GB) |      |