---
license: mit
widget:
- text: Universis presentes [MASK] inspecturis
- text: eandem [MASK] per omnia parati observare
- text: yo [MASK] rey de Galicia, de las Indias
- text: en avant contre les choses [MASK] contenues
datasets:
- cc100
- bigscience-historical-texts/Open_Medieval_French
- latinwikipedia
language:
- la
- fr
- es
---

## Model Details

This is a fine-tuned version of the multilingual BERT model on medieval texts. The model is intended to serve as a foundation for downstream NLP and HTR tasks. The training dataset comprises 650M tokens drawn from classical and medieval Latin, Old French, and Old Spanish texts, spanning a period from the 5th century BC to the 16th century. Several large corpora were cleaned and transformed for use during the training process:

| Dataset         | Size   | Language | Dates         |
| --------------- |:------:| --------:| -------------:|
| CC100           | 3.2 GB | la       | 5th BC - 18th |
| Corpus Corporum | 3.0 GB | la       | 5th BC - 16th |
| CEMA            | 320 MB | la+fro   | 9th - 15th    |
| HOME            | 38 MB  | la+fro   | 12th - 15th   |
| BFM             | 34 MB  | fro      | 13th - 15th   |
| AND             | 19 MB  | fro      | 13th - 15th   |
| CODEA           | 13 MB  | spa      | 12th - 16th   |
| **Total**       | ~6.5 GB raw; 650M tokens (4.5 GB) after cleaning | | |