---
license: mit
widget:
- text: Universis presentes [MASK] inspecturis
- text: eandem [MASK] per omnia parati observare
- text: yo [MASK] rey de Galicia, de las Indias
- text: en avant contre les choses [MASK] contenues
datasets:
- cc100
- bigscience-historical-texts/Open_Medieval_French
- latinwikipedia
language:
- la
- fr
- es
---

## Model Details

This is a fine-tuned version of the multilingual BERT model, trained on medieval texts. The model is intended to serve as a foundation for downstream ML tasks in NLP and HTR settings. The training dataset comprises 650M tokens drawn from texts in Classical and Medieval Latin, Old French, and Old Spanish, spanning the 5th century BC to the 16th century AD. Several large corpora were cleaned and transformed for use in the training process:

| Dataset           | Size             | Language |
| ----------------- |:----------------:| --------:|
| CC100             | 3.2 GB           | la       |
| Corpus Corporum   | 3.0 GB           | la       |
| CEMA              | 320 MB           | la+fro   |
| HOME              | 38 MB            | la+fro   |
| BFM               | 34 MB            | fro      |
| AND               | 19 MB            | fro      |
| CODEA             | 13 MB            | spa      |
| Total (raw)       | ~6.5 GB          |          |
| Total (cleaned)   | 650M tokens (4.5 GB) |      |