---
language:
- es
metrics:
- accuracy
- f1
pipeline_tag: visual-question-answering
---

# Model Card for Model ID

This is a multimodal model for visual question answering (VQA) in Spanish.

**GitHub**: https://github.com/pvbastidas/spanish-vqa

## Performance

These are the training results for each of the 5 epochs.

- Text Transformer: MarIA
- Image Transformer: BEiT

| Epoch | Step | Loss | eval_loss | eval_wups | eval_acc | eval_f1 | # Trainable Parameters |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 624 | 5.046 | 4.231 | 0.173 | 0.135 | 0.006 | 211M |
| 2 | 1248 | 4.198 | 3.896 | 0.224 | 0.198 | 0.013 | 211M |
| 3 | 1872 | 3.834 | 3.729 | 0.260 | 0.236 | 0.024 | 211M |
| 4 | 2496 | 3.569 | 3.598 | 0.272 | 0.249 | 0.029 | 211M |
| 5 | 4680 | 3.358 | 3.566 | 0.274 | 0.251 | 0.030 | 211M |
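
The card lists `eval_acc` and `eval_f1` but does not say how they are computed. The following is a minimal sketch, not the repository's evaluation code. It assumes VQA is framed as classification over a fixed answer vocabulary and that F1 is macro-averaged over answer classes; both are assumptions, and the function names and toy answers are hypothetical. Under that assumption, a model that mostly predicts a few frequent answers can score a much lower F1 than accuracy.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference answer."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # reference class gets a false negative
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): the model always predicts the most frequent answer.
y_true = ["gato", "gato", "gato", "perro", "dos", "rojo"]
y_pred = ["gato", "gato", "gato", "gato", "gato", "gato"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

In this toy case half of the answers are correct, but three of the four answer classes are never predicted, so they each contribute an F1 of zero to the macro average.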