---
language:
- es
metrics:
- accuracy
- f1
pipeline_tag: visual-question-answering
---
# Model Card for Model ID

This is a multimodal model for visual question answering (VQA) in Spanish.

**GitHub**: https://github.com/pvbastidas/spanish-vqa

## Performance

These are the training results for each of the 5 epochs.

Text Transformer: MarIA

Image Transformer: BEiT

| Epoch | Step | Loss | eval_loss | eval_wups | eval_acc | eval_f1 | # Trainable Parameters |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 624 | 5.046 | 4.231 | 0.173 | 0.135 | 0.006 | 211M |
| 2 | 1248 | 4.198 | 3.896 | 0.224 | 0.198 | 0.013 | 211M |
| 3 | 1872 | 3.834 | 3.729 | 0.260 | 0.236 | 0.024 | 211M |
| 4 | 2496 | 3.569 | 3.598 | 0.272 | 0.249 | 0.029 | 211M |
| 5 | 4680 | 3.358 | 3.566 | 0.274 | 0.251 | 0.030 | 211M |
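
This card does not say how `eval_acc` and `eval_f1` are computed. As a minimal sketch, assuming VQA is treated as classification over a fixed answer vocabulary and that F1 is macro-averaged over answer classes (both are assumptions, not confirmed by the source), the two metrics could be computed as below. The function names and the toy answers are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference answer."""
    if not y_true:
        return 0.0
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    classes = set(y_true) | set(y_pred)
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data (not taken from the model's evaluation set):
# the predictor always answers with the most frequent class.
y_true = ["sí", "sí", "sí", "no", "dos", "rojo"]
y_pred = ["sí", "sí", "sí", "sí", "sí", "sí"]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

In this toy case, accuracy is 0.5 while macro-F1 is roughly 0.167, because each rare answer class contributes an F1 of zero to the average. Under the assumptions above, this is one way accuracy can sit well above F1. Whether it explains the gap in the table depends on the actual evaluation code in the repository.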