import gradio as gr
from transformers import pipeline
import torch

title = "Extractive QA Biomedicine"
description = """

Given the existence of masked language models pretrained on Spanish biomedical corpora, the objective of this project is to use them to generate extractive QA models for Biomedicine and to compare their effectiveness with general-domain masked language models. The models were fine-tuned on the SQUAD_ES dataset (an automatic translation of the Stanford Question Answering Dataset into Spanish). The SQuAD v2 version was chosen in order to include questions that cannot be answered from the provided context. The models were evaluated on https://huggingface.co/datasets/hackathon-pln-es/biomed_squad_es_v2, a subset of the SQUAD_ES dev set containing questions related to the biomedical domain.

""" article = """

## Results

| Model | Base Model Domain | exact | f1 | HasAns_exact | HasAns_f1 | NoAns_exact | NoAns_f1 |
|---|---|---|---|---|---|---|---|
| hackathon-pln-es/roberta-base-bne-squad2-es | General | 67.6341 | 75.6988 | 53.7367 | 70.0526 | 81.2174 | 81.2174 |
| hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es | Biomedical | 66.8426 | 75.2346 | 53.0249 | 70.0031 | 80.3478 | 80.3478 |
| hackathon-pln-es/roberta-base-biomedical-es-squad2-es | Biomedical | 67.6341 | 74.5612 | 47.6868 | 61.7012 | 87.1304 | 87.1304 |
| hackathon-pln-es/biomedtra-small-es-squad2-es | Biomedical | 29.6394 | 36.317 | 32.2064 | 45.716 | 27.1304 | 27.1304 |
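The `exact` and `f1` columns follow the standard SQuAD v2 evaluation: answers are normalized (lowercased, punctuation and Spanish/English articles stripped) before comparison, and unanswerable questions score 1.0 only when the prediction is empty. A minimal sketch of these token-level metrics (function names are illustrative, not taken from the official evaluation script):

```python
import collections
import re
import string


def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 over the normalized answers."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Unanswerable case: both must be empty to score 1.0.
        return float(pred_tokens == gold_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "left ventricle" against a gold answer of "ventricle" gets exact match 0.0 but a partial F1 credit of 2/3.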

## Conclusion and Future Work

If the F1 score is considered, the results suggest that there may be no advantage in using domain-specific masked language models to generate biomedical QA models: the biomedical RoBERTa-based models score close to, but not above, the general RoBERTa-based model.