smaximo committed
Commit 64a1c47
1 Parent(s): fcfe744
Files changed (1):
  1. app.py +8 -7
app.py CHANGED
@@ -7,9 +7,9 @@ description = """
 <p style="text-align: justify;">
 Taking into account the existence of masked language models trained on a Spanish Biomedical corpus, the objective of this project is to use them to generate extractive QA models for Biomedicine and compare their effectiveness with general masked language models.
 
-The models were trained on the SQUAD_ES Dataset (automatic translation of the Stanford Question Answering Dataset into Spanish). SQUAD v2 version was chosen in order to include questions that cannot be answered based on a provided context.
+The models were trained on the <a href="https://huggingface.co/datasets/squad_es">SQUAD_ES Dataset</a> (an automatic translation of the Stanford Question Answering Dataset into Spanish). The SQUAD v2 version was chosen in order to include questions that cannot be answered based on the provided context.
 
-The models were evaluated on https://huggingface.co/datasets/hackathon-pln-es/biomed_squad_es_v2 , a subset of the SQUAD_ES dev dataset containing questions related to the Biomedical domain.
+The models were evaluated on the <a href="https://huggingface.co/datasets/hackathon-pln-es/biomed_squad_es_v2">BIOMED_SQUAD_ES_V2 Dataset</a>, a subset of the SQUAD_ES dev dataset containing questions related to the Biomedical domain.
 </p>
 """
 article = """
@@ -26,7 +26,7 @@ article = """
 <th title="Field #8">NoAns_f1</th>
 </tr></thead>
 <tbody><tr>
-<td>hackathon-pln-es/roberta-base-bne-squad2-es</td>
+<td><a href="https://huggingface.co/hackathon-pln-es/roberta-base-bne-squad2-es">hackathon-pln-es/roberta-base-bne-squad2-es</a></td>
 <td>General</td>
 <td align="right">67.6341</td>
 <td align="right">75.6988</td>
@@ -36,7 +36,7 @@ article = """
 <td align="right">81.2174</td>
 </tr>
 <tr>
-<td>hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es</td>
+<td><a href="https://huggingface.co/hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es">hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es</a></td>
 <td>Biomedical</td>
 <td align="right">66.8426</td>
 <td align="right">75.2346</td>
@@ -46,7 +46,7 @@ article = """
 <td align="right">80.3478</td>
 </tr>
 <tr>
-<td>hackathon-pln-es/roberta-base-biomedical-es-squad2-es</td>
+<td><a href="https://huggingface.co/hackathon-pln-es/roberta-base-biomedical-es-squad2-es">hackathon-pln-es/roberta-base-biomedical-es-squad2-es</a></td>
 <td>Biomedical</td>
 <td align="right">67.6341</td>
 <td align="right">74.5612</td>
@@ -56,7 +56,7 @@ article = """
 <td align="right">87.1304</td>
 </tr>
 <tr>
-<td>hackathon-pln-es/biomedtra-small-es-squad2-es</td>
+<td><a href="https://huggingface.co/hackathon-pln-es/biomedtra-small-es-squad2-es">hackathon-pln-es/biomedtra-small-es-squad2-es</a></td>
 <td>Biomedical</td>
 <td align="right">29.6394</td>
 <td align="right">36.317</td>
@@ -76,10 +76,11 @@ As future work, the following experiments could be carried out:
 <ul>
 <li>Use Biomedical masked-language models that were not trained from scratch on a Biomedical corpus but have been adapted from a general model, so as not to lose words and features of Spanish that are also present in Biomedical questions and articles.
 <li>Create a Biomedical training dataset in SQUAD v2 format.
-<li>Generate a new and bigger validation dataset based on questions and contexts generated directly in Spanish and not translated as in SQUAD_ES v2.
+<li>Generate a new and larger Spanish Biomedical validation dataset, not translated from English as in the case of the SQUAD_ES Dataset.
 <li>Ensemble different models.
 </ul>
 </p>
+
 <h3>Team</h3>
 Santiago Maximo
 """