Overview

Language model: gelectra-base-germanquad-distilled
Language: German
Training data: GermanQuAD train set (~ 12MB)
Eval data: GermanQuAD test set (~ 5MB)
Infrastructure: 1x V100 GPU
Published: Apr 21st, 2021

Details

  • We trained a German question answering model with a gelectra-base model as its basis.
  • The dataset is GermanQuAD, a new German-language dataset, which we hand-annotated and published online.
  • The training dataset is one-way annotated and contains 11518 questions and 11518 answers. The test dataset is three-way annotated and contains 2204 questions and 2204·3−76 = 6536 answers, because we removed 76 wrong answers.
  • In addition to the annotations in GermanQuAD, Haystack's distillation feature was used for training, with deepset/gelectra-large-germanquad as the teacher model.

See https://deepset.ai/germanquad for more details and dataset download in SQuAD format.
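
The question and answer counts above can be checked against any file in SQuAD format. The following is a minimal sketch, assuming the standard SQuAD JSON layout (`data` → `paragraphs` → `qas` → `answers`); the `count_qa` helper and the toy record are illustrative and are not part of the GermanQuAD release.

```python
from typing import Tuple


def count_qa(squad: dict) -> Tuple[int, int]:
    """Return (number of questions, number of answers) in a SQuAD-style dict."""
    n_questions = 0
    n_answers = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                n_questions += 1
                n_answers += len(qa["answers"])
    return n_questions, n_answers


# Hand-made toy record in SQuAD layout (not real GermanQuAD data).
toy = {
    "data": [
        {
            "title": "Beispiel",
            "paragraphs": [
                {
                    "context": "Berlin ist die Hauptstadt von Deutschland.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Was ist die Hauptstadt von Deutschland?",
                            "answers": [
                                {"text": "Berlin", "answer_start": 0},
                                {"text": "Berlin", "answer_start": 0},
                                {"text": "Berlin", "answer_start": 0},
                            ],
                        },
                        {
                            "id": "q2",
                            "question": "Von welchem Land ist Berlin die Hauptstadt?",
                            "answers": [{"text": "Deutschland", "answer_start": 30}],
                        },
                    ],
                }
            ],
        }
    ]
}

print(count_qa(toy))  # (2, 4)

# Arithmetic from the text: three answers per test question, minus 76 removed.
expected_test_answers = 2204 * 3 - 76
print(expected_test_answers)  # 6536
```

To apply this to a downloaded file, load it with `json.load` and pass the resulting dict to `count_qa`.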

Hyperparameters

batch_size = 24
n_epochs = 6
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 2
distillation_loss_weight = 0.75
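
To illustrate how `temperature` and `distillation_loss_weight` typically interact, here is a minimal sketch of a standard knowledge-distillation loss for a single set of logits (for example, the answer start position over the tokens of a passage). It assumes the common formulation of a temperature-softened KL term scaled by the squared temperature, combined with the usual cross-entropy on the gold label; Haystack's actual implementation may differ in detail, and all function names here are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: List[float], target_index: int) -> float:
    """Cross-entropy of the unscaled logits against a gold index."""
    return -math.log(softmax(logits)[target_index])


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    target_index: int,
    temperature: float = 2.0,               # value from the list above
    distillation_loss_weight: float = 0.75,  # value from the list above
) -> float:
    """Weighted sum of a soft (teacher) loss and a hard (gold label) loss.

    Assumption: the soft term is KL(teacher || student) on softened
    distributions, multiplied by temperature ** 2.
    """
    soft_student = softmax(student_logits, temperature)
    soft_teacher = softmax(teacher_logits, temperature)
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_loss = cross_entropy(student_logits, target_index)
    return (
        distillation_loss_weight * soft_loss
        + (1 - distillation_loss_weight) * hard_loss
    )


# Made-up logits over four token positions; the gold start position is index 1.
student = [0.5, 2.0, 0.1, -1.0]
teacher = [0.2, 3.0, 0.0, -2.0]
print(distillation_loss(student, teacher, target_index=1))
```

With a weight of 0.75, three quarters of the training signal in this formulation comes from matching the teacher's softened distribution and one quarter from the annotated answers.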

Performance

We evaluated the extractive question answering performance on our GermanQuAD test set. Model types and training data are included in the model name. For fine-tuning XLM-Roberta, we use the English SQuAD v2.0 dataset. The GELECTRA models are warm-started on the German translation of SQuAD v1.1 and fine-tuned on GermanQuAD. The human baseline was computed for the 3-way test set by taking one answer as the prediction and the other two as ground truth.

"exact": 62.4773139745916
"f1": 80.9488017070188
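
For readers unfamiliar with these two metrics, the sketch below shows how exact match and token-level F1 are commonly computed in SQuAD-style evaluation when a question has several reference answers: each metric is taken as the maximum over the references. The normalization here (lowercasing, stripping ASCII punctuation, collapsing whitespace) is a simplifying assumption and may not match the script that produced the scores above; the helper names are hypothetical.

```python
import string
from collections import Counter
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(
    metric: Callable[[str, str], float], prediction: str, references: List[str]
) -> float:
    """Score a prediction against several references and keep the best."""
    return max(metric(prediction, ref) for ref in references)


# Made-up example with three reference answers, as in a 3-way annotated set.
references = ["1990", "im Jahr 1990.", "Jahr 1990"]
print(best_over_references(exact_match, "im Jahr 1990", references))  # 1.0
print(best_over_references(f1, "Jahr 1990", ["im Jahr 1990"]))        # 0.8
```

Corpus-level scores are then the mean of these per-question values, expressed as percentages.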

Authors

  • Timo Möller: timo.moeller [at] deepset.ai
  • Julian Risch: julian.risch [at] deepset.ai
  • Malte Pietsch: malte.pietsch [at] deepset.ai
  • Michel Bartels: michel.bartels [at] deepset.ai

About us

We bring NLP to the industry via open source!
Our focus: Industry-specific language models & large-scale QA systems.

Get in touch: Twitter | LinkedIn | Slack | GitHub Discussions | Website

By the way: we're hiring!
