---
license: apache-2.0
---
# German Medical NER

## Use
- **Primary Use Case**: This model is designed to extract medical entities such as symptoms, diagnostic tests, and treatments from German-language clinical text.
- **Applications**: Suitable for healthcare professionals, clinical data analysis, and research into medical text processing.
- **Supported Entity Types**:
  - `PROBLEM`: Diseases, symptoms, and medical conditions.
  - `TEST`: Diagnostic procedures and laboratory tests.
  - `TREATMENT`: Medications, therapies, and other medical interventions.

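To illustrate what token-level output for these entity types can look like, here is a small sketch that groups BIO-style tags into entity spans. The `B-`/`I-` label names and the sample annotation are assumptions for illustration, not taken from the model's actual label config:

```python
# Hypothetical token-level annotation illustrating the three entity types.
# The B-/I- label names are assumed for illustration only.
tokens = ["Der", "Patient", "hat", "starke", "Kopfschmerzen", "und", "erhielt", "Paracetamol", "."]
labels = ["O", "O", "O", "B-PROBLEM", "I-PROBLEM", "O", "O", "B-TREATMENT", "O"]

# Group contiguous B-/I- tags into (entity_text, entity_type) spans.
entities = []
current_tokens, current_type = [], None
for token, label in zip(tokens, labels):
    if label.startswith("B-"):
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [token], label[2:]
    elif label.startswith("I-") and current_type == label[2:]:
        current_tokens.append(token)
    else:
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None
if current_tokens:
    entities.append((" ".join(current_tokens), current_type))

print(entities)  # [('starke Kopfschmerzen', 'PROBLEM'), ('Paracetamol', 'TREATMENT')]
```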
## Training Data
- **Data Sources**: Annotated datasets, including clinical data and translations of English medical text into German.
- **Data Augmentation**: Data augmentation techniques were applied to the training set to improve the model's ability to generalize to different text structures.
- **Dataset Split**:
  - **Training Set**: 80%
  - **Validation Set**: 10%
  - **Test Set**: 10%

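The 80/10/10 split above can be sketched as follows. This is a generic shuffle-and-slice illustration, not the project's actual preprocessing code:

```python
import random

def train_val_test_split(examples, seed=42):
    """Shuffle and split examples 80/10/10, mirroring the ratios above."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)  # deterministic shuffle for reproducibility
    n_train = int(len(examples) * 0.8)
    n_val = int(len(examples) * 0.1)
    train = examples[:n_train]
    val = examples[n_train:n_train + n_val]
    test = examples[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```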
## Model Training
- **Training Configuration**:
  - **Optimizer**: AdamW
  - **Learning Rate**: 3e-5
  - **Batch Size**: 64
  - **Epochs**: 200
  - **Loss Function**: Focal loss, to handle class imbalance
- **Frameworks**: PyTorch, Hugging Face Transformers, SimpleTransformers

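Focal loss down-weights well-classified tokens so training focuses on hard ones, which helps with the O-vs-entity class imbalance typical of NER. A minimal scalar illustration of the formula FL(p) = -alpha * (1 - p)^gamma * log(p); the `gamma` and `alpha` values below are common defaults, not necessarily those used for this model:

```python
import math

def focal_loss(p_correct, gamma=2.0, alpha=0.25):
    """Focal loss for one token, given the probability assigned to the true class."""
    return -alpha * (1.0 - p_correct) ** gamma * math.log(p_correct)

def cross_entropy(p_correct):
    """Standard cross-entropy for the same token, for comparison."""
    return -math.log(p_correct)

# Easy, well-classified tokens are down-weighted far more than hard ones.
for p in (0.95, 0.5, 0.1):
    print(f"p={p:.2f}  CE={cross_entropy(p):.4f}  FL={focal_loss(p):.4f}")
```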
## How to Use
You can use this model with the Hugging Face `transformers` library. Here's an example of how to load the model and run inference:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "rigonsallauka/german_medical_ner"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Sample text for inference
text = "Der Patient klagte über starke Kopfschmerzen und Übelkeit, die seit zwei Tagen anhielten. Zur Linderung der Symptome wurde ihm Paracetamol verschrieben, und er wurde angewiesen, sich auszuruhen und viel Flüssigkeit zu trinken."

# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt")

# Run inference and map predicted label IDs to label names
with torch.no_grad():
    logits = model(**inputs).logits
predictions = logits.argmax(dim=-1)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]
for token, label in zip(tokens, labels):
    print(token, label)
```