rigonsallauka committed cb4040c (parent: f615ee4)

Update README.md

Files changed (1): README.md (+67, -3)
Removed from README.md:

```yaml
---
license: apache-2.0
---
```
Added to README.md:

```yaml
---
license: apache-2.0
datasets:
- rigonsallauka/english_ner_dataset
language:
- en
metrics:
- f1
- precision
- recall
- confusion_matrix
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
tags:
- NER
- medical
- symptom
- extraction
- english
---
```

# English Medical NER

## Use
- **Primary Use Case**: This model extracts medical entities such as symptoms, diagnostic tests, and treatments from English-language clinical text.
- **Applications**: Suitable for healthcare professionals, clinical data analysis, and research on medical text processing.
- **Supported Entity Types**:
  - `PROBLEM`: Diseases, symptoms, and medical conditions.
  - `TEST`: Diagnostic procedures and laboratory tests.
  - `TREATMENT`: Medications, therapies, and other medical interventions.
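
The README does not state the tagging scheme. Token-classification models of this kind often emit BIO-style labels (for example `B-PROBLEM`, `I-PROBLEM`, `O`); the actual label set can be checked in `model.config.id2label`. Assuming a BIO scheme and word-level predictions, a minimal sketch for merging them into `PROBLEM`, `TEST`, and `TREATMENT` spans could look like this (`group_bio_entities` is a hypothetical helper, not part of `transformers`):

```python
from typing import List, Optional, Tuple


def group_bio_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge (word, label) pairs into (entity_type, text) spans.

    Assumes a BIO scheme: "B-TYPE" starts an entity, "I-TYPE" continues it,
    and "O" marks words outside any entity.
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    for word, label in tagged:
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then open a new one of this type
            if current_type is not None:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = label[2:], [word]
        elif label.startswith("I-"):
            # Continuation of the open entity
            current_words.append(word)
        else:
            # "O" (or any other label) closes the open entity
            if current_type is not None:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []

    # Close an entity that runs to the end of the input
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


tagged = [
    ("severe", "B-PROBLEM"), ("headaches", "I-PROBLEM"), ("and", "O"),
    ("nausea", "B-PROBLEM"), ("prescribed", "O"), ("paracetamol", "B-TREATMENT"),
]
print(group_bio_entities(tagged))
# [('PROBLEM', 'severe headaches'), ('PROBLEM', 'nausea'), ('TREATMENT', 'paracetamol')]
```

An `I-` label that does not continue an entity of the same type is treated as the start of a new entity, which is a common lenient convention.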

## Training Data
- **Data Sources**: Annotated English medical text, including clinical data (dataset: `rigonsallauka/english_ner_dataset`).
- **Data Augmentation**: Data augmentation was applied to the training set to help the model generalize to different text structures.
- **Dataset Split**:
  - **Training Set**: 80%
  - **Validation Set**: 10%
  - **Test Set**: 10%
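
The split procedure itself is not described (for example, whether it was random or stratified, or which seed was used). As an illustration only, a plain random 80/10/10 split with a fixed seed could be written as follows; `split_80_10_10` and the seed value are assumptions:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(
    examples: Sequence[T], seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle examples and split them 80% / 10% / 10% into train / validation / test."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # local RNG with a fixed seed, so the split is reproducible
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any rounding remainder ends up in the test set
    return train, val, test


train_set, val_set, test_set = split_80_10_10(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 800 100 100
```

In an NER setting the items would be whole annotated sentences, so that the tokens of one sentence never end up in different sets.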

## Model Training
- **Training Configuration**:
  - **Optimizer**: AdamW
  - **Learning Rate**: 3e-5
  - **Batch Size**: 64
  - **Epochs**: 200
  - **Loss Function**: Focal Loss, to handle class imbalance
- **Frameworks**: PyTorch, Hugging Face Transformers, SimpleTransformers
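
The focal loss hyperparameters are not listed. Focal loss scales each token's cross-entropy by `(1 - p_t) ** gamma`, where `p_t` is the predicted probability of the true label, so tokens that are already classified confidently (often the majority `O` class) contribute less. A minimal PyTorch sketch, assuming `gamma = 2`, no per-class weighting, and the Hugging Face convention of labelling ignored positions with `-100`; `focal_loss` is a hypothetical helper, not the authors' implementation:

```python
import torch
import torch.nn.functional as F


def focal_loss(
    logits: torch.Tensor,
    labels: torch.Tensor,
    gamma: float = 2.0,
    ignore_index: int = -100,
) -> torch.Tensor:
    """Multi-class focal loss for token classification.

    logits: (batch, seq_len, num_labels); labels: (batch, seq_len) class indices.
    Positions labelled `ignore_index` (e.g. special tokens) are excluded.
    """
    num_labels = logits.size(-1)
    logits = logits.reshape(-1, num_labels)
    labels = labels.reshape(-1)

    keep = labels != ignore_index
    logits, labels = logits[keep], labels[keep]
    if labels.numel() == 0:
        return logits.sum() * 0.0  # nothing to score; keeps the autograd graph intact

    log_probs = F.log_softmax(logits, dim=-1)
    # log p_t: log-probability assigned to the true label of each token
    log_pt = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    # FL = -(1 - p_t)^gamma * log(p_t), averaged over the kept tokens
    return (-((1.0 - pt) ** gamma) * log_pt).mean()


# Random stand-in values, only to show the expected shapes
example_logits = torch.randn(2, 6, 7, requires_grad=True)  # (batch, seq_len, num_labels)
example_labels = torch.randint(0, 7, (2, 6))
example_labels[:, 0] = -100  # e.g. ignore the [CLS] position
loss = focal_loss(example_logits, example_labels)
loss.backward()
```

With `gamma=0` the expression reduces to ordinary cross-entropy over the non-ignored tokens, which makes a quick sanity check. In a custom training loop it would be computed from `outputs.logits` in place of the model's built-in loss.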

## How to Use
The model can be loaded with the Hugging Face `transformers` library. The following example runs inference on a sample sentence and prints the predicted label for each token:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "rigonsallauka/english_medical_ner"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Sample text for inference
text = "The patient complained of severe headaches and nausea that had persisted for two days. To alleviate the symptoms, he was prescribed paracetamol and advised to rest and drink plenty of fluids."

# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt")

# Run the model without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring label ID for each token and map it to its label name
predicted_ids = torch.argmax(logits, dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predicted_ids.tolist()):
    print(f"{token}\t{model.config.id2label[label_id]}")
```
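
Because the base model is `bert-base-cased`, token-level predictions refer to WordPiece pieces (continuations are prefixed with `##`) and include special tokens such as `[CLS]` and `[SEP]`. A minimal sketch that joins pieces back into words and keeps the label of each word's first piece, a common convention assumed here; `merge_wordpieces` is a hypothetical helper, and the token split in the example is illustrative rather than actual model output:

```python
from typing import List, Tuple

# BERT special tokens that carry no word content
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def merge_wordpieces(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Join WordPiece sub-tokens ("##...") into words, keeping each word's first-piece label."""
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue  # skip BERT's special tokens
        if token.startswith("##") and words:
            word, first_label = words[-1]
            words[-1] = (word + token[2:], first_label)  # append continuation piece
        else:
            words.append((token, label))  # start of a new word
    return words


wp_tokens = ["[CLS]", "para", "##ce", "##tam", "##ol", "and", "nausea", "[SEP]"]
wp_labels = ["O", "B-TREATMENT", "I-TREATMENT", "I-TREATMENT", "I-TREATMENT", "O", "B-PROBLEM", "O"]
print(merge_wordpieces(wp_tokens, wp_labels))
# [('paracetamol', 'B-TREATMENT'), ('and', 'O'), ('nausea', 'B-PROBLEM')]
```

With a fast tokenizer, `inputs.word_ids()` gives the same word grouping without relying on the `##` prefix.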