HUMADEX/english_medical_ner

Token Classification

rigonsallauka committed 11 days ago

Commit

cb4040c

1 Parent(s): f615ee4

Update README.md

Files changed (1)

README.md +67 -3

@@ -1,3 +1,67 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+datasets:
+- rigonsallauka/english_ner_dataset
+language:
+- en
+metrics:
+- f1
+- precision
+- recall
+- confusion_matrix
+base_model:
+- google-bert/bert-base-cased
+pipeline_tag: token-classification
+tags:
+- NER
+- medical
+- symptom
+- extraction
+- english
+---
+# English Medical NER
+## Use
+- **Primary Use Case**: This model extracts medical entities such as symptoms, diagnostic tests, and treatments from clinical text in English.
+- **Applications**: Suitable for healthcare professionals, clinical data analysis, and research into medical text processing.
+- **Supported Entity Types**:
+  - `PROBLEM`: Diseases, symptoms, and medical conditions.
+  - `TEST`: Diagnostic procedures and laboratory tests.
+  - `TREATMENT`: Medications, therapies, and other medical interventions.
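A token-classification model emits one label per token, so the three entity types above usually have to be merged back into text spans. The sketch below shows one way to do that, assuming a BIO scheme (`B-PROBLEM`, `I-PROBLEM`, `O`, and so on). The model card does not state the tagging scheme, so the tag format, the helper name `group_entities`, and the hand-written tags in the example are all illustrative assumptions rather than real model output.

```python
def group_entities(tokens, tags):
    """Merge token-level BIO tags into (entity_type, text) spans.

    Assumes tags look like "B-PROBLEM", "I-PROBLEM", or "O"; the
    model's real label set may differ.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # Start a new span (an orphan I- tag is treated as a start).
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label, [token]
        elif prefix == "I":
            # Continue the span that is currently open.
            current_tokens.append(token)
        else:
            # An "O" tag closes any open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-written example tags (not produced by the model).
tokens = ["severe", "headaches", "and", "nausea", "prescribed", "paracetamol"]
tags = ["B-PROBLEM", "I-PROBLEM", "O", "B-PROBLEM", "O", "B-TREATMENT"]
print(group_entities(tokens, tags))
```

Treating an orphan `I-` tag as the start of a new span is a design choice that keeps the helper tolerant of imperfect predictions; a stricter variant could discard such tokens instead.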
+## Training Data
+- **Data Sources**: Annotated datasets, including clinical data and translations of English medical text into Slovenian.
+- **Data Augmentation**: The training dataset was augmented to improve the model's ability to generalize across different text structures.
+- **Dataset Split**:
+  - **Training Set**: 80%
+  - **Validation Set**: 10%
+  - **Test Set**: 10%
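The 80/10/10 split listed above could be produced along the following lines. This is a minimal standard-library sketch: the helper name `split_dataset`, the fixed seed, and the shuffle-then-slice approach are assumptions, since the model card does not describe how the split was actually made.

```python
import random


def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle and split examples into train/validation/test lists.

    Hypothetical helper: the seed and the shuffle are assumptions,
    not the authors' documented procedure.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)  # deterministic shuffle for a given seed
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

Assigning the remainder to the test set means no example is dropped when the dataset size is not a multiple of ten.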
+## Model Training
+- **Training Configuration**:
+  - **Optimizer**: AdamW
+  - **Learning Rate**: 3e-5
+  - **Batch Size**: 64
+  - **Epochs**: 200
+  - **Loss Function**: Focal Loss to handle class imbalance
+- **Frameworks**: PyTorch, Hugging Face Transformers, SimpleTransformers
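To illustrate why focal loss helps with class imbalance, here is a dependency-free sketch of the per-token formula `-(1 - p_t)**gamma * log(p_t)`, where `p_t` is the predicted probability of the true class. The value `gamma=2.0` and the absence of a class-weighting term are assumptions; the model card names the loss but not its hyperparameters, and the actual training code presumably used a PyTorch implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one token: -(1 - p_t)**gamma * log(p_t).

    gamma is an assumed value; the model card does not state it.
    With gamma=0 this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal_loss([4.0, 0.0, 0.0], target=0)  # confident, correct prediction
hard = focal_loss([0.0, 4.0, 0.0], target=0)  # confident, wrong prediction
print(f"easy={easy:.4f} hard={hard:.4f}")
```

The `(1 - p_t)**gamma` factor shrinks the loss for tokens the model already classifies confidently (typically the abundant non-entity tokens), so the rarer entity tokens contribute relatively more to the gradient.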
+## How to Use
+You can load this model with the Hugging Face `transformers` library. The following example loads the model and runs inference on a sample sentence:
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+import torch
+model_name = "rigonsallauka/english_medical_ner"
+# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForTokenClassification.from_pretrained(model_name)
+# Sample text for inference
+text = "The patient complained of severe headaches and nausea that had persisted for two days. To alleviate the symptoms, he was prescribed paracetamol and advised to rest and drink plenty of fluids."
+# Tokenize the input text
+inputs = tokenizer(text, return_tensors="pt")