---
library_name: transformers
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_keras_callback
model-index:
- name: huseyincenik/conll_ner_with_bert
  results: []
datasets:
- tner/conll2003
language:
- en
metrics:
- accuracy
pipeline_tag: token-classification
---

# huseyincenik/conll_ner_with_bert

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the CoNLL-2003 dataset for Named Entity Recognition (NER).

## Model description

This model has been trained to perform Named Entity Recognition (NER) and is based on the BERT architecture. It was fine-tuned on the CoNLL-2003 dataset, a standard dataset for NER tasks.

## Intended uses & limitations

### Intended Uses

- **Named Entity Recognition**: This model is designed to identify and classify named entities in text into categories such as location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

### Limitations

- **Domain Specificity**: The model was fine-tuned on the CoNLL-2003 dataset, which consists of news articles. It may not generalize well to other domains or types of text not represented in the training data.
- **Subword Tokens**: The model may occasionally tag subword tokens as entities, requiring post-processing to handle these cases.

## Training and evaluation data

- **Training Dataset**: CoNLL-2003
- **Training Evaluation Metrics**:

| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| B-PER | 0.98 | 0.98 | 0.98 | 11273 |
| I-PER | 0.98 | 0.99 | 0.99 | 9323 |
| B-ORG | 0.88 | 0.92 | 0.90 | 10447 |
| I-ORG | 0.81 | 0.92 | 0.86 | 5137 |
| B-LOC | 0.86 | 0.94 | 0.90 | 9621 |
| I-LOC | 1.00 | 0.08 | 0.14 | 1267 |
| B-MISC | 0.81 | 0.73 | 0.77 | 4793 |
| I-MISC | 0.83 | 0.36 | 0.50 | 1329 |
| **Micro Avg** | **0.90** | **0.90** | **0.90** | **53190** |
| **Macro Avg** | **0.89** | **0.74** | **0.75** | **53190** |
| **Weighted Avg** | **0.90** | **0.90** | **0.89** | **53190** |

- **Validation Evaluation Metrics**:

| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| B-PER | 0.97 | 0.98 | 0.97 | 3018 |
| I-PER | 0.98 | 0.98 | 0.98 | 2741 |
| B-ORG | 0.86 | 0.91 | 0.88 | 2056 |
| I-ORG | 0.77 | 0.81 | 0.79 | 900 |
| B-LOC | 0.86 | 0.94 | 0.90 | 2618 |
| I-LOC | 1.00 | 0.10 | 0.18 | 281 |
| B-MISC | 0.77 | 0.74 | 0.76 | 1231 |
| I-MISC | 0.77 | 0.34 | 0.48 | 390 |
| **Micro Avg** | **0.90** | **0.89** | **0.89** | **13235** |
| **Macro Avg** | **0.87** | **0.73** | **0.74** | **13235** |
| **Weighted Avg** | **0.90** | **0.89** | **0.88** | **13235** |

- **Test Evaluation Metrics**:

| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| B-PER | 0.96 | 0.95 | 0.96 | 2714 |
| I-PER | 0.98 | 0.99 | 0.98 | 2487 |
| B-ORG | 0.81 | 0.87 | 0.84 | 2588 |
| I-ORG | 0.74 | 0.87 | 0.80 | 1050 |
| B-LOC | 0.81 | 0.90 | 0.85 | 2121 |
| I-LOC | 0.89 | 0.12 | 0.22 | 276 |
| B-MISC | 0.75 | 0.67 | 0.71 | 996 |
| I-MISC | 0.85 | 0.49 | 0.62 | 241 |
| **Micro Avg** | **0.87** | **0.88** | **0.87** | **12473** |
| **Macro Avg** | **0.85** | **0.73** | **0.75** | **12473** |
| **Weighted Avg** | **0.87** | **0.88** | **0.86** | **12473** |

## Training procedure

### Training Hyperparameters

- **Optimizer**: AdamWeightDecay
  - Learning Rate: 2e-05
  - Decay Schedule: PolynomialDecay
  - Warmup Steps: 0.1
  - Weight Decay Rate: 0.01
- **Training Precision**: float32
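For context, here is a minimal sketch of how these hyperparameters could be wired into a Keras fine-tuning run. It is an illustration rather than the exact training script: `tokenized_conll` is assumed to be a tokenized, label-aligned copy of CoNLL-2003 (the same name used in the "Optimizer Details" block below), and the data collator and `prepare_tf_dataset` calls are one common way to build the `tf.data` pipelines.

```python
from transformers import (
    AutoTokenizer,
    DataCollatorForTokenClassification,
    TFAutoModelForTokenClassification,
    create_optimizer,
)

# `tokenized_conll` is assumed to be the tokenized, label-aligned DatasetDict
# described above; it is not defined here.
model_checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = TFAutoModelForTokenClassification.from_pretrained(
    model_checkpoint,
    num_labels=9,  # O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC
)

# Same optimizer settings as in "Optimizer Details" below.
batch_size = 32
num_train_epochs = 2
num_train_steps = (len(tokenized_conll["train"]) // batch_size) * num_train_epochs
optimizer, lr_schedule = create_optimizer(
    init_lr=2e-5,
    num_train_steps=num_train_steps,
    weight_decay_rate=0.01,
    num_warmup_steps=0.1,
)

# Pad and batch the tokenized splits into tf.data pipelines.
data_collator = DataCollatorForTokenClassification(tokenizer, return_tensors="np")
train_set = model.prepare_tf_dataset(
    tokenized_conll["train"], batch_size=batch_size, shuffle=True, collate_fn=data_collator
)
validation_set = model.prepare_tf_dataset(
    tokenized_conll["validation"], batch_size=batch_size, shuffle=False, collate_fn=data_collator
)

# No explicit loss: the model falls back to its internal token-classification loss.
model.compile(optimizer=optimizer)
model.fit(train_set, validation_data=validation_set, epochs=num_train_epochs)
```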
### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 0.1016 | 0.0254 | 0 |
| 0.0228 | 0.0180 | 1 |

### Optimizer Details

```python
from transformers import create_optimizer

batch_size = 32
num_train_epochs = 2

# Total number of optimization steps over the tokenized training split.
num_train_steps = (len(tokenized_conll["train"]) // batch_size) * num_train_epochs

optimizer, lr_schedule = create_optimizer(
    init_lr=2e-5,
    num_train_steps=num_train_steps,
    weight_decay_rate=0.01,
    num_warmup_steps=0.1
)
```

## How to Use

### Using a Pipeline

```python
from transformers import pipeline

pipe = pipeline("token-classification", model="huseyincenik/conll_ner_with_bert")
```

Or load the model and tokenizer directly:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("huseyincenik/conll_ner_with_bert")
model = AutoModelForTokenClassification.from_pretrained("huseyincenik/conll_ner_with_bert")
```

An example that merges subword pieces into whole entities is shown at the end of this card.

The model predicts the following labels:

Abbreviation|Description
-|-
O|Outside of a named entity
B-MISC|Beginning of a miscellaneous entity right after another miscellaneous entity
I-MISC|Miscellaneous entity
B-PER|Beginning of a person’s name right after another person’s name
I-PER|Person’s name
B-ORG|Beginning of an organization right after another organization
I-ORG|Organization
B-LOC|Beginning of a location right after another location
I-LOC|Location

### CoNLL-2003 English Dataset Statistics

This dataset was derived from the Reuters corpus, which consists of Reuters news stories. You can read more about how this dataset was created in the CoNLL-2003 paper.

#### # of training examples per entity type

Dataset|LOC|MISC|ORG|PER
-|-|-|-|-
Train|7140|3438|6321|6600
Dev|1837|922|1341|1842
Test|1668|702|1661|1617

#### # of articles/sentences/tokens per dataset

Dataset|Articles|Sentences|Tokens
-|-|-|-
Train|946|14,987|203,621
Dev|216|3,466|51,362
Test|231|3,684|46,435

### Framework versions

- Transformers 4.45.0.dev0
- TensorFlow 2.17.0
- Datasets 2.21.0
- Tokenizers 0.19.1
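### Handling Subword Tokens

As noted under "Intended uses & limitations", the raw token-level output may include WordPiece subword pieces tagged as separate entities. Below is a minimal sketch of one way to merge them using the pipeline's built-in `aggregation_strategy` option; the example sentence is illustrative only.

```python
from transformers import pipeline

# "simple" groups consecutive tokens that share an entity label into one span,
# so subword pieces are merged back into whole words.
ner = pipeline(
    "token-classification",
    model="huseyincenik/conll_ner_with_bert",
    aggregation_strategy="simple",
)

for entity in ner("Wolfgang Schmidt works for the United Nations in Geneva."):
    # Each item has entity_group, score, word, start, and end.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```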