---
library_name: transformers
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_keras_callback
model-index:
- name: huseyincenik/conll_ner_with_bert
  results: []
datasets:
- tner/conll2003
language:
- en
metrics:
- accuracy
pipeline_tag: token-classification
---


	# huseyincenik/conll_ner_with_bert

	This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the CoNLL-2003 dataset for Named Entity Recognition (NER).

	## Model description

The model is BERT (`bert-base-uncased`) with a token-classification head, fine-tuned for Named Entity Recognition (NER) on CoNLL-2003, a standard benchmark dataset for NER.

	## Intended uses & limitations

	### Intended Uses

- Named Entity Recognition: The model identifies named entities in text and classifies them into four categories: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

	### Limitations

	- Domain Specificity: The model was fine-tuned on the CoNLL-2003 dataset, which consists of news articles. It may not generalize well to other domains or types of text not represented in the training data.
	- Subword Tokens: The model may occasionally tag subword tokens as entities, requiring post-processing to handle these cases.
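
The subword post-processing mentioned above can be sketched in plain Python. This is a minimal illustration, assuming WordPiece-style tokens where continuation pieces start with `##` and that each word should keep the label of its first piece. The function name `merge_wordpieces` and the sample tokens and labels are hypothetical and are not taken from this model's output.

```python
from typing import List, Tuple


def merge_wordpieces(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Join WordPiece continuation tokens ("##...") onto the preceding word.

    Each merged word keeps the label predicted for its first piece; labels on
    continuation pieces are ignored. This is one simple convention, not
    necessarily the one used when the model was evaluated.
    """
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            # Continuation piece: append its text to the previous word.
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            # First piece of a new word: start a new entry with its label.
            words.append((token, label))
    return words


# Hypothetical tokenizer output and per-token predictions for illustration.
tokens = ["hu", "##sey", "##in", "lives", "in", "istanbul"]
labels = ["B-PER", "O", "B-PER", "O", "O", "B-LOC"]
merged = merge_wordpieces(tokens, labels)
print(merged)
```

When using the `transformers` pipeline, the `aggregation_strategy` argument (for example `"first"`) may cover the same need without custom code.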

	## Training and evaluation data
	- Training Dataset: CoNLL-2003

- Training Evaluation Metrics:

| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| B-PER | 0.98 | 0.98 | 0.98 | 11273 |
| I-PER | 0.98 | 0.99 | 0.99 | 9323 |
| B-ORG | 0.88 | 0.92 | 0.90 | 10447 |
| I-ORG | 0.81 | 0.92 | 0.86 | 5137 |
| B-LOC | 0.86 | 0.94 | 0.90 | 9621 |
| I-LOC | 1.00 | 0.08 | 0.14 | 1267 |
| B-MISC | 0.81 | 0.73 | 0.77 | 4793 |
| I-MISC | 0.83 | 0.36 | 0.50 | 1329 |
| Micro Avg | 0.90 | 0.90 | 0.90 | 53190 |
| Macro Avg | 0.89 | 0.74 | 0.75 | 53190 |
| Weighted Avg | 0.90 | 0.90 | 0.89 | 53190 |


- Validation Evaluation Metrics:

| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| B-PER | 0.97 | 0.98 | 0.97 | 3018 |
| I-PER | 0.98 | 0.98 | 0.98 | 2741 |
| B-ORG | 0.86 | 0.91 | 0.88 | 2056 |
| I-ORG | 0.77 | 0.81 | 0.79 | 900 |
| B-LOC | 0.86 | 0.94 | 0.90 | 2618 |
| I-LOC | 1.00 | 0.10 | 0.18 | 281 |
| B-MISC | 0.77 | 0.74 | 0.76 | 1231 |
| I-MISC | 0.77 | 0.34 | 0.48 | 390 |
| Micro Avg | 0.90 | 0.89 | 0.89 | 13235 |
| Macro Avg | 0.87 | 0.73 | 0.74 | 13235 |
| Weighted Avg | 0.90 | 0.89 | 0.88 | 13235 |


- Test Evaluation Metrics:

| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| B-PER | 0.96 | 0.95 | 0.96 | 2714 |
| I-PER | 0.98 | 0.99 | 0.98 | 2487 |
| B-ORG | 0.81 | 0.87 | 0.84 | 2588 |
| I-ORG | 0.74 | 0.87 | 0.80 | 1050 |
| B-LOC | 0.81 | 0.90 | 0.85 | 2121 |
| I-LOC | 0.89 | 0.12 | 0.22 | 276 |
| B-MISC | 0.75 | 0.67 | 0.71 | 996 |
| I-MISC | 0.85 | 0.49 | 0.62 | 241 |
| Micro Avg | 0.87 | 0.88 | 0.87 | 12473 |
| Macro Avg | 0.85 | 0.73 | 0.75 | 12473 |
| Weighted Avg | 0.87 | 0.88 | 0.86 | 12473 |
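
To make the difference between the macro and weighted averages concrete, the short sketch below recomputes both F1 averages from the per-label rows of the test table above. The variable names are illustrative. The figures are the rounded values from the table, so results agree with the reported averages only to two decimals.

```python
# Per-label (F1, support) pairs copied from the test table above.
test_rows = {
    "B-PER": (0.96, 2714),
    "I-PER": (0.98, 2487),
    "B-ORG": (0.84, 2588),
    "I-ORG": (0.80, 1050),
    "B-LOC": (0.85, 2121),
    "I-LOC": (0.22, 276),
    "B-MISC": (0.71, 996),
    "I-MISC": (0.62, 241),
}

f1_scores = [f1 for f1, _ in test_rows.values()]
supports = [n for _, n in test_rows.values()]

# Macro average: every label counts equally, so rare labels such as I-LOC
# pull the score down as much as frequent ones would.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each label's F1 is weighted by its support (token count).
weighted_f1 = sum(f1 * n for f1, n in test_rows.values()) / sum(supports)

print(f"macro F1    = {macro_f1:.2f}")
print(f"weighted F1 = {weighted_f1:.2f}")
```

The gap between the two averages comes mostly from I-LOC and I-MISC, which have low recall but small support.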




	## Training procedure

	### Training Hyperparameters

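- Optimizer: AdamWeightDecay
- Learning Rate: 2e-05
- Decay Schedule: PolynomialDecay
- Warmup Steps: 0.1 (the value passed as `num_warmup_steps`, which the library interprets as a number of steps, not a ratio)
- Weight Decay Rate: 0.01
- Batch Size: 32
- Epochs: 2
- Training Precision: float32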

	### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 0.1016 | 0.0254 | 0 |
| 0.0228 | 0.0180 | 1 |

	### Optimizer Details

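```python
from transformers import create_optimizer

batch_size = 32
num_train_epochs = 2
# `tokenized_conll` is the tokenized CoNLL-2003 dataset from preprocessing (not shown here).
num_train_steps = (len(tokenized_conll["train"]) // batch_size) * num_train_epochs

# Returns the AdamWeightDecay optimizer together with its learning-rate schedule.
optimizer, lr_schedule = create_optimizer(
    init_lr=2e-5,
    num_train_steps=num_train_steps,
    weight_decay_rate=0.01,
    # Value used for this run. The argument is a step count, not a ratio;
    # for a 10% warmup, pass int(0.1 * num_train_steps) instead.
    num_warmup_steps=0.1,
)
```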
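
The shape of a warmup-plus-polynomial-decay schedule can be sketched in plain Python. This is a simplified illustration, not the library code: it assumes a linear warmup followed by a linear decay to zero (polynomial power 1.0), which we understand to be the `create_optimizer` defaults. The function name `learning_rate` and the 1,000-step run are hypothetical.

```python
def learning_rate(step: int, init_lr: float, num_train_steps: int,
                  num_warmup_steps: int) -> float:
    """Linear warmup to init_lr, then linear decay to zero."""
    if step < num_warmup_steps:
        # Warmup phase: scale the rate up proportionally to the step.
        return init_lr * step / num_warmup_steps
    # Decay phase: fraction of the post-warmup steps already completed.
    decay_steps = num_train_steps - num_warmup_steps
    progress = min(step - num_warmup_steps, decay_steps) / decay_steps
    return init_lr * (1.0 - progress)


# Hypothetical run: 1,000 total steps with 10% of them used for warmup.
total_steps = 1000
warmup = int(0.1 * total_steps)
for step in (0, 50, 100, 550, 1000):
    print(step, learning_rate(step, 2e-5, total_steps, warmup))
```

Under these assumptions the rate peaks at `init_lr` when warmup ends and reaches zero at the final step.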

	## How to Use

	### Using a Pipeline

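```python
from transformers import pipeline

# Option 1: high-level pipeline that handles tokenization and label decoding.
pipe = pipeline("token-classification", model="huseyincenik/conll_ner_with_bert")
print(pipe("Hugging Face is based in New York City."))  # example sentence

# Option 2: load the tokenizer and model directly.
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("huseyincenik/conll_ner_with_bert")
model = AutoModelForTokenClassification.from_pretrained("huseyincenik/conll_ner_with_bert")
```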

The model predicts the following labels:

| Abbreviation | Description |
|---|---|
| O | Outside of a named entity |
| B-MISC | Beginning of a miscellaneous entity |
| I-MISC | Inside (continuation of) a miscellaneous entity |
| B-PER | Beginning of a person's name |
| I-PER | Inside (continuation of) a person's name |
| B-ORG | Beginning of an organization |
| I-ORG | Inside (continuation of) an organization |
| B-LOC | Beginning of a location |
| I-LOC | Inside (continuation of) a location |
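
As an illustration of how the B-/I-/O labels above combine into entities, the sketch below groups word-level tags into `(text, type)` pairs. It is a minimal example rather than the pipeline's own logic: the function name `group_entities` and the sample words and tags are hypothetical, and an `I-` tag that does not continue an open entity of the same type is treated as the start of a new one.

```python
from typing import List, Tuple


def group_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse word-level B-/I- tags into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type = None
    for word, tag in zip(words, tags):
        prefix, _, entity_type = tag.partition("-")
        # Continue the open entity only on an I- tag of the same type.
        if prefix == "I" and entity_type == current_type:
            current_words.append(word)
            continue
        # Otherwise close any open entity before handling this tag.
        if current_words:
            entities.append((" ".join(current_words), current_type))
        if prefix in ("B", "I"):
            current_words, current_type = [word], entity_type
        else:  # "O": no entity is open after this word
            current_words, current_type = [], None
    # Flush an entity that runs to the end of the sentence.
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


# Hypothetical word-level predictions for illustration.
words = ["angela", "merkel", "visited", "new", "york", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(group_entities(words, tags))
```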


	### CoNLL-2003 English Dataset Statistics
This dataset was derived from the Reuters corpus, which consists of Reuters news stories. The CoNLL-2003 paper describes how the dataset was created.

	#### # of training examples per entity type
| Dataset | LOC | MISC | ORG | PER |
|---|---|---|---|---|
| Train | 7140 | 3438 | 6321 | 6600 |
| Dev | 1837 | 922 | 1341 | 1842 |
| Test | 1668 | 702 | 1661 | 1617 |

	#### # of articles/sentences/tokens per dataset
| Dataset | Articles | Sentences | Tokens |
|---|---|---|---|
| Train | 946 | 14,987 | 203,621 |
| Dev | 216 | 3,466 | 51,362 |
| Test | 231 | 3,684 | 46,435 |

	### Framework versions

	- Transformers 4.45.0.dev0
	- TensorFlow 2.17.0
	- Datasets 2.21.0
	- Tokenizers 0.19.1
