metadata

library_name: span-marker
tags:
  - span-marker
  - token-classification
  - ner
  - named-entity-recognition
  - generated_from_span_marker_trainer
datasets:
  - imvladikon/nemo_corpus
metrics:
  - precision
  - recall
  - f1
widget:
  - text: >-
      אחר כך הצטרף ל דאלאס מאווריקס מ ה אנ.בי.איי ו חזר לשחק ב אירופה ב ספרד ב
      מדי קאחה בילבאו ו חירונה
  - text: >-
      ב קיץ 1982 ניסה טל ברודי (אז עוזר ה מאמן) להחתימו, אבל בריאנט, ש סבתו
      יהודיה, חתם אז ב פורד קאנטו ו זכה עמ היא ב אותה עונה ב גביע אירופה ל
      אלופות.
  - text: יו"ר ועדת ה נוער נתן סלובטיק אמר ש ה שחקנים של אנחנו לא משתלבים ב אירופה.
  - text: >-
      ב ה סגל ש יתכנס מחר אחר ה צהריים ל מחנה אימונים ב שפיים 17 שחקנים, כולל
      מוזמן חדש שירן אדירי מ מכבי תל אביב.
  - text: >-
      תוצאות אחרות: טורינו 2 (מורלו עצמי, מולר) לצה 0; קאליארי 0 לאציו 1 (פסטה,
      שער עצמי); פיורנטינה 2 (נאפי, פאציונה) גנואה 2 (אורלאנדו, שקוראווי).
pipeline_tag: token-classification
model-index:
  - name: SpanMarker
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        dataset:
          name: Unknown
          type: imvladikon/nemo_corpus
          split: test
        metrics:
          - type: f1
            value: 0.7338129496402878
            name: F1
          - type: precision
            value: 0.7577142857142857
            name: Precision
          - type: recall
            value: 0.7113733905579399
            name: Recall

SpanMarker

This is a SpanMarker model for Named Entity Recognition, trained on the imvladikon/nemo_corpus dataset.

Model Details

Model Description

Model Type: SpanMarker
Maximum Sequence Length: 512 tokens
Maximum Entity Length: 100 words
Training Dataset: imvladikon/nemo_corpus

Model Sources

Repository: SpanMarker on GitHub
Thesis: SpanMarker For Named Entity Recognition

Model Labels

Label	Examples
ANG	"יידיש", "גרמנית", "אנגלית"
DUC	"דינמיט", "סובארו", "מרצדס"
EVE	"מצדה", "הצהרת בלפור", "ה שואה"
FAC	"ברזילי", "כלא עזה", "תל - ה שומר"
GPE	"ה שטחים", "שפרעם", "רצועת עזה"
LOC	"שייח רדואן", "גיבאליה", "חאן יונס"
ORG	"כך", "ה ארץ", "מרחב ה גליל"
PER	"רמי רהב", "נימר חוסיין", "איברהים נימר חוסיין"
WOA	"קיטש ו מוות", "קדיש", "ה ארץ"
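The tag names above are abbreviations, so a small lookup table can make predictions easier to read. The sketch below is illustrative only: the glosses follow the usual NEMO corpus tag set and are not defined anywhere in this card, and `LABEL_DESCRIPTIONS` and `describe_label` are hypothetical names.

```python
# Readable glosses for the nine tags listed above.
# Assumption: meanings follow the NEMO corpus tag set; this card does not define them.
LABEL_DESCRIPTIONS = {
    "ANG": "language",
    "DUC": "product",
    "EVE": "event",
    "FAC": "facility",
    "GPE": "geo-political entity",
    "LOC": "location",
    "ORG": "organization",
    "PER": "person",
    "WOA": "work of art",
}


def describe_label(label: str) -> str:
    """Return a readable gloss for a tag, or the tag itself if it is unknown."""
    return LABEL_DESCRIPTIONS.get(label, label)


print(describe_label("GPE"))
```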

Evaluation

Metrics

Label	Precision	Recall	F1
all	0.7577	0.7114	0.7338
ANG	0.0	0.0	0.0
DUC	0.0	0.0	0.0
FAC	0.0	0.0	0.0
GPE	0.7085	0.8103	0.7560
LOC	0.5714	0.1951	0.2909
ORG	0.7460	0.6912	0.7176
PER	0.8301	0.8052	0.8175
WOA	0.0	0.0	0.0
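Each F1 value in the table is the harmonic mean of the precision and recall in the same row. A minimal sketch of that arithmetic, using the overall precision and recall reported in this card's metadata (`f1_score` is a hypothetical helper name):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the card metadata.
overall_f1 = f1_score(0.7577142857142857, 0.7113733905579399)
print(round(overall_f1, 4))

# Labels with precision and recall of 0.0 get an F1 of 0.0 rather than a division error.
print(f1_score(0.0, 0.0))
```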

Uses

Direct Use for Inference

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")
# Run inference
entities = model.predict("יו\"ר ועדת ה נוער נתן סלובטיק אמר ש ה שחקנים של אנחנו לא משתלבים ב אירופה.")
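The returned entities can then be post-processed. The sketch below assumes that `model.predict` on a single string returns a list of dictionaries with at least `"span"`, `"label"` and `"score"` keys; `group_by_label` is a hypothetical helper, and the sample records are placeholders in that assumed shape, not actual output from this model.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group predicted spans by label, dropping those scored below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hypothetical records in the assumed output shape; NOT real model output.
sample_entities = [
    {"span": "נתן סלובטיק", "label": "PER", "score": 0.95},
    {"span": "אירופה", "label": "GPE", "score": 0.40},
]

# Keep only spans with a score of at least 0.5.
print(group_by_label(sample_entities, min_score=0.5))
```

A score threshold like this trades recall for precision, so a suitable value depends on the application.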

Downstream Use

You can finetune this model on your own dataset.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")

Training Details

Training Set Metrics

Training set	Min	Median	Max
Sentence length	1	25.4427	117
Entities per sentence	0	1.2472	20

Training Hyperparameters

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 4
mixed_precision_training: Native AMP
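Two of these values follow from the others: the total train batch size is the per-step batch size times the gradient accumulation steps, and the linear scheduler with a warmup ratio of 0.1 ramps the learning rate up over the first 10% of steps before decaying it to zero. The sketch below illustrates both under stated assumptions: a single device, a hypothetical total of 10,000 optimizer steps, and a simplified schedule formula that may differ in rounding from what the trainer actually used.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 1e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


# train_batch_size=2 and gradient_accumulation_steps=2, assuming one device.
print(effective_batch_size(2, 2))

# Hypothetical run length, used only to illustrate the shape of the schedule.
TOTAL_STEPS = 10_000
for step in (0, 500, 1_000, 5_500, 10_000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```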

Training Results

Epoch	Step	Validation Loss	Validation Precision	Validation Recall	Validation F1	Validation Accuracy
0.4070	1000	0.0352	0.0	0.0	0.0	0.8980
0.8140	2000	0.0327	0.0	0.0	0.0	0.8980
1.2210	3000	0.0224	0.0	0.0	0.0	0.8980
1.6280	4000	0.0149	0.5874	0.2200	0.3201	0.9134
2.0350	5000	0.0137	0.55	0.3895	0.4560	0.9248
2.4420	6000	0.0113	0.6204	0.4313	0.5089	0.9298
2.8490	7000	0.0121	0.5733	0.5075	0.5384	0.9310
3.2560	8000	0.0115	0.5782	0.5236	0.5495	0.9334
3.6630	9000	0.0108	0.6100	0.5354	0.5703	0.9359
0.4070	1000	0.0103	0.6321	0.5880	0.6092	0.9381
0.8140	2000	0.0088	0.6968	0.6288	0.6610	0.9471
1.2210	3000	0.0091	0.6790	0.6695	0.6742	0.9484
1.6280	4000	0.0086	0.6845	0.6845	0.6845	0.9480
2.0350	5000	0.0089	0.6802	0.6845	0.6824	0.9492
2.4420	6000	0.0084	0.6938	0.6953	0.6945	0.9539
2.8490	7000	0.0088	0.6884	0.7039	0.6960	0.9512
3.2560	8000	0.0086	0.6895	0.7124	0.7008	0.9514
3.6630	9000	0.0082	0.6989	0.7049	0.7019	0.9526
0.4070	1000	0.0080	0.7109	0.7124	0.7117	0.9535
0.8140	2000	0.0074	0.7577	0.7114	0.7338	0.9567
1.2210	3000	0.0083	0.7183	0.7414	0.7297	0.9554
1.6280	4000	0.0088	0.6987	0.7339	0.7159	0.9510
2.0350	5000	0.0086	0.7135	0.7296	0.7215	0.9541
2.4420	6000	0.0086	0.7167	0.7382	0.7273	0.9559
2.8490	7000	0.0088	0.7133	0.7554	0.7337	0.9541
3.2560	8000	0.0085	0.7165	0.7511	0.7334	0.9551
3.6630	9000	0.0083	0.7263	0.7489	0.7375	0.9561
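A table like this can be scanned programmatically to compare checkpoints. The Step column restarts twice in the table, so the sketch below looks only at the last nine rows; which metric to select on is a choice left to the reader, and the variable names are hypothetical.

```python
# (step, validation_loss, validation_f1) copied from the last nine rows of the table above.
rows = [
    (1000, 0.0080, 0.7117),
    (2000, 0.0074, 0.7338),
    (3000, 0.0083, 0.7297),
    (4000, 0.0088, 0.7159),
    (5000, 0.0086, 0.7215),
    (6000, 0.0086, 0.7273),
    (7000, 0.0088, 0.7337),
    (8000, 0.0085, 0.7334),
    (9000, 0.0083, 0.7375),
]

# Highest validation F1 and lowest validation loss do not have to coincide.
best_by_f1 = max(rows, key=lambda row: row[2])
best_by_loss = min(rows, key=lambda row: row[1])

print("best F1:", best_by_f1)
print("lowest loss:", best_by_loss)
```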

Framework Versions

Python: 3.10.12
SpanMarker: 1.5.0
Transformers: 4.35.2
PyTorch: 2.1.0+cu118
Datasets: 2.15.0
Tokenizers: 0.15.0

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}