---
license: apache-2.0
language:
  - hu
metrics:
  - accuracy
model-index:
  - name: huBERTPlain
    results:
      - task:
          type: text-classification
        metrics:
          - type: f1
            value: 0.77
---

HunEmBERT8

Model description

Cased, fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from parlament.hu.

Intended uses & limitations

The model can be used like any other (cased) BERT model. It has been tested on recognizing emotions at the sentence level in (parliamentary) pre-agenda speeches, with the following labels:

  • 'Label_0': Neutral
  • 'Label_1': Fear
  • 'Label_2': Sadness
  • 'Label_3': Anger
  • 'Label_4': Disgust
  • 'Label_5': Success
  • 'Label_6': Joy
  • 'Label_7': Trust

Training

Fine-tuned version of the original huBERT model (SZTAKI-HLT/hubert-base-cc), trained on the HunEmPoli corpus.
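
The exact fine-tuning recipe is not documented in this card. The following is a minimal, illustrative sketch of how such a fine-tuning run could look with the Hugging Face Trainer; the placeholder data, the EmotionDataset helper, and the hyperparameters are assumptions for demonstration only, not the settings used to produce HunEmBERT8.

import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Placeholder data; in practice, sentences and their 0-7 emotion labels
# would come from the HunEmPoli corpus.
train_texts = ["Ez egy példa mondat."]   # hypothetical example sentence
train_labels = [0]                        # 0 = Neutral

class EmotionDataset(Dataset):
    """Wraps tokenized sentences and integer emotion labels."""
    def __init__(self, texts, labels, tokenizer):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Start from the base huBERT checkpoint with an 8-way classification head
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained(
    "SZTAKI-HLT/hubert-base-cc", num_labels=8)

training_args = TrainingArguments(
    output_dir="hunembert8-finetune",     # hypothetical output directory
    num_train_epochs=3,                   # placeholder hyperparameters
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=EmotionDataset(train_texts, train_labels, tokenizer),
)
trainer.train()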

Eval results

| Class        | Precision | Recall | F1-score |
|--------------|-----------|--------|----------|
| Fear         | 0.625     | 0.625  | 0.625    |
| Sadness      | 0.8535    | 0.6291 | 0.7243   |
| Anger        | 0.7857    | 0.3437 | 0.4782   |
| Disgust      | 0.7154    | 0.8790 | 0.7888   |
| Success      | 0.8579    | 0.8683 | 0.8631   |
| Joy          | 0.549     | 0.6363 | 0.5894   |
| Trust        | 0.4705    | 0.5581 | 0.5106   |
| Macro avg    | 0.7134    | 0.6281 | 0.6497   |
| Weighted avg | 0.791     | 0.7791 | 0.7743   |

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned 8-class emotion classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT8")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT8")
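
For illustration, a minimal sentence-level inference sketch that reuses the tokenizer and model loaded above; the id-to-emotion mapping is taken from the label list in this card, and the example sentence is a hypothetical placeholder.

import torch

# Emotion names in label-id order, as listed under "Intended uses & limitations"
emotions = ["Neutral", "Fear", "Sadness", "Anger", "Disgust", "Success", "Joy", "Trust"]

text = "Ez egy példa mondat."   # hypothetical Hungarian input sentence
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_id = logits.argmax(dim=-1).item()
print(f"{emotions[predicted_id]} (Label_{predicted_id})")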

BibTeX entry and citation info

If you use the model, please cite the following paper:

Bibtex:

@ARTICLE{10149341,
  author={{"U}veges, Istv{\'a}n and Ring, Orsolya},
  journal={IEEE Access}, 
  title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication}, 
  year={2023},
  volume={11},
  number={},
  pages={60267-60278},
  doi={10.1109/ACCESS.2023.3285536}
}