---
license: apache-2.0
language:
- hu
metrics:
- accuracy
model-index:
- name: huBERTPlain
  results:
  - task:
      type: text-classification
    metrics:
    - type: f1
      value: 0.77
---

## Model description

Cased fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.

## Intended uses & limitations

The model can be used like any other (cased) BERT model. It has been tested on recognizing emotions at the sentence level in (parliamentary) pre-agenda speeches, where:
* 'Label_0': Neutral
* 'Label_1': Fear
* 'Label_3': Sadness
* 'Label_4': Anger
* 'Label_5': Disgust
* 'Label_6': Success
* 'Label_7': Joy
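
To decode model outputs into these emotion names, a plain lookup dictionary is enough. A minimal sketch that simply mirrors the list above (note that `Label_2` is not documented in this card):

```py
# Minimal sketch: map the model's label strings to emotion names.
# Mirrors the list above; `Label_2` is not documented in this card.
label2emotion = {
    "Label_0": "Neutral",
    "Label_1": "Fear",
    "Label_3": "Sadness",
    "Label_4": "Anger",
    "Label_5": "Disgust",
    "Label_6": "Success",
    "Label_7": "Joy",
}

print(label2emotion["Label_4"])  # -> Anger
```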

## Training

Fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus.

## Eval results

| Class        | Precision | Recall | F-Score |
|--------------|-----------|--------|---------|
| Fear         | 0.625     | 0.625  | 0.625   |
| Sadness      | 0.8535    | 0.6291 | 0.7243  |
| Anger        | 0.7857    | 0.3437 | 0.4782  |
| Disgust      | 0.7154    | 0.8790 | 0.7888  |
| Success      | 0.8579    | 0.8683 | 0.8631  |
| Joy          | 0.549     | 0.6363 | 0.5894  |
| Trust        | 0.4705    | 0.5581 | 0.5106  |
| Macro AVG    | 0.7134    | 0.6281 | 0.6497  |
| Weighted AVG | 0.791     | 0.7791 | 0.7743  |
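
The last two rows follow the usual conventions: the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support. A minimal sketch of the distinction with scikit-learn, using made-up labels rather than the actual evaluation data (which is not part of this card):

```py
# Minimal sketch: macro vs. weighted F1, as reported in the table above.
# `y_true` and `y_pred` are made-up stand-ins, not the actual eval data.
from sklearn.metrics import f1_score

y_true = [0, 1, 3, 4, 5, 5, 6, 7, 5, 4]
y_pred = [0, 1, 3, 4, 5, 5, 6, 6, 7, 4]

print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class support
```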

## Usage

```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT8")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT8")
```
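
A minimal inference sketch building on the snippet above; the example sentence is made up, and the label names come from the model config, which should line up with the list under "Intended uses & limitations":

```py
import torch

# Made-up example sentence (Hungarian): "This proposal is a huge success."
text = "Ez a javaslat óriási siker."

# Tokenize and run a forward pass without tracking gradients.
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and look up its name in the config.
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. 'Label_6' -> Success
```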

### BibTeX entry and citation info