poltextlab committed
Commit 15c7171 (1 parent: 70a98f5)
Update README.md
README.md CHANGED
@@ -1,25 +1,53 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
|
5 |
## Model description
|
6 |
|
7 |
-
Cased fine-tuned BERT model for Hungarian, trained on
|
8 |
|
9 |
## Intended uses & limitations
|
10 |
|
11 |
-
The model can be used as any other (cased) BERT model. It has been tested recognizing
|
12 |
-
*
|
13 |
-
*
|
|
|
14 |
|
15 |
## Training
|
16 |
|
17 |
-
Fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on
|
18 |
|
19 |
## Eval results
|
20 |
|
21 |
| Class | Precision | Recall | F-Score |
|
22 |
|-----|------------|------------|------|
|
23 |
|
24 |
|
25 |
## Usage
|
@@ -27,8 +55,8 @@ Fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), t
|
|
27 |
```py
|
28 |
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
29 |
|
30 |
-
tokenizer = AutoTokenizer.from_pretrained("")
|
31 |
-
model = AutoModelForSequenceClassification.from_pretrained("")
|
32 |
```
|
33 |
|
34 |
### BibTeX entry and citation info
|
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
+
language:
|
4 |
+
- hu
|
5 |
+
metrics:
|
6 |
+
- accuracy
|
7 |
+
model-index:
|
8 |
+
- name: huBERTPlain
|
9 |
+
results:
|
10 |
+
- task:
|
11 |
+
type: text-classification
|
12 |
+
metrics:
|
13 |
+
- type: f1
|
14 |
+
value: 0.91
|
15 |
+
widget:
|
16 |
+
- text: "A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén."
|
17 |
+
example_title: "Positive"
|
18 |
+
|
19 |
+
- text: "Magyarország több évtizede küzd demográfiai válsággal, és egyre több gyermekre vágyó pár meddőségi problémákkal néz szembe."
|
20 |
+
example_title: "Negative"
|
21 |
+
|
22 |
+
- text: "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
|
23 |
+
example_title: "Neutral"
|
24 |
+
|
25 |
---
|
26 |
|
27 |
## Model description
|
28 |
|
29 |
+
Cased BERT model for Hungarian, fine-tuned on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.
|
30 |
|
31 |
## Intended uses & limitations
|
32 |
|
33 |
+
The model can be used like any other (cased) BERT model. It has been tested on classifying sentences in (parliamentary) pre-agenda speeches as positive, negative, or neutral, where:
|
34 |
+
* 'Label_0': Neutral
|
35 |
+
* 'Label_1': Positive
|
36 |
+
* 'Label_2': Negative
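
The label mapping above can be applied to raw class scores as in the minimal sketch below. The dictionary follows the list in this card; the helper name `sentiment_from_scores` is hypothetical and is not part of the model or of any library.

```python
# Mapping copied from the model card: class index -> sentiment name.
ID2SENTIMENT = {0: "Neutral", 1: "Positive", 2: "Negative"}


def sentiment_from_scores(scores):
    """Return the sentiment name for the highest of the three class scores.

    `scores` is any sequence of three numbers (logits or probabilities),
    ordered by class index as in the model card.
    """
    if len(scores) != len(ID2SENTIMENT):
        raise ValueError(f"expected {len(ID2SENTIMENT)} scores, got {len(scores)}")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return ID2SENTIMENT[best_index]
```

For example, `sentiment_from_scores([0.1, 0.2, 0.7])` selects index 2, which the card maps to Negative.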
|
37 |
|
38 |
## Training
|
39 |
|
40 |
+
Fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus.
|
41 |
|
42 |
## Eval results
|
43 |
|
44 |
| Class | Precision | Recall | F-Score |
|
45 |
|-----|------------|------------|------|
|
46 |
+
|Neutral|0.83|0.71|0.76|
|
47 |
+
|Positive|0.87|0.91|0.90|
|
48 |
+
|Negative|0.94|0.91|0.93|
|
49 |
+
|Macro AVG|0.88|0.85|0.86|
|
50 |
+
|Weighted AVG|0.91|0.91|0.91|
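
As a rough check on the macro-average row, the sketch below takes the unweighted mean of the three per-class rows. It uses only the rounded values printed in the table, so small deviations are expected: precision and F-score reproduce 0.88 and 0.86, while recall comes out at about 0.84 rather than the reported 0.85, presumably because the reported figure was computed from unrounded scores (an assumption). The weighted row cannot be recomputed here because the per-class supports are not given.

```python
# Per-class (precision, recall, f_score) as printed in the table (rounded values).
per_class = {
    "Neutral": (0.83, 0.71, 0.76),
    "Positive": (0.87, 0.91, 0.90),
    "Negative": (0.94, 0.91, 0.93),
}


def macro_average(rows):
    """Unweighted mean of each metric column across classes."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in range(3))


macro_p, macro_r, macro_f = macro_average(list(per_class.values()))
print(f"macro precision={macro_p:.2f} recall={macro_r:.2f} f_score={macro_f:.2f}")
```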
|
51 |
|
52 |
|
53 |
## Usage
|
|
|
55 |
```py
|
56 |
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
57 |
|
58 |
+
tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3")
|
59 |
+
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3")
|
60 |
```
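
The snippet above only loads the tokenizer and model. A minimal sketch of a full prediction step follows, assuming PyTorch is installed as the backend (the card does not say which backend is used). The function names `softmax` and `classify` are hypothetical helpers, and the heavy imports are deferred into `classify` so that `softmax` can be used on its own.

```python
import math

MODEL_ID = "poltextlab/HunEmBERT3"  # model id taken from the usage snippet


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, model_id=MODEL_ID):
    """Return (predicted class index, class probabilities) for one text.

    Assumes PyTorch and transformers are installed; the model is downloaded
    on first use.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    return probs.index(max(probs)), probs
```

Calling `classify("Tisztelt fideszes, KDNP-s Képviselőtársaim!")` returns a class index and three probabilities; the index can be read against the label list in the "Intended uses & limitations" section.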
|
61 |
|
62 |
### BibTeX entry and citation info
|