---
thumbnail: https://huggingface.co/front/thumbnails/dialogpt.png
language:
- en
license: cc-by-4.0
tags:
- text classification
- transformers
datasets:
- PCL
metrics:
- F1
inference: false
---

## T5Base-PCL
This is T5 (base) fine-tuned on the patronizing and condescending language (PCL) dataset by Pérez-Almendros et al. (2020), which was used for Task 4 of SemEval-2022.
It is intended to be used as a classification model for identifying PCL (0 - neg; 1 - pos). The task prefix we used for the T5 model is 'classification: '.
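
The prefix and label convention described above can be sketched as two small helpers. This is a minimal illustration, not part of the released code: the names `build_input` and `parse_prediction` are hypothetical, and the sketch assumes the model decodes to the plain strings "0" or "1", as the label description suggests.

```python
TASK_PREFIX = "classification: "  # task prefix stated in this model card

# Label meaning per the card: 0 - neg (no PCL), 1 - pos (PCL)
LABELS = {"0": "neg", "1": "pos"}


def build_input(text: str) -> str:
    """Prepend the task prefix to a raw input sentence (hypothetical helper)."""
    return TASK_PREFIX + text.strip()


def parse_prediction(decoded: str) -> int:
    """Map decoded model output to 0 or 1 (hypothetical helper).

    Assumes the model emits the label as plain text; anything else raises.
    """
    cleaned = decoded.strip()
    if cleaned not in LABELS:
        raise ValueError(f"unexpected model output: {decoded!r}")
    return int(cleaned)
```

Raising on unexpected output, rather than defaulting to a class, keeps malformed generations from being silently counted as predictions.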

The training dataset is limited in scope: it contains only a selection of news texts from about 20 English-speaking countries.
The macro F1 score achieved on the test set, based on the official evaluation, is 0.5452.
More information about the original pre-trained model can be found [here](https://huggingface.co/t5-base).
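
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below uses the standard definition in plain Python; it is not the official SemEval scorer, and the function name `macro_f1` and the toy labels are illustrative assumptions.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (standard definition)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not from the PCL dataset)
gold = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
print(round(macro_f1(gold, pred), 4))
```

Because each class contributes equally, a model that ignores the rarer positive class is penalized even when overall accuracy is high.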

Classification examples:

| Prediction | Input |
|------------|-------|
|0 | "selective kindness : in europe , some refugees are more equal than others" |
|1 | he said their efforts should not stop only at creating many graduates but also extended to students from poor families so that they could break away from the cycle of poverty |

### How to use

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = T5Tokenizer.from_pretrained("tosin/pcl_22")
model = T5ForConditionalGeneration.from_pretrained("tosin/pcl_22")
tokenizer.pad_token = tokenizer.eos_token

# Prepend the task prefix named in this card ('classification: ')
text = "classification: he said their efforts should not stop only at creating many graduates but also extended to students from poor families so that they could break away from the cycle of poverty"
input_ids = tokenizer(text, padding=True, truncation=True, return_tensors='pt').input_ids

# Generate the predicted label without tracking gradients
with torch.no_grad():
    outputs = model.generate(input_ids)
pred = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(pred)
```