tosin
/

pcl_22

Text2Text Generation

text classification

text-generation-inference


oluwatosin adewumi


	---
	thumbnail: https://huggingface.co/front/thumbnails/dialogpt.png
	language:
	- en
	license: cc-by-4.0
	tags:
	- text classification
	- transformers
	datasets:
	- PCL
	metrics:
	- F1
	inference: false
	---

	## T5Base-PCL
	This is a T5 (base) model fine-tuned on the patronizing and condescending language (PCL) dataset by Pérez-Almendros et al. (2020), which was used for Task 4 of SemEval-2022.
	It is intended to be used as a classification model for identifying PCL (0 = negative, no PCL; 1 = positive, PCL). The task prefix we used for the T5 model is 'classification: '.
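
The prefix and label convention described above can be sketched as two small helpers. This is a minimal illustration, not part of the released code: the function names `build_input` and `parse_label` are hypothetical, and the sketch assumes the model emits the literal strings "0" or "1", as the predictions in this card suggest.

```python
TASK_PREFIX = "classification: "  # task prefix stated in this model card


def build_input(text: str) -> str:
    """Prepend the task prefix unless the text already starts with it.

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    return text if text.startswith(TASK_PREFIX) else TASK_PREFIX + text


def parse_label(decoded: str) -> int:
    """Map a decoded generation to 0 (no PCL) or 1 (PCL).

    Assumes the model generates the literal strings "0" or "1";
    anything else raises ValueError instead of being guessed at.
    """
    cleaned = decoded.strip()
    if cleaned not in ("0", "1"):
        raise ValueError(f"unexpected model output: {decoded!r}")
    return int(cleaned)


if __name__ == "__main__":
    print(build_input("some refugees are more equal than others"))
    print(parse_label(" 1 "))
```

Raising on unexpected output, rather than defaulting to a class, keeps malformed generations from being silently counted as negatives.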

	The training dataset is limited in scope: it contains only news texts from about 20 English-speaking countries.
	The macro F1 score achieved on the test set, based on the official evaluation, is 0.5452.
	More information about the original pre-trained model can be found [here](https://huggingface.co/t5-base).
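
For readers unfamiliar with the metric quoted above, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below shows that computation on toy labels; it is not the official SemEval scorer, the function names are hypothetical, and the labels are made up for illustration rather than taken from this model's predictions.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, l) for l in labels) / len(labels)


if __name__ == "__main__":
    # Toy gold labels and predictions, invented for this example.
    gold = [0, 0, 0, 1]
    pred = [0, 0, 1, 1]
    print(macro_f1(gold, pred))
```

Because each class counts equally, a model that does well on the majority negative class but poorly on the rarer positive class still gets a low macro F1.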

	Classification examples:

	| Prediction | Input |
	|------------|-------|
	| 0 | "selective kindness : in europe , some refugees are more equal than others" |
	| 1 | "he said their efforts should not stop only at creating many graduates but also extended to students from poor families so that they could break away from the cycle of poverty" |

	### How to use

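	```python
	from transformers import T5ForConditionalGeneration, T5Tokenizer
	import torch  # PyTorch backend, needed for return_tensors='pt'

	# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
	tokenizer = T5Tokenizer.from_pretrained("tosin/pcl_22")
	model = T5ForConditionalGeneration.from_pretrained("tosin/pcl_22")
	tokenizer.pad_token = tokenizer.eos_token

	# Note: this card states that training used the task prefix 'classification: ';
	# prepend it to `text` if you want to match that setup.
	text = "he said their efforts should not stop only at creating many graduates but also extended to students from poor families so that they could break away from the cycle of poverty"
	input_ids = tokenizer(text, padding=True, truncation=True, return_tensors='pt').input_ids

	# Generate the label and decode it to a string.
	outputs = model.generate(input_ids)
	pred = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(pred)
	```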