
This model was trained for multilingual toxicity labeling. Label_1 means TOXIC; Label_0 means NOT TOXIC.

The model was fine-tuned from the xlm_roberta_base model for 4 languages: EN, RU, FR, DE.

The validation accuracy is 92%.

The model was fine-tuned on a total of 100933k sentences. The training data for English and Russian came from https://github.com/s-nlp/multilingual_detox. The French data comprised a French translation of the data from https://github.com/s-nlp/multilingual_detox together with all the French data from the Jigsaw dataset. The German data was similarly composed using translations and semi-manual data collection techniques; in particular, the dict.cc dictionary (https://www.dict.cc/) and Reverso Context (https://context.reverso.net/translation/) were crawled for offensive words and phrases.
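
The label convention above (Label_1 = TOXIC, Label_0 = NOT TOXIC) can be applied to classifier output roughly as follows. This is a minimal sketch, not an official usage snippet: `MODEL_ID` is a hypothetical placeholder that must be replaced with this model's actual Hub ID, the helper names `readable_label` and `classify` are illustrative, and the code assumes the model's config exposes the default `LABEL_0` / `LABEL_1` label strings through the `transformers` text-classification pipeline.

```python
from typing import Dict, List

# Mapping taken from the card: Label_1 = TOXIC, Label_0 = NOT TOXIC.
LABEL_NAMES: Dict[str, str] = {"LABEL_0": "NOT TOXIC", "LABEL_1": "TOXIC"}

# Hypothetical placeholder (assumption): replace with this model's real Hub ID.
MODEL_ID = "your-namespace/your-toxicity-model"


def readable_label(raw_label: str) -> str:
    """Map a raw label such as 'LABEL_1' or 'Label_1' to its meaning."""
    return LABEL_NAMES[raw_label.upper()]


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[dict]:
    """Classify texts and attach human-readable labels.

    transformers is imported lazily so readable_label works without it.
    Assumes the model emits the default LABEL_0 / LABEL_1 names.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    results = classifier(texts)  # one {"label", "score"} dict per input text
    return [
        {"text": text, "label": readable_label(res["label"]), "score": res["score"]}
        for text, res in zip(texts, results)
    ]
```

Calling `classify(["some sentence"])` with a valid model ID would return one dictionary per input with a `label` of either `TOXIC` or `NOT TOXIC` and the pipeline's confidence `score`.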

Model size: 278M params (Safetensors; tensor types: I64, F32)