---
widget:
- text: "El dólar se dispara tras la reunión de la Fed"
---
# Spanish News Classification Headlines
SNCH: This model was developed by [M47Labs](https://www.m47labs.com/es/) for text classification. The base model is [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased); however, this version has not been fine-tuned on any dataset. The objective is to show how the model performs when it is used for inference without any training at all.
## Dataset Validation Sample
* Dataset size: 1000
* Columns: idTask, task content 1, idTag, tag

|task content|tag|
|------|------|
|Alcalá de Guadaíra celebra la IV Semana de la Diversidad Sexual con acciones de sensibilización|sociedad|
|El Archipiélago Chinijo Graciplus se impone en el Trofeo Centro Comercial Rubicón|deportes|
|Un total de 39 personas padecen ELA actualmente en la provincia|sociedad|
|Eurocopa 2021 : Italia vence a Gales y pasa a octavos con su candidatura reforzada|deportes|
|Resolución de 10 de junio de 2021, del Ayuntamiento de Tarazona de La Mancha (Albacete), referente a la convocatoria para proveer una plaza.|sociedad|
|El primer ministro sueco pierde una moción de censura|politica|
|El dólar se dispara tras la reunión de la Fed|economia|
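
To get a feel for how the tags are distributed, the validation file could be tallied by its `tag` column. The snippet below is a minimal sketch, not the authors' tooling: it assumes the data is a CSV, shortens the headline column name to `task content`, and uses made-up `idTask` and `idTag` values. Only the headlines and tags come from the sample table above.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV excerpt. The headlines and tags are taken from the sample
# table; the file format, the ids, and the exact column names are assumptions.
SAMPLE_CSV = """idTask,task content,idTag,tag
1,El primer ministro sueco pierde una moción de censura,9,politica
2,El dólar se dispara tras la reunión de la Fed,5,economia
3,Un total de 39 personas padecen ELA actualmente en la provincia,10,sociedad
"""


def tag_counts(csv_text):
    """Count how many rows carry each tag."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["tag"] for row in reader)


counts = tag_counts(SAMPLE_CSV)
print(counts.most_common())
```
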
## Labels:
* ciencia_tecnologia
* clickbait
* cultura
* deportes
* economia
* educacion
* medio_ambiente
* opinion
* politica
* sociedad
## Example of Use
### Pipeline
```{python}
from transformers import AutoTokenizer, BertForSequenceClassification, TextClassificationPipeline

# Example headline to classify
review_text = 'los vehiculos que esten esperando pasajaeros deberan estar apagados para reducir emisiones'
path = "M47Labs/spanish_news_classification_headlines_untrained"

# Load the tokenizer and the model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(path)
model = BertForSequenceClassification.from_pretrained(path)

# Wrap both in a text-classification pipeline
nlp = TextClassificationPipeline(task="text-classification",
                                 model=model,
                                 tokenizer=tokenizer)
print(nlp(review_text))
```
```[{'label': 'medio_ambiente', 'score': 0.2834321384291023}]```
### PyTorch
```{python}
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = 'M47Labs/spanish_news_classification_headlines_untrained'
MAX_LEN = 32

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texto = "las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno"

# Tokenize the text, padding or truncating it to MAX_LEN tokens
encoded_review = tokenizer(
    texto,
    max_length=MAX_LEN,
    add_special_tokens=True,
    padding='max_length',
    truncation=True,
    return_attention_mask=True,
    return_tensors='pt',
)
input_ids = encoded_review['input_ids']
attention_mask = encoded_review['attention_mask']

# Inference only, so gradients are not needed
with torch.no_grad():
    output = model(input_ids=input_ids, attention_mask=attention_mask)

# The predicted class is the index of the largest logit
prediction = torch.argmax(output.logits, dim=1)

print(f'Review text: {texto}')
print(f'Sentiment : {model.config.id2label[prediction.item()]}')
```
```Review text: las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno```
```Sentiment : opinion```
A more in-depth example of how to use the model can be found in this Colab notebook: https://colab.research.google.com/drive/1XsKea6oMyEckye2FePW_XN7Rf8v41Cw_?usp=sharing
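
The PyTorch example keeps only the index of the largest logit, whereas the pipeline also reports a `score`. For a single-label classifier such as this one, that score is by default the softmax probability of the winning class. The sketch below shows the conversion in plain Python. The logits are invented for illustration, and the assumption that the label order matches the model's `id2label` mapping is not verified here.

```python
import math

# Label names copied from the "Labels" section. Their order relative to the
# model's id2label mapping is an assumption.
LABELS = [
    "ciencia_tecnologia", "clickbait", "cultura", "deportes", "economia",
    "educacion", "medio_ambiente", "opinion", "politica", "sociedad",
]


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one headline (not real model output)
logits = [0.1, -0.3, 0.0, 0.2, 0.4, -0.1, 1.2, 0.3, 0.5, 0.6]
probs = softmax(logits)

# Rank the labels from most to least probable and show the top three
ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
for label, prob in ranked[:3]:
    print(f"{label}: {prob:.3f}")
```

Looking at the full distribution rather than only the top label makes it easier to see how uncertain an untrained classification head is.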
## Validation Results
|Full Dataset||
|------|------|
|Accuracy Score|0.362|
|Precision (Macro)|0.21|
|Recall (Macro)|0.22|
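
For reference, accuracy and macro-averaged precision and recall can be computed from paired gold and predicted tags as sketched below. This is an illustrative plain-Python version, not the evaluation script behind the table. The `macro_scores` helper and the toy `y_true` and `y_pred` lists are hypothetical, and averaging over every label that appears in either list (with 0 for undefined ratios) is one common convention.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    # Average per-label scores over every label seen in either list
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        true_pos = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        # A label that is never predicted (or never present) scores 0
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)

    return (accuracy,
            sum(precisions) / len(labels),
            sum(recalls) / len(labels))


# Toy data for illustration only (not the validation set)
y_true = ["deportes", "sociedad", "politica", "economia", "sociedad"]
y_pred = ["deportes", "opinion", "politica", "sociedad", "sociedad"]

accuracy, precision, recall = macro_scores(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Macro averaging weights every label equally, so rarely predicted labels pull the precision and recall well below the accuracy, which is the pattern visible in the table.
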
![alt text](https://media-exp1.licdn.com/dms/image/C4D0BAQHpfgjEyhtE1g/company-logo_200_200/0/1625210573748?e=1638403200&v=beta&t=toQNpiOlyim5Ja4f7Ejv8yKoCWifMsLWjkC7XnyXICI "Logo M47")