---
license: openrail
---

# StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts

This is a RoBERTa-base model for sentiment analysis on software engineering texts. It is re-finetuned from [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) on the [StackOverflow4423](https://arxiv.org/abs/1709.02984) dataset. A live demo is available [here](https://huggingface.co/spaces/Cloudy1225/stackoverflow-sentiment-analysis).

## Example of Pipeline

```python
from transformers import pipeline

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
sentiment_task(["Excellent, happy to help!",
                "This can probably be done using JavaScript.",
                "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])
```

```
[{'label': 'positive', 'score': 0.9997847676277161},
 {'label': 'neutral', 'score': 0.999783456325531},
 {'label': 'negative', 'score': 0.9996368885040283}]
```

## Example of Classification

```python
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def preprocess(text):
    """Preprocess text (replace usernames and links with placeholders)."""
    new_text = []
    for t in text.split(' '):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()


MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Excellent, happy to help!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

# Convert raw logits to probabilities over (negative, neutral, positive)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

print("negative", scores[0])
print("neutral", scores[1])
print("positive", scores[2])
```

```
negative 0.00015578205
neutral 5.9470447e-05
positive 0.99978495
```
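The `softmax` call in the classification example is what turns the model's raw logits into the probabilities printed at the end. As a minimal, self-contained sketch with made-up logits (real logits for an actual input will of course differ):

```python
import numpy as np
from scipy.special import softmax

# Hypothetical raw logits in label order (negative, neutral, positive)
logits = np.array([-1.8, -0.5, 3.2])

# softmax exponentiates and normalizes, yielding probabilities that sum to 1
probs = softmax(logits)

# The largest logit dominates: index 2 ("positive") gets the highest probability
predicted_index = int(probs.argmax())
```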