---
license: mit
tags:
- sentiment analysis
- financial sentiment analysis
- bert
- text-classification
- finance
- finbert
- financial
---

# Trading Hero Financial Sentiment Analysis

## Model Description

This model is a fine-tuned version of [FinBERT](https://huggingface.co/yiyanghkust/finbert-pretrain), a BERT model pre-trained on financial texts. Fine-tuning adapts the model to financial sentiment analysis, improving its performance on this domain-specific task.

## Model Use

Primary users: financial analysts, NLP researchers, and developers working on financial data.

## Training Data

The model was fine-tuned on a custom dataset of financial communication texts, split into training, validation, and test sets as follows:

* Training set: 10,918,272 tokens
* Validation set: 1,213,184 tokens
* Test set: 1,347,968 tokens

FinBERT itself was pre-trained on a large financial corpus totaling 4.9 billion tokens, including:

* Corporate reports (10-K & 10-Q): 2.5 billion tokens
* Earnings call transcripts: 1.3 billion tokens
* Analyst reports: 1.1 billion tokens

## Evaluation

* Test Accuracy = 0.908469
* Test Precision = 0.927788
* Test Recall = 0.908469
* Test F1 = 0.913267
* **Labels**: 0 -> Neutral; 1 -> Positive; 2 -> Negative

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("fuchenru/Trading-Hero-LLM")
model = AutoModelForSequenceClassification.from_pretrained("fuchenru/Trading-Hero-LLM")
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Tokenize the input text with fixed-length padding
def preprocess(text, tokenizer, max_length=128):
    inputs = tokenizer(text, truncation=True, padding='max_length', max_length=max_length, return_tensors='pt')
    return inputs

# Predict the sentiment label for a single text
def predict_sentiment(input_text):
    # Tokenize the input text
    inputs = preprocess(input_text, tokenizer)
    # Perform inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # Take the class with the highest logit
    predicted_label = torch.argmax(outputs.logits, dim=1).item()
    # Map the class index to its sentiment label
    label_map = {0: 'neutral', 1: 'positive', 2: 'negative'}
    return label_map[predicted_label]

stock_news = [
    "Market analysts predict a stable outlook for the coming weeks.",
    "The market remained relatively flat today, with minimal movement in stock prices.",
    "Investor sentiment improved following news of a potential trade deal.",
    # ...
]

for news in stock_news:
    print("Predicted Sentiment:", predict_sentiment(news))
```

```
Predicted Sentiment: neutral
Predicted Sentiment: neutral
Predicted Sentiment: positive
```

The `nlp` pipeline created above can also classify texts directly; a sketch follows the citation.

## Citation

```
@misc{yang2020finbert,
      title={FinBERT: A Pretrained Language Model for Financial Communications},
      author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
      year={2020},
      eprint={2006.08097},
      archivePrefix={arXiv},
}
```
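
## Pipeline Inference

For batch scoring, the `text-classification` pipeline can be called directly on a list of texts. Below is a minimal, self-contained sketch using the same checkpoint as above; note that if the checkpoint's config does not define an `id2label` mapping, labels print as generic names like `LABEL_0`, which follow the 0 -> Neutral, 1 -> Positive, 2 -> Negative scheme above. The example headlines are illustrative.

```python
from transformers import pipeline

# Rebuild the pipeline from the usage example; the tokenizer is loaded
# automatically from the same checkpoint.
nlp = pipeline("text-classification", model="fuchenru/Trading-Hero-LLM")

headlines = [
    "Investor sentiment improved following news of a potential trade deal.",
    "The market remained relatively flat today.",
]

# Each result is a dict with 'label' and 'score'. If the checkpoint's config
# lacks id2label, labels appear as LABEL_0/LABEL_1/LABEL_2, following the
# 0 -> neutral, 1 -> positive, 2 -> negative mapping above.
for text, result in zip(headlines, nlp(headlines)):
    print(f"{result['label']} ({result['score']:.3f}): {text}")
```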
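
## Reproducing the Evaluation Metrics

The metrics in the Evaluation section can be recomputed from raw predictions with scikit-learn. This is a sketch, not the authors' evaluation script: the `average='weighted'` choice is an assumption, though it is consistent with the reported numbers, since weighted recall always equals accuracy and Test Recall matches Test Accuracy above. `y_true` and `y_pred` are hypothetical placeholders for test-set labels and model predictions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical placeholders: integer labels with 0 = neutral, 1 = positive, 2 = negative.
y_true = [0, 0, 1, 2, 1, 0]
y_pred = [0, 0, 1, 2, 0, 0]

accuracy = accuracy_score(y_true, y_pred)
# average='weighted' is an assumption; under it, recall equals accuracy,
# matching the identical Test Accuracy and Test Recall reported above.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='weighted', zero_division=0
)
print(f"accuracy={accuracy:.6f}  precision={precision:.6f}  recall={recall:.6f}  f1={f1:.6f}")
```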