Related papers: Combining Financial Data and News Articles for Stock Price Movement Prediction Using Large Language Models

URL: http://arxiv.org/abs/2411.01368v1
Date: Sat, 02 Nov 2024 21:53:20 GMT
Title: Combining Financial Data and News Articles for Stock Price Movement Prediction Using Large Language Models
Authors: Ali Elahi, Fatemeh Taghvaei
Abstract summary: We employ Large Language Models (LLMs) to predict market movements. Our dataset contains news articles collected from different sources, historical stock prices, and financial report data for 20 companies. Using this model, we predicted the movement of a given stock's price in our dataset with weighted F1-scores of 58.5% and 59.1% for the 3-month and 6-month periods.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Predicting financial markets and stock price movements requires analyzing a company's performance, historical price movements, and industry-specific events, alongside the influence of human factors such as social media and press coverage. We assume that financial reports (such as income statements, balance sheets, and cash flow statements), historical price data, and recent news articles can collectively represent the aforementioned factors. We combine financial data in tabular format with textual news articles and employ pre-trained Large Language Models (LLMs) to predict market movements. Recent research in LLMs has demonstrated that they are able to perform both tabular and text classification tasks, making them our primary model to classify the multi-modal data. We utilize retrieval augmentation techniques to retrieve and attach relevant chunks of news articles to financial metrics related to a company and prompt the LLMs in zero-, two-, and four-shot settings. Our dataset contains news articles collected from different sources, historical stock prices, and financial report data for the 20 companies with the highest trading volume across different industries in the stock market. We utilized recently released language models for our LLM-based classifier, including GPT-3 and GPT-4, and LLaMA-2 and LLaMA-3 models. We introduce an LLM-based classifier capable of performing classification tasks using a combination of tabular (structured) and textual (unstructured) data. By using this model, we predicted the movement of a given stock's price in our dataset with weighted F1-scores of 58.5% and 59.1% and a Matthews Correlation Coefficient of 0.175 for both 3-month and 6-month periods.
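
Note on the reported metrics: the abstract evaluates predictions with a weighted F1-score and the Matthews Correlation Coefficient (MCC). The sketch below shows how these two metrics are commonly computed for a two-class up/down task. It is only an illustration under stated assumptions: the label names "up" and "down", the function names, and the toy predictions are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's share of y_true."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += support[label] / total * f1
    return score


def matthews_corrcoef(y_true, y_pred, positive="up"):
    """Binary MCC; returns 0.0 when any confusion-matrix margin is empty."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, not data from the paper.
y_true = ["up", "up", "down", "down", "up", "down"]
y_pred = ["up", "down", "down", "up", "up", "down"]

print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
print(f"MCC: {matthews_corrcoef(y_true, y_pred):.3f}")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why it is often reported next to F1 when the up and down classes are imbalanced.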

Related papers

MiMIC: Multi-Modal Indian Earnings Calls Dataset to Predict Stock Prices [0.21301560294088315]
This study investigates the impact of corporate earnings calls on stock prices by introducing a multi-modal predictive model. We leverage textual data from earnings call transcripts, along with images and tables from accompanying presentations, to forecast stock price movements. We present a multimodal analytical framework that integrates quantitative variables with predictive signals derived from textual and visual modalities.
arXiv Detail & Related papers (2025-04-12T15:31:40Z)
FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance [79.78247299859656]
FinTMMBench is the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation systems in finance. Built from heterogeneous data of NASDAQ 100 companies, FinTMMBench offers three significant advantages.
arXiv Detail & Related papers (2025-03-07T07:13:59Z)
Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market [0.0]
This paper addresses the problem of forecasting financial asset prices using a multimodal approach that combines candlestick time series and news flow data. A unique dataset was collected, which includes time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. Experiments showed that incorporating the textual modality reduced the MAPE value by 55%.
arXiv Detail & Related papers (2025-03-05T21:20:32Z)
StockTime: A Time Series Specialized Large Language Model Architecture for Stock Price Prediction [13.52020491768311]
We introduce StockTime, a novel LLM-based architecture that, unlike recent FinLLMs, is designed specifically for stock price time series data. By fusing multimodal data, StockTime effectively predicts stock prices across arbitrary look-back periods.
arXiv Detail & Related papers (2024-08-25T00:50:33Z)
NIFTY Financial News Headlines Dataset [14.622656548420073]
The NIFTY Financial News Headlines dataset is designed to facilitate and advance research in financial market forecasting using large language models (LLMs). This dataset comprises two distinct versions tailored for different modeling approaches: (i) NIFTY-LM, which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive, causal language-modeling objective, and (ii) NIFTY-RL, formatted specifically for alignment methods (like reinforcement learning from human feedback) to align LLMs via rejection sampling and reward modeling.
arXiv Detail & Related papers (2024-05-16T01:09:33Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English [67.48541936784501]
Toisón de Oro is the first framework that establishes instruction datasets, fine-tuned LLMs, and an evaluation benchmark for financial LLMs in Spanish jointly with English. We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks. We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks.
arXiv Detail & Related papers (2024-02-12T04:50:31Z)
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles [136.84278943588652]
We propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference.
arXiv Detail & Related papers (2023-09-17T20:28:17Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines. We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
BloombergGPT: A Large Language Model for Finance [42.73350054822628]
We present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, augmented with 345 billion tokens from general purpose datasets. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins.
arXiv Detail & Related papers (2023-03-30T17:30:36Z)
You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores. One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model. We empirically measure the effectiveness of our approach on two English language modeling datasets.
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
Graph-Based Learning for Stock Movement Prediction with Textual and Relational Data [0.0]
We propose a new stock movement prediction framework: Multi-Graph Recurrent Network for Stock Forecasting (MGRN). This architecture makes it possible to combine the textual sentiment from financial news with multiple kinds of relational information extracted from other financial data. Through an accuracy test and a trading simulation on the stocks in the STOXX Europe 600 index, we demonstrate that our model performs better than other benchmarks.
arXiv Detail & Related papers (2021-07-22T21:57:18Z)
Text Mining of Stocktwits Data for Predicting Stock Prices [3.3554367023486935]
FinALBERT is trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change. We collected Stocktwits data for over ten years for 25 different companies, including the five major FAANG companies (Facebook, Amazon, Apple, Netflix, Google).
arXiv Detail & Related papers (2021-03-13T03:29:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.