GroupSHAP-Guided Integration of Financial News Keywords and Technical Indicators for Stock Price Prediction
- URL: http://arxiv.org/abs/2510.23112v3
- Date: Mon, 03 Nov 2025 13:06:41 GMT
- Title: GroupSHAP-Guided Integration of Financial News Keywords and Technical Indicators for Stock Price Prediction
- Authors: Minjoo Kim, Jinwoong Kim, Sangjin Park
- Abstract summary: GroupSHAP quantifies contributions of semantically related keyword groups rather than individual tokens. We employed FinBERT to embed news articles from 2015 to 2024, clustered them into coherent semantic groups, and applied GroupSHAP to measure each group's contribution to stock price movements. Empirical results from one-day-ahead forecasting of the S&P 500 index throughout 2024 demonstrate that our approach achieves a 32.2% reduction in MAE and a 40.5% reduction in RMSE.
- Score: 5.287763385823119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in finance-specific language models such as FinBERT have enabled the quantification of public sentiment into index-based measures, yet compressing diverse linguistic signals into single metrics overlooks contextual nuances and limits interpretability. To address this limitation, explainable AI techniques, particularly SHAP (SHapley Additive Explanations), have been employed to identify influential features. However, SHAP's computational cost grows exponentially with input features, making it impractical for large-scale text-based financial data. This study introduces a GRU-based forecasting framework enhanced with GroupSHAP, which quantifies contributions of semantically related keyword groups rather than individual tokens, substantially reducing computational burden while preserving interpretability. We employed FinBERT to embed news articles from 2015 to 2024, clustered them into coherent semantic groups, and applied GroupSHAP to measure each group's contribution to stock price movements. The resulting group-level SHAP variables across multiple topics were used as input features for the prediction model. Empirical results from one-day-ahead forecasting of the S&P 500 index throughout 2024 demonstrate that our approach achieves a 32.2% reduction in MAE and a 40.5% reduction in RMSE compared with benchmark models without the GroupSHAP mechanism. This research presents the first application of GroupSHAP in news-driven financial forecasting, showing that grouped sentiment representations simultaneously enhance interpretability and predictive performance.
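To make the grouping idea in the abstract concrete, the following is a minimal sketch of Shapley values computed over feature groups instead of individual features. It is not the authors' implementation: the function `group_shapley`, the toy model, the zero baseline, and the group names are all hypothetical, and the sketch uses exact enumeration over group coalitions, which is only feasible for a small number of groups. The point it illustrates is that with G groups the enumeration covers 2^G coalitions rather than 2^d for d raw features.

```python
from itertools import combinations
from math import factorial

import numpy as np


def group_shapley(predict, x, baseline, groups):
    """Exact Shapley values where each player is a group of feature indices.

    predict  : callable mapping a 1-D feature vector to a scalar (assumed model)
    x        : the instance to explain
    baseline : reference vector used for features outside the coalition
    groups   : dict of group name -> list of feature indices (hypothetical groups)
    """
    names = list(groups)
    n = len(names)

    def value(coalition):
        # Groups in the coalition take their real values; the rest stay at baseline.
        z = baseline.copy()
        for name in coalition:
            idx = groups[name]
            z[idx] = x[idx]
        return float(predict(z))

    phi = {}
    for name in names:
        others = [g for g in names if g != name]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k among n players.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                total += weight * (value(subset + (name,)) - value(subset))
        phi[name] = total
    return phi


# Toy stand-in for a forecasting model: linear terms plus one cross-group interaction.
weights = np.array([0.5, -1.0, 2.0, 0.0, 1.5, -0.5])


def toy_model(z):
    return float(weights @ z + z[0] * z[4])


x = np.array([1.0, 2.0, -1.0, 0.5, 3.0, 1.0])
baseline = np.zeros(6)
groups = {"inflation": [0, 1], "earnings": [2, 3], "technical": [4, 5]}

phi = group_shapley(toy_model, x, baseline, groups)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The group contributions sum to the difference between the model output at `x` and at `baseline`, which is the efficiency property of Shapley values. In the toy model the interaction term between the first and fifth features is split evenly between the two groups that contain them.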
Related papers
- IKNet: Interpretable Stock Price Prediction via Keyword-Guided Integration of News and Technical Indicators [3.5795275871379704]
We propose an interpretable keyword-guided network (IKNet) to model the semantic association between individual news keywords and stock price movements. IKNet identifies salient keywords via FinBERT-based contextual analysis, processes each embedding through a separate nonlinear projection layer, and integrates their representations with the time-series data of technical indicators to forecast next-day closing prices. Empirical evaluations of S&P 500 data from 2015 to 2024 demonstrate that IKNet outperforms baselines, including recurrent neural networks and transformer models, reducing RMSE by up to 32.9% and improving cumulative returns by 18.5%.
arXiv Detail & Related papers (2025-10-09T01:30:30Z) - FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs [2.06242362470764]
We introduce FinDPO, the first finance-specific sentiment analysis framework based on post-training human preference alignment. The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks. We show that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance.
arXiv Detail & Related papers (2025-07-24T13:57:05Z) - Interpretable Machine Learning for Macro Alpha: A News Sentiment Case Study [1.57731592348751]
We process the Global Database of Events, Language, and Tone (GDELT) Project's worldwide news feed using FinBERT. We construct daily sentiment indices incorporating mean tone, dispersion, and event impact. These indices drive an XGBoost, benchmarked against logistic regression, to predict next-day returns.
arXiv Detail & Related papers (2025-05-22T02:24:45Z) - FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making. FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z) - Consistency Checks for Language Model Forecasters [54.62507816753479]
We measure the performance of forecasters in terms of the consistency of their predictions on different logically-related questions. We build an automated evaluation system that generates a set of base questions, instantiates consistency checks from these questions, elicits predictions of the forecaster, and measures the consistency of the predictions.
arXiv Detail & Related papers (2024-12-24T16:51:35Z) - BreakGPT: Leveraging Large Language Models for Predicting Asset Price Surges [55.2480439325792]
This paper introduces BreakGPT, a novel large language model (LLM) architecture adapted specifically for time series forecasting and the prediction of sharp upward movements in asset prices.
We showcase BreakGPT as a promising solution for financial forecasting with minimal training and as a strong competitor for capturing both local and global temporal dependencies.
arXiv Detail & Related papers (2024-11-09T05:40:32Z) - Enhanced forecasting of stock prices based on variational mode decomposition, PatchTST, and adaptive scale-weighted layer [1.9635048365486127]
This study introduces a novel composite forecasting framework that integrates variational mode decomposition (VMD), PatchTST, and adaptive scale-weighted layer (ASWL).
The VMD-PatchTST-ASWL framework demonstrates significant improvements in forecasting accuracy compared to traditional models.
This innovative approach provides a powerful tool for stock index price forecasting, with potential applications in various financial analysis and investment decision-making contexts.
arXiv Detail & Related papers (2024-08-29T17:00:47Z) - Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM Approach [6.112119533910774]
This paper introduces an advanced approach by employing Large Language Models (LLMs) instruction fine-tuned with a novel combination of instruction-based techniques and quantized low-rank adaptation (QLoRA) compression.
Our methodology integrates 'base factors', such as financial metric growth and earnings transcripts, with 'external factors', including recent market indices performances and analyst grades, to create a rich, supervised dataset.
This study not only demonstrates the power of integrating cutting-edge AI with fine-tuned financial data but also paves the way for future research in enhancing AI-driven financial analysis tools.
arXiv Detail & Related papers (2024-08-13T04:53:31Z) - AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z) - Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z) - Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z) - Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis [54.249502356251085]
We present a novel insight along with a brand-new prediction framework.
Our proposed method surpasses state-of-the-art works even without introducing social and map information.
arXiv Detail & Related papers (2021-03-14T06:21:03Z)