Related papers: Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?

Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?

URL: http://arxiv.org/abs/2505.07078v2
Date: Tue, 20 May 2025 14:51:24 GMT
Title: Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?
Authors: Weixian Waylon Li, Hyeonjun Kim, Mihai Cucuringu, Tiejun Ma,
Abstract summary: Large Language Models (LLMs) have been leveraged for asset pricing tasks and stock trading applications, enabling AI agents to generate investment decisions from unstructured financial data.<n>We critically assess their generalizability and robustness by proposing FINSABER, a backtesting framework evaluating timing-based strategies across longer periods and a larger universe of symbols.
Score: 5.968528974532717
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have recently been leveraged for asset pricing tasks and stock trading applications, enabling AI agents to generate investment decisions from unstructured financial data. However, most evaluations of LLM timing-based investing strategies are conducted on narrow timeframes and limited stock universes, overstating effectiveness due to survivorship and data-snooping biases. We critically assess their generalizability and robustness by proposing FINSABER, a backtesting framework evaluating timing-based strategies across longer periods and a larger universe of symbols. Systematic backtests over two decades and 100+ symbols reveal that previously reported LLM advantages deteriorate significantly under broader cross-section and over a longer-term evaluation. Our market regime analysis further demonstrates that LLM strategies are overly conservative in bull markets, underperforming passive benchmarks, and overly aggressive in bear markets, incurring heavy losses. These findings highlight the need to develop LLM strategies that are able to prioritise trend detection and regime-aware risk controls over mere scaling of framework complexity.

Related papers

Language Model Guided Reinforcement Learning in Quantitative Trading [0.0]
Large language models (LLMs) have recently demonstrated strategic reasoning and multi-modal financial signal interpretation.<n>We propose a hybrid system where LLMs generate high-level trading strategies to guide RL agents in their actions.
arXiv Detail & Related papers (2025-08-04T12:52:11Z)
Your AI, Not Your View: The Bias of LLMs in Investment Analysis [55.328782443604986]
Large Language Models (LLMs) face frequent knowledge conflicts due to discrepancies between pre-trained parametric knowledge and real-time market data.<n>This paper offers the first quantitative analysis of confirmation bias in LLM-based investment analysis.<n>We observe a consistent preference for large-cap stocks and contrarian strategies across most models.
arXiv Detail & Related papers (2025-07-28T16:09:38Z)
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking [12.837781884216227]
Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks.<n>Their real-world effectiveness in managing complex fund investment remains inadequately assessed.<n>We introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLM in real-time market conditions.
arXiv Detail & Related papers (2025-05-16T10:00:56Z)
Cross-Asset Risk Management: Integrating LLMs for Real-Time Monitoring of Equity, Fixed Income, and Currency Markets [30.815524322885754]
Large language models (LLMs) have emerged as powerful tools in the field of finance.<n>We introduce a Cross-Asset Risk Management framework that utilizes LLMs to facilitate real-time monitoring of equity, fixed income, and currency markets.
arXiv Detail & Related papers (2025-04-05T22:28:35Z)
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute [54.22256089592864]
This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute.<n>Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths.
arXiv Detail & Related papers (2025-04-01T13:13:43Z)
DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective [10.932591941137698]
This paper introduces DeepFund, a comprehensive platform for evaluating Large Language Models (LLMs) in a simulated live environment.<n>Our approach implements a multi agent framework where LLMs serve as both analysts and managers, creating a realistic simulation of investment decision making.<n>We provide a web interface that visualizes model performance across different market conditions and investment parameters, enabling detailed comparative analysis.
arXiv Detail & Related papers (2025-03-24T03:32:13Z)
A Deep Reinforcement Learning Approach to Automated Stock Trading, using xLSTM Networks [0.26249027950824505]
This study explores the usage of Extended Long Short Term Memory (xLSTM) network in combination with a deep reinforcement learning (DRL) approach for automated stock trading.<n>Our proposed method utilizes xLSTM networks in both actor and critic components, enabling effective handling of time series data and dynamic market environments.
arXiv Detail & Related papers (2025-03-12T10:56:03Z)
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices.<n>One promising solution is uncertainty-based SLM routing, offloading high-stakes queries to stronger LLMs when resulting in low-confidence responses on SLM.<n>We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z)
Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation.<n>Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
Reinforcement Learning Framework for Quantitative Trading [0.0]
There is a significant disconnect in the effective utilization of financial indicators to better understand the potential market trends of individual securities. This research endeavors to address these complexities by enhancing the ability of RL agents to effectively differentiate between positive and negative buy/sell actions using financial indicators.
arXiv Detail & Related papers (2024-11-12T06:44:28Z)
Automate Strategy Finding with LLM in Quant Investment [15.475504003134787]
We present a novel three-stage framework leveraging Large Language Models (LLMs) within a risk-aware multi-agent system for automate strategy finding in quantitative finance.<n>Our approach addresses the brittleness of traditional deep learning models in financial applications by: employing prompt-engineered LLMs to generate executable alpha factor candidates across diverse financial data.<n> Experimental results demonstrate the robust performance of the strategy in Chinese & US market regimes compared to established benchmarks.
arXiv Detail & Related papers (2024-09-10T07:42:28Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines. We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
Time your hedge with Deep Reinforcement Learning [0.0]
Deep Reinforcement Learning (DRL) can tackle this challenge by creating a dynamic dependency between market information and hedging strategies allocation decisions. We present a realistic and augmented DRL framework that: (i) uses additional contextual information to decide an action, (ii) has a one period lag between observations and actions to account for one day lag turnover of common asset managers to rebalance their hedge, (iii) is fully tested in terms of stability and robustness thanks to a repetitive train test method called anchored walk forward training, similar in spirit to k fold cross validation for time series and (iv) allows managing leverage of our hedging
arXiv Detail & Related papers (2020-09-16T06:43:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.