Related papers: Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective

URL: http://arxiv.org/abs/2503.18313v2
Date: Thu, 26 Jun 2025 03:57:07 GMT
Title: Will LLMs be Professional at Fund Investment? DeepFund: A Live Arena Perspective
Authors: Changlun Li, Yao Shi, Yuyu Luo, Nan Tang
Abstract summary: Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision-making remains inadequately evaluated. We introduce DeepFund, a comprehensive arena platform for evaluating LLM-based trading strategies in a live environment. Our approach implements a multi-agent framework where they serve as multiple key roles that realize the real-world investment decision processes.
Score: 10.932591941137698
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision-making remains inadequately evaluated. Current benchmarks primarily assess LLMs' understanding on financial documents rather than the ability to manage assets or dig out trading opportunities in dynamic market conditions. Despite the release of new benchmarks for evaluating diversified tasks on the financial domain, we identified four major problems in these benchmarks, which are data leakage, navel-gazing, over-intervention, and maintenance-hard. To pave the research gap, we introduce DeepFund, a comprehensive arena platform for evaluating LLM-based trading strategies in a live environment. Our approach implements a multi-agent framework where they serve as multiple key roles that realize the real-world investment decision processes. Moreover, we provide a web interface that visualizes LLMs' performance with fund investment metrics across different market conditions, enabling detailed comparative analysis. Through DeepFund, we aim to provide a more realistic and fair assessment on LLM's capabilities in fund investment, offering diversified insights and revealing their potential applications in real-world financial markets. Our code is publicly available at https://github.com/HKUSTDial/DeepFund.

Related papers

Your AI, Not Your View: The Bias of LLMs in Investment Analysis [55.328782443604986]
Large Language Models (LLMs) face frequent knowledge conflicts due to discrepancies between pre-trained parametric knowledge and real-time market data. This paper offers the first quantitative analysis of confirmation bias in LLM-based investment analysis. We observe a consistent preference for large-cap stocks and contrarian strategies across most models.
arXiv Detail & Related papers (2025-07-28T16:09:38Z)
On the Performance of LLMs for Real Estate Appraisal [5.812129569528997]
This study examines how Large Language Models (LLMs) can democratize access to real estate insights by generating competitive and interpretable house price estimates. We evaluate leading LLMs on diverse international housing datasets, comparing zero-shot, few-shot, market report-enhanced, and hybrid prompting techniques. Our results show that LLMs effectively leverage hedonic variables, such as property size and amenities, to produce meaningful estimates.
arXiv Detail & Related papers (2025-06-13T14:14:40Z)
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking [12.837781884216227]
Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks. Their real-world effectiveness in managing complex fund investment remains inadequately assessed. We introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLM in real-time market conditions.
arXiv Detail & Related papers (2025-05-16T10:00:56Z)
Bridging Language Models and Financial Analysis [49.361943182322385]
The rapid advancements in Large Language Models (LLMs) have unlocked transformative possibilities in natural language processing. Financial data is often embedded in intricate relationships across textual content, numerical tables, and visual charts. Despite the fast pace of innovation in LLM research, there remains a significant gap in their practical adoption within the finance industry.
arXiv Detail & Related papers (2025-03-14T01:35:20Z)
LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management [9.9661459222949]
We propose an explainable, multi-modal, multi-agent framework for cryptocurrency investment. Our framework uses specialized agents that collaborate within and across teams to handle subtasks such as data analysis, literature integration, and investment decision-making.
arXiv Detail & Related papers (2025-01-01T13:08:17Z)
INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent [15.562784986263654]
InvestorBench is a benchmark for evaluating large language model (LLM)-based agents in financial decision-making contexts. It provides a comprehensive suite of tasks applicable to different financial products, including single equities like stocks, cryptocurrencies and exchange-traded funds (ETFs). We also assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models.
arXiv Detail & Related papers (2024-12-24T05:22:33Z)
AI in Investment Analysis: LLMs for Equity Stock Ratings [0.2916558661202724]
This paper explores the application of Large Language Models (LLMs) to generate multi-horizon stock ratings. Our study addresses these issues by leveraging LLMs to improve the accuracy and consistency of stock ratings. Our results show that our benchmark method outperforms traditional stock rating methods when assessed by forward returns.
arXiv Detail & Related papers (2024-10-30T15:06:57Z)
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation. Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs. Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z)
Financial Statement Analysis with Large Language Models [0.0]
We provide standardized and anonymous financial statements to GPT4 and instruct the model to analyze them. The model outperforms financial analysts in its ability to predict earnings changes directionally. Our trading strategies based on GPT's predictions yield a higher Sharpe ratio and alphas than strategies based on other models.
arXiv Detail & Related papers (2024-07-25T08:36:58Z)
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments [55.19252983108372]
We have developed a multi-agent AI system called StockAgent, driven by LLMs. The StockAgent allows users to evaluate the impact of different external factors on investor trading. It avoids the test set leakage issue present in existing trading simulation systems based on AI Agents.
arXiv Detail & Related papers (2024-07-15T06:49:30Z)
The Economic Implications of Large Language Model Selection on Earnings and Return on Investment: A Decision Theoretic Model [0.0]
We use a decision-theoretic approach to compare the financial impact of different language models. The study reveals how the superior accuracy of more expensive models can, under certain conditions, justify a greater investment. This article provides a framework for companies looking to optimize their technology choices.
arXiv Detail & Related papers (2024-05-27T20:08:41Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
Revolutionizing Finance with LLMs: An Overview of Applications and Insights [45.660896719456886]
Large Language Models (LLMs) like ChatGPT have seen considerable advancements and have been applied in diverse fields. These models are being utilized for automating financial report generation, forecasting market trends, analyzing investor sentiment, and offering personalized financial advice.
arXiv Detail & Related papers (2024-01-22T01:06:17Z)
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models [11.154814189699735]
Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks. We introduce a retrieval-augmented LLMs framework for financial sentiment analysis. Our approach achieves 15% to 48% performance gain in accuracy and F1 score.
arXiv Detail & Related papers (2023-10-06T05:40:23Z)
Large Language Models in Finance: A Survey [12.243277149505364]
Recent advances in large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance.
arXiv Detail & Related papers (2023-09-28T06:04:04Z)
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines. We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.