Related papers: LiveTradeBench: Seeking Real-World Alpha with Large Language Models

LiveTradeBench: Seeking Real-World Alpha with Large Language Models

URL: http://arxiv.org/abs/2511.03628v1
Date: Wed, 05 Nov 2025 16:47:26 GMT
Title: LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Authors: Haofei Yu, Fenghai Li, Jiaxuan You,
Abstract summary: Large language models (LLMs) achieve strong performance across benchmarks.<n>These tests occur in static settings, lacking real dynamics and uncertainty.<n>We introduce LiveTradeBench, a live trading environment for evaluating LLM agents in realistic and evolving markets.
Score: 26.976122048323873
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) achieve strong performance across benchmarks--from knowledge quizzes and math reasoning to web-agent tasks--but these tests occur in static settings, lacking real dynamics and uncertainty. Consequently, they evaluate isolated reasoning or problem-solving rather than decision-making under uncertainty. To address this, we introduce LiveTradeBench, a live trading environment for evaluating LLM agents in realistic and evolving markets. LiveTradeBench follows three design principles: (i) Live data streaming of market prices and news, eliminating dependence on offline backtesting and preventing information leakage while capturing real-time uncertainty; (ii) a portfolio-management abstraction that extends control from single-asset actions to multi-asset allocation, integrating risk management and cross-asset reasoning; and (iii) multi-market evaluation across structurally distinct environments--U.S. stocks and Polymarket prediction markets--differing in volatility, liquidity, and information flow. At each step, an agent observes prices, news, and its portfolio, then outputs percentage allocations that balance risk and return. Using LiveTradeBench, we run 50-day live evaluations of 21 LLMs across families. Results show that (1) high LMArena scores do not imply superior trading outcomes; (2) models display distinct portfolio styles reflecting risk appetite and reasoning dynamics; and (3) some LLMs effectively leverage live signals to adapt decisions. These findings expose a gap between static evaluation and real-world competence, motivating benchmarks that test sequential decision making and consistency under live uncertainty.

Related papers

TraderBench: How Robust Are AI Agents in Adversarial Capital Markets? [8.661756660747042]
TraderBench is a benchmark for evaluating AI agents in finance.<n>It combines expert-verified static tasks (knowledge retrieval, analytical reasoning) with adversarial trading simulations.<n>Two novel tracks: crypto trading with four progressive market-manipulation transforms, and options derivatives scoring across P&L accuracy, Greeks, and risk management.
arXiv Detail & Related papers (2026-02-27T20:06:28Z)
AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models [23.493646150407116]
Current evaluations of real-time trading performance overlook a critical failure mode: severe behavioral instability in sequential decision-making under uncertainty.<n>We propose AlphaForgeBench, a principled framework that reframes Large Language Models (LLMs) as quantitative researchers rather than execution agents.
arXiv Detail & Related papers (2026-02-10T14:29:33Z)
Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets [57.179679246370114]
In financial applications, reinforcement learning (RL) agents are commonly trained on historical data, where their actions do not influence prices.<n>During deployment, these agents trade in live markets where their own transactions can shift asset prices, a phenomenon known as market impact.<n>Traditional robust RL approaches address this model misspecification by optimizing the worst-case performance over a set of uncertainties.<n>We develop a novel class of elliptic uncertainty sets, enabling efficient and tractable robust policy evaluation.
arXiv Detail & Related papers (2025-10-22T18:22:25Z)
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents [74.55061622246824]
Agent Market Arena (AMA) is the first lifelong, real-time benchmark for evaluating Large Language Model (LLM)-based trading agents.<n>AMA integrates verified trading data, expert-checked news, and diverse agent architectures within a unified trading framework.<n>It evaluates agents across GPT-4o, GPT-4.1, Claude-3.5-haiku, Claude-sonnet-4, and Gemini-2.0-flash.
arXiv Detail & Related papers (2025-10-13T17:54:09Z)
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? [44.10622904101254]
Large language models (LLMs) have recently demonstrated strong capabilities as autonomous agents.<n>We introduce StockBench, a benchmark designed to evaluate LLM agents in realistic, multi-month stock trading environments.<n>Our evaluation shows that while most LLM agents struggle to outperform the simple buy-and-hold baseline, several models demonstrate the potential to deliver higher returns and manage risk more effectively.
arXiv Detail & Related papers (2025-10-02T16:54:57Z)
Cross-Asset Risk Management: Integrating LLMs for Real-Time Monitoring of Equity, Fixed Income, and Currency Markets [30.815524322885754]
Large language models (LLMs) have emerged as powerful tools in the field of finance.<n>We introduce a Cross-Asset Risk Management framework that utilizes LLMs to facilitate real-time monitoring of equity, fixed income, and currency markets.
arXiv Detail & Related papers (2025-04-05T22:28:35Z)
Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents [69.58565132975504]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks.<n>We present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading.
arXiv Detail & Related papers (2025-02-25T08:41:01Z)
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments [55.19252983108372]
We have developed a multi-agent AI system called StockAgent, driven by LLMs. The StockAgent allows users to evaluate the impact of different external factors on investor trading. It avoids the test set leakage issue present in existing trading simulation systems based on AI Agents.
arXiv Detail & Related papers (2024-07-15T06:49:30Z)
Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States [71.54651874063865]
Portfolio management (PM) aims to achieve investment goals such as maximal profits or minimal risks. In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data Heterogeneous data -- the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty -- the financial market is versatile and non-stationary.
arXiv Detail & Related papers (2020-02-09T08:10:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.