Related papers: When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents

URL: http://arxiv.org/abs/2510.11695v2
Date: Thu, 30 Oct 2025 02:09:43 GMT
Title: When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
Authors: Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, Yangyang Yu, Alejandro Lopez-Lira, Peng Lu, Jian-Yun Nie, Guojun Xiong, Jimin Huang, Sophia Ananiadou
Abstract summary: Agent Market Arena (AMA) is the first lifelong, real-time benchmark for evaluating Large Language Model (LLM)-based trading agents. AMA integrates verified trading data, expert-checked news, and diverse agent architectures within a unified trading framework. It evaluates agents across GPT-4o, GPT-4.1, Claude-3.5-haiku, Claude-sonnet-4, and Gemini-2.0-flash.
Score: 74.55061622246824
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Although Large Language Model (LLM)-based agents are increasingly used in financial trading, it remains unclear whether they can reason and adapt in live markets, as most studies test models instead of agents, cover limited periods and assets, and rely on unverified data. To address these gaps, we introduce Agent Market Arena (AMA), the first lifelong, real-time benchmark for evaluating LLM-based trading agents across multiple markets. AMA integrates verified trading data, expert-checked news, and diverse agent architectures within a unified trading framework, enabling fair and continuous comparison under real conditions. It implements four agents, including InvestorAgent as a single-agent baseline, TradeAgent and HedgeFundAgent with different risk styles, and DeepFundAgent with memory-based reasoning, and evaluates them across GPT-4o, GPT-4.1, Claude-3.5-haiku, Claude-sonnet-4, and Gemini-2.0-flash. Live experiments on both cryptocurrency and stock markets demonstrate that agent frameworks display markedly distinct behavioral patterns, spanning from aggressive risk-taking to conservative decision-making, whereas model backbones contribute less to outcome variation. AMA thus establishes a foundation for rigorous, reproducible, and continuously evolving evaluation of financial reasoning and trading intelligence in LLM-based agents.

Related papers

AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions [49.49718899185783]
Large language model (LLM)-based agents are increasingly expected to negotiate, coordinate, and transact autonomously. We introduce AgenticPay, a benchmark and simulation framework for multi-agent buyer-seller negotiation driven by natural language.
arXiv Detail & Related papers (2026-02-05T18:50:36Z)
TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful? [44.01987401527335]
TradeTrap is a unified evaluation framework for systematically stress-testing both adaptive and procedural autonomous trading agents. It targets four core components of autonomous trading agents: market intelligence, strategy formulation, portfolio and ledger handling, and trade execution. Experiments show that small perturbations at a single component can propagate through the agent decision loop and induce extreme concentration, runaway exposure, and large portfolio drawdowns.
arXiv Detail & Related papers (2025-12-01T23:06:42Z)
QuantAgents: Towards Multi-agent Financial System via Simulated Trading [40.636918662488505]
QuantAgents is a multi-agent system integrating simulated trading. QuantAgents comprises four agents: a simulated trading analyst, a risk control analyst, a market news analyst, and a manager. Our system incentivizes agents to receive feedback on two fronts: performance in real-world markets and predictive accuracy in simulated trading.
arXiv Detail & Related papers (2025-10-06T09:45:57Z)
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? [44.10622904101254]
Large language models (LLMs) have recently demonstrated strong capabilities as autonomous agents. We introduce StockBench, a benchmark designed to evaluate LLM agents in realistic, multi-month stock trading environments. Our evaluation shows that while most LLM agents struggle to outperform the simple buy-and-hold baseline, several models demonstrate the potential to deliver higher returns and manage risk more effectively.
arXiv Detail & Related papers (2025-10-02T16:54:57Z)
Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents [69.58565132975504]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks. We present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading.
arXiv Detail & Related papers (2025-02-25T08:41:01Z)
TradingAgents: Multi-Agents LLM Financial Trading Framework [4.293484524693143]
TradingAgents proposes a novel stock trading framework inspired by trading firms. It features LLM-powered agents in specialized roles such as fundamental analysts, sentiment analysts, technical analysts, and traders with varied risk profiles. By simulating a dynamic, collaborative trading environment, this framework aims to improve trading performance.
arXiv Detail & Related papers (2024-12-28T12:54:06Z)
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments [55.19252983108372]
We have developed a multi-agent AI system called StockAgent, driven by LLMs. The StockAgent allows users to evaluate the impact of different external factors on investor trading. It avoids the test set leakage issue present in existing trading simulation systems based on AI Agents.
arXiv Detail & Related papers (2024-07-15T06:49:30Z)
