Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
- URL: http://arxiv.org/abs/2505.19481v1
- Date: Mon, 26 May 2025 04:03:48 GMT
- Title: Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
- Authors: Hao Kang, Qingru Zhang, Han Cai, Weiyuan Xu, Tushar Krishna, Yilun Du, Tsachy Weissman
- Abstract summary: Large language models (LLMs) have shown remarkable performance across diverse reasoning and generation tasks. This work presents the first systematic study of the latency-quality trade-off in real-time decision-making tasks. We propose FPX, an adaptive framework that dynamically selects model size and quantization level based on real-time demands.
- Score: 48.653022530291494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown remarkable performance across diverse reasoning and generation tasks, and are increasingly deployed as agents in dynamic environments such as code generation and recommendation systems. However, many real-world applications, such as high-frequency trading and real-time competitive gaming, require decisions under strict latency constraints, where faster responses directly translate into higher rewards. Despite its importance, this latency-quality trade-off remains underexplored in the context of LLM-based agents. In this work, we present the first systematic study of this trade-off in real-time decision-making tasks. To support our investigation, we introduce two new benchmarks: HFTBench, a high-frequency trading simulation, and StreetFighter, a competitive gaming platform. Our analysis reveals that the optimal latency-quality balance varies by task, and that sacrificing quality for lower latency can significantly enhance downstream performance. To address this, we propose FPX, an adaptive framework that dynamically selects model size and quantization level based on real-time demands. Our method achieves the best performance on both benchmarks, improving win rate by up to 80% in Street Fighter and boosting daily yield by up to 26.52% in trading. These results underscore the critical importance of latency-aware evaluation and deployment strategies for real-world LLM-based agents. Our benchmarks are available at Latency Sensitive Benchmarks.
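The abstract describes FPX only at a high level, so here is a minimal, hypothetical sketch of the core idea it names: choosing a model size and quantization level under a latency budget. All names, configurations, and latency/quality numbers below are illustrative assumptions, not the authors' FPX implementation.

```python
# Hypothetical sketch of latency-aware model selection, in the spirit of the
# FPX idea described in the abstract. Configuration names and numbers are
# made up for illustration; they are not taken from the paper.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    name: str            # model size + quantization level
    latency_ms: float    # measured per-decision latency for this config
    quality: float       # offline quality score (higher is better)

# Candidate (model size, quantization) configurations, assumed pre-profiled.
CONFIGS = [
    ModelConfig("7B-int4",  latency_ms=40.0,  quality=0.62),
    ModelConfig("7B-fp16",  latency_ms=90.0,  quality=0.70),
    ModelConfig("70B-int4", latency_ms=220.0, quality=0.81),
    ModelConfig("70B-fp16", latency_ms=450.0, quality=0.85),
]

def pick_config(latency_budget_ms: float) -> ModelConfig:
    """Return the highest-quality config whose latency fits the budget.

    Falls back to the fastest config when nothing fits, since in
    latency-sensitive tasks a fast, lower-quality answer can still
    beat a slow, high-quality one.
    """
    feasible = [c for c in CONFIGS if c.latency_ms <= latency_budget_ms]
    if not feasible:
        return min(CONFIGS, key=lambda c: c.latency_ms)
    return max(feasible, key=lambda c: c.quality)

if __name__ == "__main__":
    # A tight real-time budget forces the small quantized model;
    # a relaxed budget allows the larger, higher-quality one.
    print(pick_config(50.0).name)    # -> 7B-int4
    print(pick_config(500.0).name)   # -> 70B-fp16
```

The fallback to the fastest configuration mirrors the abstract's observation that sacrificing quality for lower latency can improve downstream performance when the budget is tight.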
Related papers
- Who is a Better Player: LLM against LLM [53.46608216197315]
We propose an adversarial benchmarking framework to assess the comprehensive performance of Large Language Models (LLMs) through board game competitions. We introduce Qi Town, a specialized evaluation platform that supports 5 widely played games and involves 20 LLM-driven players.
arXiv Detail & Related papers (2025-08-05T06:41:47Z) - Quality-of-Service Aware LLM Routing for Edge Computing with Multiple Experts [18.479200918676575]
Large Language Models (LLMs) have demonstrated remarkable capabilities, leading to a significant increase in user demand for LLM services. However, cloud-based LLM services often suffer from high latency, unstable responsiveness, and privacy concerns. This paper proposes a novel deep reinforcement learning-based routing framework for sustained high-quality LLM services.
arXiv Detail & Related papers (2025-08-01T00:45:15Z) - LLM-Enhanced Rapid-Reflex Async-Reflect Embodied Agent for Real-Time Decision-Making in Dynamically Changing Environments [6.227284273306464]
Delay in decision-making emerges as a crucial yet insufficiently studied issue. We propose a Time Conversion Mechanism (TCM) that translates delays in decision-making into equivalent simulation frames. We present the Rapid-Reflex Async-Reflect Agent (RRARA), which couples a lightweight LLM-guided feedback module with a rule-based agent to enable immediate reactive behaviors.
arXiv Detail & Related papers (2025-06-08T17:09:26Z) - EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments [0.0699049312989311]
We develop benchmarks for LLM agents that act in, learn from, and strategize in unknown environments. We also propose litmus tests, a new kind of quantitative measure for LLMs and LLM agents.
arXiv Detail & Related papers (2025-03-24T16:06:04Z) - MixLLM: Dynamic Routing in Mixed Large Language Models [57.309520357563215]
Large Language Models (LLMs) have recently shown potential for artificial general intelligence; however, their usage is costly and their response latency is high. We develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment.
arXiv Detail & Related papers (2025-02-09T02:26:15Z) - Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the small language model (SLM) produces low-confidence responses (a minimal sketch of this routing pattern appears after this list). We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs across more than 1,500 settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z) - Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation [55.21013307734612]
AoPS-Instruct is a dataset of more than 600,000 high-quality QA pairs. LiveAoPSBench is an evolving evaluation set with timestamps, derived from the latest forum data. Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning.
arXiv Detail & Related papers (2025-01-24T06:39:38Z) - Are Your LLMs Capable of Stable Reasoning? [38.03049704515947]
Large Language Models (LLMs) have demonstrated remarkable progress in complex reasoning tasks. However, a significant discrepancy persists between benchmark performances and real-world applications. We introduce G-Pass@k, a novel evaluation metric that provides a continuous assessment of model performance. We present LiveMathBench, a dynamic benchmark comprising challenging, contemporary mathematical problems.
arXiv Detail & Related papers (2024-12-17T18:12:47Z) - MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures [57.886592207948844]
We propose MixEval, a new paradigm for establishing efficient, gold-standard evaluation by strategically mixing off-the-shelf benchmarks.
It bridges (1) comprehensive and well-distributed real-world user queries and (2) efficient and fairly-graded ground-truth-based benchmarks, by matching queries mined from the web with similar queries from existing benchmarks.
arXiv Detail & Related papers (2024-06-03T05:47:05Z)
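As referenced in the "Confident or Seek Stronger" entry above, here is a minimal, hypothetical sketch of uncertainty-based SLM-to-LLM routing: answer locally with a small model and escalate only when its confidence is low. The model callables and the fixed confidence threshold are illustrative assumptions, not that paper's method.

```python
# Hypothetical sketch of uncertainty-based SLM-to-LLM routing: the on-device
# small model answers by default, and low-confidence queries are offloaded
# to a stronger model. Both model callables are placeholders.
from typing import Callable, Tuple

Answerer = Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])

def route(query: str,
          small_model: Answerer,
          large_model: Answerer,
          confidence_threshold: float = 0.8) -> str:
    """Answer with the small model unless its confidence falls below the threshold."""
    answer, confidence = small_model(query)
    if confidence >= confidence_threshold:
        return answer                              # keep the cheap, low-latency answer
    offloaded_answer, _ = large_model(query)       # escalate uncertain / high-stakes queries
    return offloaded_answer

if __name__ == "__main__":
    # Toy stand-ins: the small model is unsure here, so the query is offloaded.
    small = lambda q: ("maybe 42", 0.3)
    large = lambda q: ("42", 0.95)
    print(route("What is 6 x 7?", small, large))   # -> 42
```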
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.