Related papers: Speculative Actions: A Lossless Framework for Faster Agentic Systems

URL: http://arxiv.org/abs/2510.04371v1
Date: Sun, 05 Oct 2025 21:28:11 GMT
Title: Speculative Actions: A Lossless Framework for Faster Agentic Systems
Authors: Naimeng Ye, Arnav Ahuja, Georgios Liargkovas, Yunan Lu, Kostis Kaffes, Tianyi Peng
Abstract summary: Execution of AI agents is often slow, hampering training, evaluation, and deployment. Inspired by speculative execution in microprocessors, we propose a framework that predicts likely actions using faster models. We evaluate this framework across three agentic environments: gaming, e-commerce, web search, and a "lossy" extension for an operating systems environment.
Score: 6.708126506152481
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite growing interest in AI agents across industry and academia, their execution in an environment is often slow, hampering training, evaluation, and deployment. For example, a game of chess between two state-of-the-art agents may take hours. A critical bottleneck is that agent behavior unfolds sequentially: each action requires an API call, and these calls can be time-consuming. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, we propose speculative actions, a lossless framework for general agentic systems that predicts likely actions using faster models, enabling multiple steps to be executed in parallel. We evaluate this framework across three agentic environments: gaming, e-commerce, web search, and a "lossy" extension for an operating systems environment. In all cases, speculative actions achieve substantial accuracy in next-action prediction (up to 55%), translating into significant reductions in end-to-end latency. Moreover, performance can be further improved through stronger guessing models, top-K action prediction, multi-step speculation, and uncertainty-aware optimization, opening a promising path toward deploying low-latency agentic systems in the real world.
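The core idea in the abstract (guess the next action with a fast model and start the following step before the slow model has answered) can be illustrated with a minimal sketch. Everything named below is a hypothetical stand-in and not the authors' implementation: `slow_policy` plays the role of the expensive agent call, `fast_policy` is the cheaper guesser, and `env_step` is a toy transition that is assumed to be deterministic and free of side effects, so that running it on a guess is harmless. The sketch speculates one step ahead with a single guess; the top-K and multi-step variants mentioned in the abstract are not shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_policy(state: int) -> int:
    """Hypothetical stand-in for the authoritative (slow) agent call."""
    time.sleep(0.01)  # simulated API latency
    return (state * 3 + 1) % 5


def fast_policy(state: int) -> int:
    """Hypothetical fast guesser: agrees with slow_policy unless state % 4 == 0."""
    guess = (state * 3 + 1) % 5
    return guess if state % 4 else (guess + 1) % 5


def env_step(state: int, action: int) -> int:
    """Toy transition, assumed deterministic and side-effect free."""
    return state + action + 1


def run_sequential(state: int, steps: int) -> list[int]:
    """Baseline: one slow call per step, strictly in order."""
    trajectory = []
    for _ in range(steps):
        action = slow_policy(state)
        trajectory.append(action)
        state = env_step(state, action)
    return trajectory


def run_speculative(state: int, steps: int) -> tuple[list[int], int]:
    """Overlap the slow call for step t+1 with the slow call for step t."""
    trajectory, correct_guesses = [], 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = pool.submit(slow_policy, state)  # authoritative call for the current state
        for i in range(steps):
            guess = fast_policy(state)
            speculative = None
            if i < steps - 1:
                # Pre-launch the next slow call on the state the guess would lead to.
                speculative = pool.submit(slow_policy, env_step(state, guess))
            action = pending.result()  # only the slow model's answer is ever committed
            trajectory.append(action)
            state = env_step(state, action)
            if action == guess:
                correct_guesses += 1
            if speculative is None:
                break  # last step: nothing further to schedule
            if action == guess:
                pending = speculative  # guess was right: reuse the in-flight call
            else:
                speculative.cancel()  # guess was wrong: discard and recompute
                pending = pool.submit(slow_policy, state)
    return trajectory, correct_guesses


if __name__ == "__main__":
    actions, hits = run_speculative(0, 6)
    print(actions, hits)
```

Because every committed action comes from `slow_policy` evaluated on the true state, the speculative run reproduces the sequential trajectory in this toy setting; a wrong guess costs only the discarded work, while a correct guess hides one slow call behind another.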

Related papers

Benchmark Test-Time Scaling of General LLM Agents [27.756239376314294]
General AgentBench is a benchmark for evaluating general LLM agents across search, coding, reasoning, and tool-use domains. We study performance degradation when moving from domain-specific evaluations to this general-agent setting. We find that neither scaling yields effective performance improvements in practice, due to two fundamental limitations.
arXiv Detail & Related papers (2026-02-22T01:08:02Z)
DLLM Agent: See Farther, Run Faster [94.74432470237817]
Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties. We study this in a controlled setting by instantiating DLLM and AR backbones within the same agent workflow. We find that DLLM Agents are on average over 30% faster end to end than AR agents, with some cases exceeding 8x speedup.
arXiv Detail & Related papers (2026-02-07T09:01:18Z)
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration. We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress. Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z)
SCOPE: Prompt Evolution for Enhancing Agent Effectiveness [53.75986399936395]
Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. While agents have access to this context, their static prompts lack the mechanisms to manage it effectively. We introduce SCOPE (Self-evolving Context Optimization via Prompt Evolution). We propose a Dual-Stream mechanism that balances tactical specificity (resolving immediate errors) with strategic generality (evolving long-term principles).
arXiv Detail & Related papers (2025-12-17T12:25:05Z)
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems [0.0]
We propose a novel, adaptive adversarial framework designed to expose and exploit scaling vulnerabilities in production Vision-Language Models (VLMs). Our experiments demonstrate that Chameleon achieves an Attack Success Rate (ASR) of 84.5% across varying scaling factors. We show that these attacks effectively compromise agentic pipelines, reducing decision-making accuracy by over 45% in multi-step tasks.
arXiv Detail & Related papers (2025-12-04T15:22:28Z)
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design [35.95362310928356]
LLM-based search agents achieve strong performance but suffer from severe latency. We revisit this bottleneck through the lens of speculation. We present SPAgent, an algorithm-system co-design framework that expands the role of speculation in search agents to reduce latency.
arXiv Detail & Related papers (2025-11-25T08:15:17Z)
VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents [39.3943822850841]
We introduce VeriOS-Agent, a trustworthy OS agent trained with a two-stage learning paradigm. We show that VeriOS-Agent improves the average step-wise success rate by 20.64% in untrustworthy scenarios over the state-of-the-art.
arXiv Detail & Related papers (2025-09-09T09:46:01Z)
Adaptive Reinforcement Learning for Unobservable Random Delays [46.04329493317009]
We introduce a general framework that enables agents to adaptively handle unobservable and time-varying delays. Specifically, the agent generates a matrix of possible future actions to handle both unpredictable delays and lost action packets sent over networks. Our method significantly outperforms state-of-the-art approaches across a wide range of benchmark environments.
arXiv Detail & Related papers (2025-06-17T11:11:37Z)
Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding [56.565200973244146]
Agentic Predictor is a lightweight predictor for efficient agentic workflow evaluation. By learning to approximate task success rates, Agentic Predictor enables fast and accurate selection of optimal agentic workflow configurations.
arXiv Detail & Related papers (2025-05-26T09:46:50Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents. We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
DynaSaur: Large Language Agents Beyond Predefined Actions [126.98162266986554]
Existing LLM agent systems typically select actions from a fixed and predefined set at every step. We propose an LLM agent framework that can dynamically create and compose actions as needed. In this framework, the agent interacts with its environment by generating and executing programs written in a general-purpose programming language.
arXiv Detail & Related papers (2024-11-04T02:08:59Z)
Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures. We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view. We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
arXiv Detail & Related papers (2021-01-14T22:21:25Z)