Related papers: SABER: Small Actions, Big Errors -- Safeguarding Mutating Steps in LLM Agents

SABER: Small Actions, Big Errors -- Safeguarding Mutating Steps in LLM Agents

URL: http://arxiv.org/abs/2512.07850v1
Date: Wed, 26 Nov 2025 01:28:22 GMT
Title: SABER: Small Actions, Big Errors -- Safeguarding Mutating Steps in LLM Agents
Authors: Alejandro Cuadron, Pengfei Yu, Yang Liu, Arpit Gupta,
Abstract summary: We analyze execution traces on $$-Bench (Airline/Retail) and SWE-Bench Verified.<n>We formalize emphdecisive deviations, earliest action, level divergences that flip success to failure.<n>We introduce cm, a model-agnostic, gradient-free, test-time safeguard.
Score: 52.20768003832476
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: \emph{do all actions contribute equally to failure?} Analyzing execution traces on $τ$-Bench (Airline/Retail) and SWE-Bench Verified, we decompose trajectories into \emph{mutating} (environment-changing) vs.\ non-mutating steps and formalize \emph{decisive deviations}, earliest action, level divergences that flip success to failure. A logistic regression reveals that each additional deviation in a mutating action reduces the odds of success by upto $92\%$ on Airline and upto $96\%$ on Retail for SoTA models. In contrast, deviations in non-mutating actions have little to no effect. Errors also grow with context length as agents drift from role and act on stale constraints. Motivated by these observations, we introduce \cm{}, a model-agnostic, gradient-free, test-time safeguard that (i) adds mutation-gated verification, (ii) injects \emph{Targeted Reflection} before mutating steps, and (iii) performs block-based context cleaning. \cm{} delivers consistent gains, e.g., Qwen3-Thinking: +28\% \emph{relative} on Airline, +11\% on Retail, and +7\% on SWE-Bench Verified; Claude: +9\%/+7\%. We further identify ceiling effects in $τ$-Bench, where annotation errors and underspecified tasks artificially cap model performance. To address this, we release $τ$-Bench Verified, which restores benchmark headroom through targeted revisions. Our results argue for action-level analysis, targeted safeguards, and reliable evaluations as prerequisites for robust multi-turn agents.

Related papers

Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol [69.11739400975445]
We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents.<n>We show that cumulative distortion exhibits linear growth and high-probability deviations bounded by $O(sqrtT)$.<n>Key findings include: semantic weighting reduces distortion by 80%, and periodic re-grounding approximately every 9 steps suffices for error control.
arXiv Detail & Related papers (2026-02-10T21:08:53Z)
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents [50.5814495434565]
This work makes the first effort to define and study misaligned action detection in computer-use agents (CUAs)<n>We identify three common categories in real-world CUA deployment and construct MisActBench, a benchmark of realistic trajectories with human-annotated, action-level alignment labels.<n>We propose DeAction, a practical and universal guardrail that detects misaligned actions before execution and iteratively corrects them through structured feedback.
arXiv Detail & Related papers (2026-02-09T18:41:15Z)
Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents [31.789859492703016]
The agent-tool communication loop is a critical attack surface in Large Language Model (LLM) agents.<n>Existing Denial-of-Service (DoS) attacks are ineffective for this new paradigm.<n>We introduce a stealthy, multi-turn economic DoS attack that operates at the tool layer under the guise of a correctly completed task.
arXiv Detail & Related papers (2026-01-16T02:47:45Z)
ReliabilityBench: Evaluating LLM Agent Reliability Under Production-Like Stress Conditions [0.32928123659012326]
Existing benchmarks for tool-using LLM agents primarily report single-run success rates and miss reliability properties required in production.<n>We introduce textbfReliabilityBench, a benchmark for evaluating agent reliability across three dimensions.<n>We evaluate two models (Gemini 2.0 Flash, GPT-4o) and two agent architectures (ReAct, Reflexion) across four domains (scheduling, travel, customer support, e-commerce) over 1,280 episodes.
arXiv Detail & Related papers (2026-01-03T13:41:33Z)
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems [48.971606069204825]
DoVer is an intervention-driven debug framework for large language model (LLM)-based multi-agent systems.<n>It augments hypothesis generation with active verification through targeted interventions.<n>DoVer flips 18-28% of failed trials into successes, achieves up to 16% milestone progress, and validates or refutes 30-60% of failure hypotheses.
arXiv Detail & Related papers (2025-12-07T09:23:48Z)
Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems [20.846301581161978]
Failure attribution in multi-agent systems is a critical yet unsolved challenge.<n>Current methods treat this as a pattern recognition task over long conversation logs.<n>A2P Scaffolding transforms failure attribution from pattern recognition into a structured causal inference task.
arXiv Detail & Related papers (2025-09-12T16:51:15Z)
Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments [54.67512489842682]
Large language models (LLMs) have demonstrated strong planning and decision-making capabilities in complex embodied environments.<n>We take a first step toward exploring the early-exit behavior for LLM-based agents.
arXiv Detail & Related papers (2025-05-23T08:23:36Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage.<n>We show that scaling agentic reasoning system at test-time substantially enhances robustness without compromising model utility.<n> Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models [46.92994945808424]
Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large language models (MLLMs) This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor.
arXiv Detail & Related papers (2024-02-19T11:02:05Z)
Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data. In this paper, we propose variable-length textual adversarial attacks(VL-Attack) Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.