TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection
- URL: http://arxiv.org/abs/2510.11203v1
- Date: Mon, 13 Oct 2025 09:35:06 GMT
- Title: TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection
- Authors: Jiahao Liu, Bonan Ruan, Xianglin Yang, Zhiwei Lin, Yan Liu, Yang Wang, Tao Wei, Zhenkai Liang,
- Abstract summary: We propose TraceAegis, a provenance-based analysis framework that leverages agent execution traces to detect potential anomalies. By validating execution traces against both hierarchical and behavioral constraints, TraceAegis is able to effectively detect abnormal behaviors.
- Score: 31.243042511018675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of attacks, such as tool poisoning and malicious instructions, that compromise their execution flow and can lead to serious consequences like data breaches and financial loss. Existing studies typically attempt to mitigate such anomalies by predefining specific rules and enforcing them at runtime to enhance safety. Yet, designing comprehensive rules is difficult, requiring extensive manual effort and still leaving gaps that result in false negatives. As agent systems evolve into complex software systems, we take inspiration from software system security and propose TraceAegis, a provenance-based analysis framework that leverages agent execution traces to detect potential anomalies. In particular, TraceAegis constructs a hierarchical structure to abstract stable execution units that characterize normal agent behaviors. These units are then summarized into constrained behavioral rules that specify the conditions necessary to complete a task. By validating execution traces against both hierarchical and behavioral constraints, TraceAegis is able to effectively detect abnormal behaviors. To evaluate the effectiveness of TraceAegis, we introduce TraceAegis-Bench, a dataset covering two representative scenarios: healthcare and corporate procurement. Each scenario includes 1,300 benign behaviors and 300 abnormal behaviors, where the anomalies either violate the agent's execution order or break the semantic consistency of its execution sequence. Experimental results demonstrate that TraceAegis achieves strong performance on TraceAegis-Bench, successfully identifying the majority of abnormal behaviors.
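The abstract describes checking traces against two kinds of constraints: execution order and semantic consistency. The Python sketch below is not TraceAegis's algorithm. It is a deliberately simplified, hypothetical stand-in that learns from benign traces (a) which tool-to-tool transitions occur and (b) which tools always precede a given tool, then adds (c) a check that one designated argument (a hypothetical `patient_id`, echoing the healthcare scenario) keeps a single value across a trace. The tool names, the `Event` type, and the `fit`/`check` functions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

START = "<start>"  # sentinel so the first tool call also forms a transition


@dataclass(frozen=True)
class Event:
    """One hypothetical tool call in an agent execution trace."""
    tool: str
    args: Tuple[Tuple[str, str], ...] = ()

    def arg(self, name: str) -> Optional[str]:
        return dict(self.args).get(name)


def ev(tool: str, **args: str) -> Event:
    """Convenience constructor, e.g. ev("read_record", patient_id="P1")."""
    return Event(tool, tuple(sorted(args.items())))


@dataclass
class TraceModel:
    transitions: Set[Tuple[str, str]] = field(default_factory=set)
    # tool -> tools that preceded it in every benign occurrence
    prerequisites: Dict[str, Set[str]] = field(default_factory=dict)


def fit(benign: List[List[Event]]) -> TraceModel:
    """Learn order constraints from benign traces only."""
    model = TraceModel()
    for trace in benign:
        tools = [START] + [e.tool for e in trace]
        model.transitions.update(zip(tools, tools[1:]))
        seen: Set[str] = set()
        for e in trace:
            if e.tool in model.prerequisites:
                # keep only predecessors present in every benign occurrence
                model.prerequisites[e.tool] &= seen
            else:
                model.prerequisites[e.tool] = set(seen)
            seen.add(e.tool)
    return model


def check(model: TraceModel, trace: List[Event],
          consistent_keys: Tuple[str, ...] = ("patient_id",)) -> List[str]:
    """Return human-readable violations; an empty list means no anomaly found."""
    violations: List[str] = []
    # Order constraint 1: every adjacent pair must have been seen in benign data.
    tools = [START] + [e.tool for e in trace]
    for a, b in zip(tools, tools[1:]):
        if (a, b) not in model.transitions:
            violations.append(f"order: unseen transition {a} -> {b}")
    # Order constraint 2: a tool's learned prerequisites must already have run.
    seen: Set[str] = set()
    for e in trace:
        required = model.prerequisites.get(e.tool)
        if required is None:
            violations.append(f"order: unknown tool {e.tool}")
        elif required - seen:
            violations.append(f"order: {e.tool} ran without {sorted(required - seen)}")
        seen.add(e.tool)
    # Semantic consistency: a designated argument must keep one value per trace.
    for key in consistent_keys:
        values = {e.arg(key) for e in trace if e.arg(key) is not None}
        if len(values) > 1:
            violations.append(f"semantic: inconsistent {key} values {sorted(values)}")
    return violations


if __name__ == "__main__":
    benign = [[ev("lookup_patient", patient_id=p), ev("read_record", patient_id=p),
               ev("send_summary", patient_id=p)] for p in ("P1", "P2")]
    model = fit(benign)
    # prints two order violations: a skipped read_record step
    print(check(model, [ev("lookup_patient", patient_id="P3"),
                        ev("send_summary", patient_id="P3")]))
```

Learning only first-order transitions and "always preceded by" sets is a crude approximation of the hierarchical execution units the abstract describes: it flags any benign path that never appeared in training, which is one reason a richer abstraction of stable execution units would be needed in practice.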
Related papers
- TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces [32.4073751390339]
We propose TraceSIR, a framework for structured analysis and reporting of agentic execution traces.
TraceSIR coordinates three specialized agents: StructureAgent, InsightAgent, and ReportAgent.
Experiments show that TraceSIR consistently produces coherent, informative, and actionable reports.
(arXiv, 2026-02-28T12:33:24Z)
- SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills.
The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills, and an Evaluate Agent that logs action traces.
Our method consistently achieves high attack success rates under realistic settings.
(arXiv, 2026-02-15T16:09:48Z)
- AgentTrace: A Structured Logging Framework for Agent System Observability [0.0]
AgentTrace is a dynamic observability and telemetry framework designed to fill this gap.
Unlike traditional logging systems, AgentTrace emphasizes continuous, introspectable trace capture.
Our research highlights how AgentTrace can enable more reliable agent deployment, fine-grained risk analysis, and informed trust calibration.
(arXiv, 2026-02-07T04:04:59Z)
- TriCEGAR: A Trace-Driven Abstraction Mechanism for Agentic AI [5.1181001367075]
TriCEGAR is a trace-driven abstraction mechanism that automates state construction from execution logs.
We describe a framework-native implementation that captures typed agent lifecycle events and builds abstractions from traces.
We also show how run likelihoods enable anomaly detection as a guardrailing signal.
(arXiv, 2026-01-30T14:01:47Z)
- AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security [126.49733412191416]
Current guardrail models lack agentic risk awareness and transparency in risk diagnosis.
We propose a unified three-dimensional taxonomy that categorizes agentic risks by their source (where), failure mode (how), and consequence (what).
We introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG).
(arXiv, 2026-01-26T13:45:41Z)
- The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering.
We propose a novel framework for general agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome.
We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
(arXiv, 2026-01-21T15:22:21Z)
- Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks.
XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
(arXiv, 2025-12-21T13:46:36Z)
- Are Your Agents Upward Deceivers? [73.1073084327614]
Large Language Model (LLM)-based agents are increasingly used as autonomous subordinates that carry out tasks for users.
This raises the question of whether they may also engage in deception, similar to how individuals in human organizations lie to superiors to create a good image or avoid punishment.
We observe and define agentic upward deception, a phenomenon in which an agent facing environmental constraints conceals its failure and performs unrequested actions without reporting them.
(arXiv, 2025-12-04T14:47:05Z)
- Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing [12.835224376066769]
Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their deployment is frequently undermined by undesirable behaviors.
We introduce a novel and efficient framework that diagnoses a range of undesirable LLM behaviors by analyzing representations and their gradients.
We systematically evaluate our method on tasks that include tracking harmful content, detecting backdoor poisoning, and identifying knowledge contamination.
(arXiv, 2025-09-26T12:07:47Z)
- Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm [57.00627691433355]
We frame agent behavior steering as a model editing task, which we term Behavior Editing.
We introduce BehaviorBench, a benchmark grounded in psychological moral theories.
We demonstrate that Behavior Editing can be used to promote ethical and benevolent behavior or, conversely, to induce harmful or malicious behavior.
(arXiv, 2025-06-25T16:51:51Z)
- Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection [56.66677293607114]
We propose Code-as-Monitor (CaM) for both open-set reactive and proactive failure detection.
To enhance the accuracy and efficiency of monitoring, we introduce constraint elements that abstract constraint-related entities.
Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances.
(arXiv, 2024-12-05T18:58:27Z)
- EagleEye: Attention to Unveil Malicious Event Sequences from Provenance Graphs [1.3359586871482305]
Securing endpoints is challenging due to the evolving nature of threats and attacks.
With endpoint logging systems becoming mature, provenance-graph representations enable the creation of sophisticated behavior rules.
We develop and present EagleEye, a novel system that uses rich features from provenance graphs for behavior event representation.
(arXiv, 2024-08-17T14:48:02Z)
- Learning Recovery Strategies for Dynamic Self-healing in Reactive Systems [1.7218973692320518]
Self-healing systems depend on following a set of predefined instructions to recover from a known failure state.
Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties.
We use a Reinforcement Learning-based technique to learn a recovery strategy based on users' corrective sequences.
(arXiv, 2024-01-22T23:34:21Z)
- A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
(arXiv, 2023-11-02T15:41:09Z)
- InfoBehavior: Self-supervised Representation Learning for Ultra-long Behavior Sequence via Hierarchical Grouping [14.80873165144865]
E-commerce companies have to face abnormal sellers who sell potentially-risky products.
Traditional feature extraction techniques heavily depend on domain experts and adapt poorly to new tasks.
We propose a self-supervised method InfoBehavior to automatically extract meaningful representations from ultra-long raw behavior sequences.
(arXiv, 2021-06-13T03:45:45Z)
- No Need to Know Physics: Resilience of Process-based Model-free Anomaly Detection for Industrial Control Systems [95.54151664013011]
We present a novel framework to generate adversarial spoofing signals that violate physical properties of the system.
We analyze four anomaly detectors published at top security conferences.
(arXiv, 2020-12-07T11:02:44Z)