A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents
- URL: http://arxiv.org/abs/2506.23844v1
- Date: Mon, 30 Jun 2025 13:34:34 GMT
- Title: A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents
- Authors: Hang Su, Jun Luo, Chang Liu, Xiao Yang, Yichi Zhang, Yinpeng Dong, Jun Zhu,
- Abstract summary: Recent advances in large language models (LLMs) have catalyzed the rise of autonomous AI agents.<n>These large-model agents mark a paradigm shift from static inference systems to interactive, memory-augmented entities.
- Score: 45.53643260046778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have catalyzed the rise of autonomous AI agents capable of perceiving, reasoning, and acting in dynamic, open-ended environments. These large-model agents mark a paradigm shift from static inference systems to interactive, memory-augmented entities. While these capabilities significantly expand the functional scope of AI, they also introduce qualitatively novel security risks - such as memory poisoning, tool misuse, reward hacking, and emergent misalignment - that extend beyond the threat models of conventional systems or standalone LLMs. In this survey, we first examine the structural foundations and key capabilities that underpin increasing levels of agent autonomy, including long-term memory retention, modular tool use, recursive planning, and reflective reasoning. We then analyze the corresponding security vulnerabilities across the agent stack, identifying failure modes such as deferred decision hazards, irreversible tool chains, and deceptive behaviors arising from internal state drift or value misalignment. These risks are traced to architectural fragilities that emerge across perception, cognition, memory, and action modules. To address these challenges, we systematically review recent defense strategies deployed at different autonomy layers, including input sanitization, memory lifecycle control, constrained decision-making, structured tool invocation, and introspective reflection. We introduce the Reflective Risk-Aware Agent Architecture (R2A2), a unified cognitive framework grounded in Constrained Markov Decision Processes (CMDPs), which incorporates risk-aware world modeling, meta-policy adaptation, and joint reward-risk optimization to enable principled, proactive safety across the agent's decision-making loop.
Related papers
- QSAF: A Novel Mitigation Framework for Cognitive Degradation in Agentic AI [2.505520948667288]
We introduce Cognitive Degradation as a novel vulnerability class in agentic AI systems.<n>These failures originate internally, arising from memory starvation, planner recursion, context flooding, and output suppression.<n>To address this class of failures, we introduce the Qorvex Security AI Framework for Behavioral & Cognitive Resilience.
arXiv Detail & Related papers (2025-07-21T07:41:58Z) - SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents [58.21223208538351]
This work explores the security issues surrounding mobile multimodal agents.<n>It attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information.<n>It also designs an automated assisted assessment scheme based on a large language model.
arXiv Detail & Related papers (2025-07-01T15:10:00Z) - Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications.<n>We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation.<n>We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z) - Safety Devolution in AI Agents [56.482973617087254]
This study investigates how expanding retrieval access affects model reliability, bias propagation, and harmful content generation.<n>Retrieval-augmented agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval.<n>These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-augmented and increasingly autonomous AI systems.
arXiv Detail & Related papers (2025-05-20T11:21:40Z) - Safety Alignment Can Be Not Superficial With Explicit Safety Signals [8.297367440457508]
Recent studies on the safety alignment of large language models (LLMs) have revealed that existing approaches often operate superficially.<n>This paper identifies a fundamental cause of this superficiality: existing alignment approaches presume that models can implicitly learn a safety-related reasoning task during the alignment process.<n>By explicitly introducing a safety-related binary classification task and integrating its signals with our attention and decoding strategies, we eliminate this ambiguity.
arXiv Detail & Related papers (2025-05-19T20:40:46Z) - DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents [28.294322726282896]
Large language model (LLM)-powered agents are increasingly used in recommender systems (RSs) to achieve personalized behavior modeling.<n>This paper presents the first systematic investigation of memory-based vulnerabilities in LLM-powered recommender agents.<n>We propose a novel black-box attack framework named DrunkAgent, which crafts semantically meaningful adversarial triggers.
arXiv Detail & Related papers (2025-03-31T07:35:40Z) - A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation [0.3413711585591077]
As generative AI systems, including large language models (LLMs) and diffusion models, advance rapidly, their growing adoption has led to new and complex security risks.
This paper introduces a novel formal framework for categorizing and mitigating these emergent security risks.
We identify previously under-explored risks, including latent space exploitation, multi-modal cross-attack vectors, and feedback-loop-induced model degradation.
arXiv Detail & Related papers (2024-10-15T02:51:32Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - HAZARD Challenge: Embodied Decision Making in Dynamically Changing
Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z) - Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
InReinforcement learning (RL) has shown a promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.