AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management
- URL: http://arxiv.org/abs/2503.04392v1
- Date: Thu, 06 Mar 2025 12:41:54 GMT
- Title: AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management
- Authors: Junyuan Mao, Fanci Meng, Yifan Duan, Miao Yu, Xiaojun Jia, Junfeng Fang, Yuxuan Liang, Kun Wang, Qingsong Wen
- Abstract summary: Large Language Model-based multi-agent systems are revolutionizing autonomous communication and collaboration. We introduce AgentSafe, a novel framework that enhances MAS security through hierarchical information management and memory protection. AgentSafe incorporates two components: ThreatSieve, which secures communication by verifying information authority and preventing impersonation, and HierarCache, an adaptive memory management system that defends against unauthorized access and malicious poisoning.
- Score: 28.14286256061824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model-based multi-agent systems are revolutionizing autonomous communication and collaboration, yet they remain vulnerable to security threats like unauthorized access and data breaches. To address this, we introduce AgentSafe, a novel framework that enhances MAS security through hierarchical information management and memory protection. AgentSafe classifies information by security levels, restricting sensitive data access to authorized agents. AgentSafe incorporates two components: ThreatSieve, which secures communication by verifying information authority and preventing impersonation, and HierarCache, an adaptive memory management system that defends against unauthorized access and malicious poisoning, representing the first systematic defense for agent memory. Experiments across various LLMs show that AgentSafe significantly boosts system resilience, achieving defense success rates above 80% under adversarial conditions. Additionally, AgentSafe demonstrates scalability, maintaining robust performance as agent numbers and information complexity grow. These results underscore the effectiveness of AgentSafe in securing MAS and its potential for real-world application.
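The abstract describes the mechanism but not its interfaces. The sketch below is a minimal, self-contained Python illustration, under assumed details, of how a ThreatSieve-style authority check and a HierarCache-style level-partitioned memory could fit together; every identifier here (Agent, Message, threat_sieve, HierarCache, the three security levels, the sender registry) is hypothetical and not AgentSafe's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical three-level hierarchy; AgentSafe's actual levels are not given in the abstract.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


@dataclass
class Agent:
    name: str
    clearance: str  # highest level this agent is authorized to read


@dataclass
class Message:
    sender: Agent
    recipient: Agent
    content: str
    level: str  # security level claimed for the payload


def threat_sieve(msg: Message, registry: dict) -> bool:
    """ThreatSieve-style gate (sketch): check the sender's claimed clearance against a
    trusted registry (anti-impersonation) and the recipient's authority over the level."""
    if registry.get(msg.sender.name) != msg.sender.clearance:
        return False  # claimed clearance does not match the registry: possible impersonation
    return LEVELS[msg.recipient.clearance] >= LEVELS[msg.level]


@dataclass
class HierarCache:
    """HierarCache-style memory (sketch): one store per security level, plus a junk
    area that quarantines writes failing verification instead of poisoning memory."""
    stores: dict = field(default_factory=lambda: {lvl: [] for lvl in LEVELS})
    junk: list = field(default_factory=list)

    def write(self, msg: Message, registry: dict) -> None:
        if threat_sieve(msg, registry):
            self.stores[msg.level].append(msg.content)
        else:
            self.junk.append(msg.content)

    def read(self, agent: Agent, level: str) -> list:
        # An agent only sees memory at or below its own clearance.
        if LEVELS[agent.clearance] >= LEVELS[level]:
            return list(self.stores[level])
        return []


# Usage sketch
registry = {"alice": "confidential", "bob": "internal"}
alice, bob = Agent("alice", "confidential"), Agent("bob", "internal")
cache = HierarCache()
cache.write(Message(alice, bob, "weekly sync notes", "internal"), registry)
cache.write(Message(alice, bob, "deployment API keys", "confidential"), registry)  # bob lacks authority
print(cache.read(bob, "internal"))      # ['weekly sync notes']
print(cache.read(bob, "confidential"))  # []
print(cache.junk)                       # ['deployment API keys']
```

In this sketch, writes that fail verification are quarantined in a junk area rather than written to working memory; whether AgentSafe discards, quarantines, or flags such content is not stated in the abstract.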
Related papers
- SAGA: A Security Architecture for Governing AI Agentic Systems [13.106925341037046]
Large Language Model (LLM)-based agents increasingly interact, collaborate, and delegate tasks to one another autonomously with minimal human interaction.
Industry guidelines for agentic system governance emphasize the need for users to maintain comprehensive control over their agents.
We propose SAGA, a Security Architecture for Governing Agentic systems, that offers user oversight over their agents' lifecycle.
arXiv Detail & Related papers (2025-04-27T23:10:00Z)
- DoomArena: A framework for Testing AI Agents Against Evolving Security Threats [84.94654617852322]
We present DoomArena, a security evaluation framework for AI agents.
It is a plug-in framework and integrates easily into realistic agentic frameworks.
It is modular and decouples the development of attacks from details of the environment in which the agent is deployed.
arXiv Detail & Related papers (2025-04-18T20:36:10Z)
- Get the Agents Drunk: Memory Perturbations in Autonomous Agent-based Recommender Systems [29.35591074298123]
Large language model-based agents are increasingly used in recommender systems (Agent4RSs) to achieve personalized behavior modeling.
To the best of our knowledge, how robust Agent4RSs are remains unexplored.
We propose the first work to attack Agent4RSs by perturbing agents' memories, not only to uncover their limitations but also to enhance their security and robustness.
arXiv Detail & Related papers (2025-03-31T07:35:40Z)
- ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning [7.481324060587101]
ShieldAgent is a guardrail agent designed to enforce explicit safety policy compliance for the action trajectory of other protected agents.
Given the action trajectory of the protected agent, ShieldAgent retrieves relevant rule circuits and generates a shielding plan.
ShieldAgent reduces API queries by 64.7% and inference time by 58.2%, demonstrating its high precision and efficiency in safeguarding agents.
arXiv Detail & Related papers (2025-03-26T17:58:40Z)
- Defeating Prompt Injections by Design [79.00910871948787]
CaMeL is a robust defense that creates a protective system layer around Large Language Models (LLMs).
To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query.
We demonstrate the effectiveness of CaMeL by solving 67% of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.
arXiv Detail & Related papers (2025-03-24T15:54:10Z)
- Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems [1.2564343689544843]
We develop simulations of AI agents collaborating on shared objectives to study security risks and trade-offs.
We observe infectious malicious prompts - the multi-hop spreading of malicious instructions.
Our findings illustrate a potential trade-off between security and collaborative efficiency in multi-agent systems.
arXiv Detail & Related papers (2025-02-26T14:00:35Z)
- The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents [6.829628038851487]
Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing complex real-world tasks through tool integration.
In particular, indirect prompt injection attacks pose a critical threat, where malicious instructions embedded within external data sources can manipulate agents to deviate from user intentions.
We propose a novel perspective that reframes agent security from preventing harmful actions to ensuring task alignment, requiring every agent action to serve user objectives.
arXiv Detail & Related papers (2024-12-21T16:17:48Z)
- SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents [42.69984822098671]
We present SafeAgentBench, a new benchmark for safety-aware task planning of embodied LLM agents.
SafeAgentBench includes: (1) a new dataset with 750 tasks, covering 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 8 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives.
arXiv Detail & Related papers (2024-12-17T18:55:58Z)
- Security Threats in Agentic AI System [0.0]
The complexity of AI systems combined with their ability to process and analyze large volumes of data increases the chances of data leaks or breaches.
As AI agents evolve with greater autonomy, their capacity to bypass or exploit security measures becomes a growing concern.
arXiv Detail & Related papers (2024-10-16T06:40:02Z)
- On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents.
However, the impact of clumsy or even malicious agents on the overall performance of the system remains underexplored.
This paper investigates the resilience of various system structures under faulty agents.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
- AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z)
- Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study [16.559272781032632]
The rapid progress in the reasoning capability of the Multi-modal Large Language Models has triggered the development of autonomous agent systems on mobile devices.
Despite the increased human-machine interaction efficiency, the security risks of MLLM-based mobile agent systems have not been systematically studied.
This paper highlights the need for security awareness in the design of MLLM-based systems and paves the way for future research on attacks and defense methods.
arXiv Detail & Related papers (2024-07-12T14:30:05Z)
- GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
We propose GuardAgent, the first guardrail agent to protect the target agents by dynamically checking whether their actions satisfy given safety guard requests.
Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution.
We show that GuardAgent effectively moderates the violation actions for different types of agents on two benchmarks with over 98% and 83% guardrail accuracies.
arXiv Detail & Related papers (2024-06-13T14:49:26Z)
- Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents [47.219047422240145]
We take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents.
Specifically, compared with traditional backdoor attacks on LLMs that are only able to manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms.
arXiv Detail & Related papers (2024-02-17T06:48:45Z)
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents [50.33549510615024]
This paper presents an Agent-Constitution-based agent framework, TrustAgent, with a focus on improving the LLM-based agent safety.
The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: pre-planning strategy which injects safety knowledge to the model before plan generation, in-planning strategy which enhances safety during plan generation, and post-planning strategy which ensures safety by post-planning inspection.
arXiv Detail & Related papers (2024-02-02T17:26:23Z)
- PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety [70.84902425123406]
Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence.
However, the potential misuse of this intelligence for malicious purposes presents significant risks.
We propose a framework (PsySafe) grounded in agent psychology, focusing on identifying how dark personality traits in agents can lead to risky behaviors.
Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors.
arXiv Detail & Related papers (2024-01-22T12:11:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.