A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory
- URL: http://arxiv.org/abs/2510.02373v1
- Date: Mon, 29 Sep 2025 16:04:15 GMT
- Title: A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory
- Authors: Qianshan Wei, Tengchao Yang, Yaochen Wang, Xinfeng Li, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, XiaoFeng Wang,
- Abstract summary: Large Language Model (LLM) agents use memory to learn from past interactions.<n>An adversary can inject seemingly harmless records into an agent's memory to manipulate its future behavior.<n>A-MemGuard is the first proactive defense framework for LLM agent memory.
- Score: 31.673865459672285
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Model (LLM) agents use memory to learn from past interactions, enabling autonomous planning and decision-making in complex environments. However, this reliance on memory introduces a critical security risk: an adversary can inject seemingly harmless records into an agent's memory to manipulate its future behavior. This vulnerability is characterized by two core aspects: First, the malicious effect of injected records is only activated within a specific context, making them hard to detect when individual memory entries are audited in isolation. Second, once triggered, the manipulation can initiate a self-reinforcing error cycle: the corrupted outcome is stored as precedent, which not only amplifies the initial error but also progressively lowers the threshold for similar attacks in the future. To address these challenges, we introduce A-MemGuard (Agent-Memory Guard), the first proactive defense framework for LLM agent memory. The core idea of our work is the insight that memory itself must become both self-checking and self-correcting. Without modifying the agent's core architecture, A-MemGuard combines two mechanisms: (1) consensus-based validation, which detects anomalies by comparing reasoning paths derived from multiple related memories and (2) a dual-memory structure, where detected failures are distilled into ``lessons'' stored separately and consulted before future actions, breaking error cycles and enabling adaptation. Comprehensive evaluations on multiple benchmarks show that A-MemGuard effectively cuts attack success rates by over 95% while incurring a minimal utility cost. This work shifts LLM memory security from static filtering to a proactive, experience-driven model where defenses strengthen over time. Our code is available in https://github.com/TangciuYueng/AMemGuard
Related papers
- MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks [55.145729491377374]
Existing evaluations of agents with memory typically assess memorization and action in isolation.<n>We introduce MemoryArena, a unified evaluation gym for benchmarking agent memory in multi-session Memory-Agent-Environment loops.<n> MemoryArena supports evaluation across web navigation, preference-constrained planning, progressive information search, and sequential formal reasoning.
arXiv Detail & Related papers (2026-02-18T09:49:14Z) - Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections [57.64370755825839]
Self-evolving agents update their internal state across sessions, often by writing and reusing long-term memory.<n>We study this risk and formalize a persistent attack we call a Zombie Agent.<n>We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content.
arXiv Detail & Related papers (2026-02-17T15:28:24Z) - AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management [47.49917373646469]
Existing defenses treat bloated memory as given and focus on remaining resilient.<n>We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management.
arXiv Detail & Related papers (2026-02-07T06:28:51Z) - Memory Poisoning Attack and Defense on Memory Based LLM-Agents [3.7127635602605014]
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks.<n>Recent work demonstrated that the MINJA (Memory Injection Attack) achieves over 95 % injection success rate.<n>This work addresses gaps through systematic empirical evaluation of memory poisoning attacks and defenses.
arXiv Detail & Related papers (2026-01-09T03:26:10Z) - The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs.<n>We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z) - BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors.<n>We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z) - LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks [7.115093658017371]
LeakSealer is a model-agnostic framework that combines static analysis for forensic insights with dynamic defenses in a Human-In-The-Loop pipeline.<n>We empirically evaluate LeakSealer under two scenarios: (1) jailbreak attempts, employing a public benchmark dataset, and (2) PII leakage, supported by a curated dataset of labeled LLM interactions.
arXiv Detail & Related papers (2025-08-01T13:04:28Z) - VerificAgent: Domain-Specific Memory Verification for Scalable Oversight of Aligned Computer-Use Agents [0.17812428873698402]
Unvetted memories can drift from user intent and safety constraints.<n>We introduce VerificAgent, a scalable oversight framework for CUAs.<n>VerificAgent improves task reliability, reduces hallucination-induced failures, and preserves interpretable, auditable guidance.
arXiv Detail & Related papers (2025-06-03T07:25:49Z) - Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models [81.44934796068495]
Supervised fine-tuning (SFT) aligns large language models with human intent by training them on labeled task-specific data.<n>Malicious attackers can inject backdoors into these models by embedding triggers into the harmful question-answer (QA) pairs.<n>We propose a novel textitclean-data backdoor attack for jailbreaking LLMs.
arXiv Detail & Related papers (2025-05-23T08:13:59Z) - Memory Injection Attacks on LLM Agents via Query-Only Interaction [49.14715983268449]
We propose a novel Memory INJection Attack, MINJA, without assuming that the attacker can directly modify the memory bank of the agent.<n>The attacker injects malicious records into the memory bank by only interacting with the agent via queries and output observations.<n> MINJA enables any user to influence agent memory, highlighting the risk.
arXiv Detail & Related papers (2025-03-05T17:53:24Z) - GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
We propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests.<n>Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution.<n>We show that GuardAgent effectively moderates the violation actions for different types of agents on two benchmarks with over 98% and 83% guardrail accuracies, respectively.
arXiv Detail & Related papers (2024-06-13T14:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.