MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
- URL: http://arxiv.org/abs/2512.16962v1
- Date: Thu, 18 Dec 2025 08:34:40 GMT
- Title: MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
- Authors: Saksham Sahai Srivastava, Haoyu He
- Abstract summary: MemoryGraft is a novel indirect injection attack that compromises agent behavior not through immediate jailbreaks, but by implanting malicious successful experiences into the agent's long-term memory. We demonstrate that an attacker who can supply benign ingestion-level artifacts that the agent reads during execution can induce it to construct a poisoned RAG store. When the agent later encounters semantically similar tasks, union retrieval over lexical templates and embedding similarity reliably surfaces these grafted memories, and the agent adopts the embedded unsafe patterns, leading to persistent behavioral drift across sessions.
- Score: 5.734678752740074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Model (LLM) agents increasingly rely on long-term memory and Retrieval-Augmented Generation (RAG) to persist experiences and refine future performance. While this experience learning capability enhances agentic autonomy, it introduces a critical, unexplored attack surface, i.e., the trust boundary between an agent's reasoning core and its own past. In this paper, we introduce MemoryGraft. It is a novel indirect injection attack that compromises agent behavior not through immediate jailbreaks, but by implanting malicious successful experiences into the agent's long-term memory. Unlike traditional prompt injections that are transient, or standard RAG poisoning that targets factual knowledge, MemoryGraft exploits the agent's semantic imitation heuristic which is the tendency to replicate patterns from retrieved successful tasks. We demonstrate that an attacker who can supply benign ingestion-level artifacts that the agent reads during execution can induce it to construct a poisoned RAG store where a small set of malicious procedure templates is persisted alongside benign experiences. When the agent later encounters semantically similar tasks, union retrieval over lexical and embedding similarity reliably surfaces these grafted memories, and the agent adopts the embedded unsafe patterns, leading to persistent behavioral drift across sessions. We validate MemoryGraft on MetaGPT's DataInterpreter agent with GPT-4o and find that a small number of poisoned records can account for a large fraction of retrieved experiences on benign workloads, turning experience-based self-improvement into a vector for stealthy and durable compromise. To facilitate reproducibility and future research, our code and evaluation data are available at https://github.com/Jacobhhy/Agent-Memory-Poisoning.
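The union retrieval surface described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's implementation; the `Experience` record, the Jaccard token-overlap stand-in for a lexical index, and all names are illustrative assumptions. It shows why union retrieval widens the attack surface: a grafted record only needs to rank highly on one channel, lexical or embedding, to be surfaced.

```python
# Illustrative sketch of union retrieval over a (possibly poisoned) experience
# store. All names and scoring functions are hypothetical stand-ins, not the
# MemoryGraft artifact's actual code.
import math
from dataclasses import dataclass

@dataclass
class Experience:
    task: str               # task description of the stored "successful" run
    procedure: str          # procedure template the agent may imitate
    embedding: list         # precomputed vector (stand-in for a real encoder)

def lexical_score(query: str, exp: Experience) -> float:
    # Token-overlap (Jaccard) as a cheap stand-in for a lexical/BM25 index.
    q, t = set(query.lower().split()), set(exp.task.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0

def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def union_retrieve(query, query_emb, store, k=2):
    # Top-k by each channel independently, then merge.
    by_lex = sorted(store, key=lambda e: lexical_score(query, e), reverse=True)[:k]
    by_emb = sorted(store, key=lambda e: cosine(query_emb, e.embedding), reverse=True)[:k]
    # Union semantics: a grafted memory is retrieved if it wins on EITHER channel.
    seen, merged = set(), []
    for e in by_lex + by_emb:
        if id(e) not in seen:
            seen.add(id(e))
            merged.append(e)
    return merged
```

Under this model, a small number of poisoned records whose task text and embedding are tuned to common benign workloads can dominate retrieval, which matches the abstract's observation that a few grafted memories account for a large fraction of retrieved experiences.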
Related papers
- Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections [57.64370755825839]
Self-evolving agents update their internal state across sessions, often by writing and reusing long-term memory. We study this risk and formalize a persistent attack we call a Zombie Agent. We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content.
arXiv Detail & Related papers (2026-02-17T15:28:24Z) - BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents [58.83028403414688]
Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs. We propose BackdoorAgent, a modular and stage-aware framework that provides a unified agent-centric view of backdoor threats in LLM agents.
arXiv Detail & Related papers (2026-01-08T03:49:39Z) - Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space [4.699272847316498]
Reasoning-Style Poisoning (RSP) manipulates how agents process information rather than what they process. Generative Style Injection (GSI) rewrites retrieved documents into pathological tones. RSP-M is a lightweight runtime monitor that calculates RSV metrics in real time and triggers alerts when values exceed safety thresholds.
arXiv Detail & Related papers (2025-12-16T14:34:10Z) - The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z) - VerificAgent: Domain-Specific Memory Verification for Scalable Oversight of Aligned Computer-Use Agents [0.17812428873698402]
Unvetted memories can drift from user intent and safety constraints. We introduce VerificAgent, a scalable oversight framework for CUAs. VerificAgent improves task reliability, reduces hallucination-induced failures, and preserves interpretable, auditable guidance.
arXiv Detail & Related papers (2025-06-03T07:25:49Z) - How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior [65.70584076918679]
Memory is a critical component in large language model (LLM)-based agents. This paper studies how memory management choices impact LLM agents' behavior, especially their long-term performance.
arXiv Detail & Related papers (2025-05-21T22:35:01Z) - DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents [31.542621203252295]
Large language model (LLM)-powered agents are increasingly used in recommender systems (RSs). This paper presents the first systematic investigation of memory-based vulnerabilities in LLM-powered recommender agents. We propose a novel black-box attack framework named DrunkAgent.
arXiv Detail & Related papers (2025-03-31T07:35:40Z) - Memory Injection Attacks on LLM Agents via Query-Only Interaction [49.14715983268449]
We propose a novel Memory INJection Attack, MINJA, without assuming that the attacker can directly modify the memory bank of the agent. The attacker injects malicious records into the memory bank by only interacting with the agent via queries and output observations. MINJA enables any user to influence agent memory, highlighting the risk.
arXiv Detail & Related papers (2025-03-05T17:53:24Z) - AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)