Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections
- URL: http://arxiv.org/abs/2602.15654v1
- Date: Tue, 17 Feb 2026 15:28:24 GMT
- Title: Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections
- Authors: Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, Jin Song Dong,
- Abstract summary: Self-evolving agents update their internal state across sessions, often by writing and reusing long-term memory.<n>We study this risk and formalize a persistent attack we call a Zombie Agent.<n>We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content.
- Score: 57.64370755825839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-evolving LLM agents update their internal state across sessions, often by writing and reusing long-term memory. This design improves performance on long-horizon tasks but creates a security risk: untrusted external content observed during a benign session can be stored as memory and later treated as instruction. We study this risk and formalize a persistent attack we call a Zombie Agent, where an attacker covertly implants a payload that survives across sessions, effectively turning the agent into a puppet of the attacker. We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content. The attack has two phases. During infection, the agent reads a poisoned source while completing a benign task and writes the payload into long-term memory through its normal update process. During trigger, the payload is retrieved or carried forward and causes unauthorized tool behavior. We design mechanism-specific persistence strategies for common memory implementations, including sliding-window and retrieval-augmented memory, to resist truncation and relevance filtering. We evaluate the attack on representative agent setups and tasks, measuring both persistence over time and the ability to induce unauthorized actions while preserving benign task quality. Our results show that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents.
Related papers
- Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains [7.8562769948743965]
Agentic systems built on large language models (LLMs) extend beyond text generation to autonomously retrieve information and invoke tools.<n>This runtime execution model shifts the attack surface from build-time artifacts to inference-time dependencies, exposing agents to manipulation through untrusted data and probabilistic capability resolution.<n>We systematize these risks within a unified runtime framework, categorizing threats into data supply chain attacks (transient context injection and persistent memory poisoning)<n>We also identify the Viral Agent Loop, in which agents act as vectors for self-propagating generative worms without exploiting code-level flaws.
arXiv Detail & Related papers (2026-02-23T06:57:57Z) - SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills.<n>The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills and an Evaluate Agent that logs action traces.<n>Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z) - AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management [47.49917373646469]
Existing defenses treat bloated memory as given and focus on remaining resilient.<n>We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management.
arXiv Detail & Related papers (2026-02-07T06:28:51Z) - BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents [58.83028403414688]
Large language model (LLM) agents execute tasks through multi-step workflow that combine planning, memory, and tool use.<n>Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs.<n>We propose textbfBackdoorAgent, a modular and stage-aware framework that provides a unified agent-centric view of backdoor threats in LLM agents.
arXiv Detail & Related papers (2026-01-08T03:49:39Z) - MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval [5.734678752740074]
MemoryGraft is a novel indirect injection attack that compromises agent behavior not through immediate jailbreaks, but by implanting malicious successful experiences into the agent's long-term memory.<n>We demonstrate that an attacker who can supply benign ingestion-level artifacts that the agent reads during execution can induce it to construct a poisoned RAG store.<n>When the agent later encounters semantically similar tasks, union retrieval over lexical templates and embedding similarity reliably surfaces these grafted memories, and the agent adopts the embedded unsafe patterns, leading to persistent behavioral drift across sessions.
arXiv Detail & Related papers (2025-12-18T08:34:40Z) - Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space [4.699272847316498]
Reasoning-Style Poisoning (RSP) manipulates how agents process information rather than what they process.<n>Generative Style Injection (GSI) rewrites retrieved documents into pathological tones.<n>RSP-M is a lightweight runtime monitor that calculates RSV metrics in real-time and triggers alerts when values exceed safety thresholds.
arXiv Detail & Related papers (2025-12-16T14:34:10Z) - Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain.<n>We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
arXiv Detail & Related papers (2025-10-03T12:47:21Z) - Your Agent Can Defend Itself against Backdoor Attacks [0.0]
Large language model (LLM)-powered agents face significant security risks from backdoor attacks during training and fine-tuning.<n>We present ReAgent, a novel defense against a range of backdoor attacks on LLM-based agents.
arXiv Detail & Related papers (2025-06-10T01:45:56Z) - AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.