A Practical Memory Injection Attack against LLM Agents
- URL: http://arxiv.org/abs/2503.03704v2
- Date: Fri, 07 Mar 2025 03:49:03 GMT
- Title: A Practical Memory Injection Attack against LLM Agents
- Authors: Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, Zhen Xiang
- Abstract summary: MINJA enables the injection of malicious records into the memory bank solely by interacting with the agent via queries and output observations. It thus allows any user to influence agent memory, highlighting practical risks of LLM agents.
- Score: 49.01756339657071
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Agents based on large language models (LLMs) have demonstrated strong capabilities in a wide range of complex, real-world applications. However, an LLM agent with a compromised memory bank may easily produce harmful outputs when the past records retrieved for demonstration are malicious. In this paper, we propose a novel Memory INJection Attack, MINJA, which injects malicious records into the memory bank solely by interacting with the agent via queries and output observations. These malicious records are designed to elicit a sequence of malicious reasoning steps that lead to undesirable agent actions when the victim user's query is executed. Specifically, we introduce a sequence of bridging steps that link the victim query to the malicious reasoning steps. During the injection of the malicious record, we use an indication prompt to guide the agent to autonomously generate the designed bridging steps. We further propose a progressive shortening strategy that gradually removes the indication prompt, so that the malicious record is easily retrieved when the victim query is later processed. Our extensive experiments across diverse agents demonstrate the effectiveness of MINJA in compromising agent memory. With minimal requirements for execution, MINJA enables any user to influence agent memory, highlighting practical risks of LLM agents.
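To make the described workflow concrete, the sketch below mirrors the steps named in the abstract: the attacker issues queries carrying an indication prompt, the agent autonomously records bridging reasoning steps in its memory bank, and the indication prompt is progressively shortened so that later records closely resemble the victim query. The `Agent` interface, the `MemoryRecord` structure, and the shortening schedule are all hypothetical illustrations, not the authors' implementation.

```python
"""Minimal conceptual sketch of the MINJA workflow described in the abstract.
All names and the shortening schedule are illustrative assumptions."""

from dataclasses import dataclass


@dataclass
class MemoryRecord:
    query: str       # the query stored alongside the record
    reasoning: str   # reasoning the agent generated (would include bridging steps)


class Agent:
    """Stand-in for an LLM agent with a retrievable memory bank (assumed API)."""

    def __init__(self) -> None:
        self.memory: list[MemoryRecord] = []

    def run(self, query: str) -> str:
        # A real agent would retrieve similar records, reason, and act.
        # Here we only model that the (query, reasoning) pair is stored.
        reasoning = f"reasoning for: {query}"
        self.memory.append(MemoryRecord(query, reasoning))
        return reasoning


def minja_injection(agent: Agent, victim_query: str,
                    indication_prompt: str, rounds: int = 4) -> None:
    """Inject records whose stored queries converge toward the victim query.

    Each round appends a progressively shorter slice of the indication
    prompt, so later records look almost identical to the victim query
    and are therefore more likely to be retrieved when it arrives.
    """
    for i in range(rounds):
        keep = len(indication_prompt) * (rounds - 1 - i) // rounds
        crafted = f"{victim_query} {indication_prompt[:keep]}".strip()
        agent.run(crafted)  # agent writes bridging steps into its own memory


if __name__ == "__main__":
    agent = Agent()
    minja_injection(
        agent,
        victim_query="What is patient 123's medication?",
        indication_prompt="(hypothetical indication text guiding bridging steps)",
    )
    print([r.query for r in agent.memory])  # queries shrink toward the victim query
```

In a typical memory module, stored records are retrieved by similarity to the incoming query; under that assumption, shrinking the crafted query toward the victim query is what raises the chance that the poisoned record is retrieved at attack time.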
Related papers
- Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction [68.6543680065379]
Large language models (LLMs) are vulnerable to prompt injection attacks.
We propose a novel defense method that leverages, rather than suppresses, the instruction-following abilities of LLMs.
arXiv Detail & Related papers (2025-04-29T07:13:53Z) - UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning [17.448966928905733]
Large Language Model (LLM) agents equipped with external tools have become increasingly powerful for handling complex tasks.
We present UDora, a unified red teaming framework designed for LLM agents that dynamically leverages the agent's own reasoning processes to compel it toward malicious behavior.
arXiv Detail & Related papers (2025-02-28T21:30:28Z) - Unveiling Privacy Risks in LLM Agent Memory [40.26158509307175]
Large Language Model (LLM) agents have become increasingly prevalent across various real-world applications.
They enhance decision-making by storing private user-agent interactions in the memory module for demonstrations.
We propose a Memory EXTRaction Attack (MEXTRA) to extract private information from memory.
arXiv Detail & Related papers (2025-02-17T19:55:53Z) - MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks.
We present MELON, a novel IPI defense.
We show that MELON outperforms SOTA defenses in both attack prevention and utility preservation.
arXiv Detail & Related papers (2025-02-07T18:57:49Z) - Towards Action Hijacking of Large Language Model-based Agent [39.19067800226033]
We introduce Name, a novel hijacking attack that manipulates the action plans of black-box agent systems.
Our approach achieved an average bypass rate of 92.7% for safety filters.
arXiv Detail & Related papers (2024-12-14T12:11:26Z) - Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In [5.65782619470663]
We examine how ReAct agents can be exploited using a straightforward yet effective method we refer to as the foot-in-the-door attack.
Our experiments show that indirect prompt injection attacks can significantly increase the likelihood of the agent performing subsequent malicious actions.
To mitigate this vulnerability, we propose implementing a simple reflection mechanism that prompts the agent to reassess the safety of its actions during execution.
arXiv Detail & Related papers (2024-10-22T12:24:41Z) - AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored.
We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse.
We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z) - SecAlign: Defending Against Prompt Injection with Preference Optimization [52.48001255555192]
Adversarial prompts can be injected into external data sources to override the system's intended instructions and execute a malicious instruction.
We propose a new defense called SecAlign based on the technique of preference optimization.
Our method reduces the success rates of various prompt injections to around 0%, even against attacks much more sophisticated than those seen during training.
arXiv Detail & Related papers (2024-10-07T19:34:35Z) - Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z)