Related papers: Unveiling Privacy Risks in LLM Agent Memory

Unveiling Privacy Risks in LLM Agent Memory

URL: http://arxiv.org/abs/2502.13172v1
Date: Mon, 17 Feb 2025 19:55:53 GMT
Title: Unveiling Privacy Risks in LLM Agent Memory
Authors: Bo Wang, Weiyi He, Pengfei He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang,
Abstract summary: Large Language Model (LLM) agents have become increasingly prevalent across various real-world applications.<n>They enhance decision-making by storing private user-agent interactions in the memory module for demonstrations.<n>We propose a Memory EXTRaction Attack (MEXTRA) to extract private information from memory.
Score: 40.26158509307175
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Model (LLM) agents have become increasingly prevalent across various real-world applications. They enhance decision-making by storing private user-agent interactions in the memory module for demonstrations, introducing new privacy risks for LLM agents. In this work, we systematically investigate the vulnerability of LLM agents to our proposed Memory EXTRaction Attack (MEXTRA) under a black-box setting. To extract private information from memory, we propose an effective attacking prompt design and an automated prompt generation method based on different levels of knowledge about the LLM agent. Experiments on two representative agents demonstrate the effectiveness of MEXTRA. Moreover, we explore key factors influencing memory leakage from both the agent's and the attacker's perspectives. Our findings highlight the urgent need for effective memory safeguards in LLM agent design and deployment.

Related papers

Get the Agents Drunk: Memory Perturbations in Autonomous Agent-based Recommender Systems [29.35591074298123]
Large language model-based agents are increasingly used in recommender systems (Agent4RSs) to achieve personalized behavior modeling. To the best of our knowledge, how robust Agent4RSs are remains unexplored. We propose the first work to attack Agent4RSs by perturbing agents' memories, not only to uncover their limitations but also to enhance their security and robustness.
arXiv Detail & Related papers (2025-03-31T07:35:40Z)
A Practical Memory Injection Attack against LLM Agents [49.01756339657071]
MINJA enables the injection of malicious records into the memory bank by only interacting with the agent via queries and output observations. MINJA enables any user to influence agent memory, highlighting practical risks of LLM agents.
arXiv Detail & Related papers (2025-03-05T17:53:24Z)
Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach [9.483655213280738]
This paper presents a novel approach to evaluating the security of large language models (LLMs)<n>We define prompt leakage as a critical threat to secure LLM deployment.<n>We implement a multi-agent system where cooperative agents are tasked with probing and exploiting the target LLM to elicit its prompt.
arXiv Detail & Related papers (2025-02-18T08:17:32Z)
Towards Action Hijacking of Large Language Model-based Agent [39.19067800226033]
We introduce Name, a novel hijacking attack to manipulate the action plans of black-box agent system.<n>Our approach achieved an average bypass rate of 92.7% for safety filters.
arXiv Detail & Related papers (2024-12-14T12:11:26Z)
PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage [78.33839735526769]
LLMs may be fooled into outputting private information under carefully crafted adversarial prompts.<n>PrivAgent is a novel black-box red-teaming framework for privacy leakage.
arXiv Detail & Related papers (2024-12-07T20:09:01Z)
Imprompter: Tricking LLM Agents into Improper Tool Use [35.255462653237885]
Large Language Model (LLM) Agents are an emerging computing paradigm that blends generative machine learning with tools such as code interpreters, web browsing, email, and more generally, external resources. We contribute to the security foundations of agent-based systems and surface a new class of automatically computed obfuscated adversarial prompt attacks.
arXiv Detail & Related papers (2024-10-19T01:00:57Z)
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning. On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z)
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
Existing methods for enhancing the safety of large language models (LLMs) are not directly transferable to LLM-powered agents. We propose GuardAgent, the first LLM agent as a guardrail to other LLM agents. GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines.
arXiv Detail & Related papers (2024-06-13T14:49:26Z)
A Survey on the Memory Mechanism of Large Language Model based Agents [66.4963345269611]
Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems. The key component to support agent-environment interactions is the memory of the agents.
arXiv Detail & Related papers (2024-04-21T01:49:46Z)
Memory Sharing for Large Language Model based Agents [43.53494041932615]
This paper introduces the Memory Sharing, a framework which integrates the real-time memory filter, storage and retrieval to enhance the In-Context Learning process. The experimental results demonstrate that the MS framework significantly improves the agents' performance in addressing open-ended questions.
arXiv Detail & Related papers (2024-04-15T17:57:30Z)
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents [47.219047422240145]
We take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents. Specifically, compared with traditional backdoor attacks on LLMs that are only able to manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms.
arXiv Detail & Related papers (2024-02-17T06:48:45Z)
Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [101.59430768507997]
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world. We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM) Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude.
arXiv Detail & Related papers (2023-01-28T02:04:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.