The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration
- URL: http://arxiv.org/abs/2509.14284v1
- Date: Tue, 16 Sep 2025 16:57:25 GMT
- Title: The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration
- Authors: Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
- Abstract summary: Large language models (LLMs) are integral to multi-agent systems. Privacy risks emerge that extend beyond memorization, direct inference, or single-turn evaluations. In particular, seemingly innocuous responses, when composed across interactions, can cumulatively enable adversaries to recover sensitive information.
- Score: 72.33801123508145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) become integral to multi-agent systems, new privacy risks emerge that extend beyond memorization, direct inference, or single-turn evaluations. In particular, seemingly innocuous responses, when composed across interactions, can cumulatively enable adversaries to recover sensitive information, a phenomenon we term compositional privacy leakage. We present the first systematic study of such compositional privacy leaks and possible mitigation methods in multi-agent LLM systems. First, we develop a framework that models how auxiliary knowledge and agent interactions jointly amplify privacy risks, even when each response is benign in isolation. Next, to mitigate this, we propose and evaluate two defense strategies: (1) Theory-of-Mind defense (ToM), where defender agents infer a questioner's intent by anticipating how their outputs may be exploited by adversaries, and (2) Collaborative Consensus Defense (CoDef), where responder agents collaborate with peers who vote based on a shared aggregated state to restrict the spread of sensitive information. Crucially, we balance our evaluation across compositions that expose sensitive information and compositions that yield benign inferences. Our experiments quantify how these defense strategies differ in balancing the privacy-utility trade-off. We find that while chain-of-thought alone offers limited protection against leakage (~39% sensitive blocking rate), our ToM defense substantially improves sensitive query blocking (up to 97%) but can reduce benign task success. CoDef achieves the best balance, yielding the highest Balanced Outcome (79.8%), highlighting the benefit of combining explicit reasoning with defender collaboration. Together, our results expose a new class of risks in collaborative LLM deployments and provide actionable insights for designing safeguards against compositional, context-driven privacy leakage.
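The compositional-leakage setting and the two defense ideas described in the abstract can be illustrated with a toy model. This is a hypothetical sketch, not the authors' framework: it assumes facts are plain strings, that an adversary's auxiliary knowledge can be written as forward-chaining rules (a set of premises implies a conclusion), and that each defender holds its own, possibly incomplete, copy of those rules. `Defender.approves` stands in for a ToM-style check (simulate what an adversary could infer if the candidate answer were released), and `consensus_release` stands in for a CoDef-style majority vote over a shared record of already-released facts. All names here (`closure`, `Defender`, `consensus_release`, the example facts) are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical encoding of adversary auxiliary knowledge:
# (premises, conclusion) means "if every premise is known, the conclusion is inferable".
Rule = tuple[frozenset[str], str]


def closure(facts: set[str], rules: list[Rule]) -> set[str]:
    """Forward-chain over `rules` until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


@dataclass
class Defender:
    """Hypothetical responder agent with its own model of adversary knowledge."""
    rules: list[Rule]
    sensitive: set[str]

    def approves(self, released: set[str], candidate: str) -> bool:
        # ToM-style check (assumed): would releasing `candidate`, on top of
        # everything already released, let an adversary infer a sensitive fact?
        inferable = closure(released | {candidate}, self.rules)
        return not (inferable & self.sensitive)


def consensus_release(released: set[str], candidate: str, peers: list[Defender]) -> bool:
    """CoDef-inspired sketch (assumed): strict majority vote over a shared released-fact state."""
    votes = sum(peer.approves(released, candidate) for peer in peers)
    return votes > len(peers) // 2


if __name__ == "__main__":
    rules = [(frozenset({"alice_in_ward_7", "ward_7_is_oncology"}), "alice_has_cancer")]
    sensitive = {"alice_has_cancer"}
    # Two peers know the linking rule; the third does not.
    peers = [Defender(rules, sensitive), Defender(rules, sensitive), Defender([], sensitive)]
    released = {"alice_in_ward_7"}  # shared aggregated state of earlier answers

    print(peers[0].approves(set(), "ward_7_is_oncology"))            # True: benign in isolation
    print(consensus_release(released, "ward_7_is_oncology", peers))  # False: composition leaks
    print(consensus_release(released, "ward_7_has_windows", peers))  # True: benign composition
```

Here `ward_7_is_oncology` is harmless on its own, but combined with the already-released `alice_in_ward_7` it lets the adversary derive `alice_has_cancer`, so two of the three peers block it and the vote fails. A strict majority is only one possible aggregation; requiring unanimity would raise the blocking rate at some cost to benign task success.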
Related papers
- Contextualized Privacy Defense for LLM Agents [84.30907378390512]
LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm. We show that our CDI consistently achieves a better balance between privacy preservation (94.2%) and helpfulness (80.6%) than baselines.
arXiv Detail & Related papers (2026-03-03T13:35:33Z)
- NeuroFilter: Privacy Guardrails for Conversational LLM Agents [50.75206727081996]
This work addresses the computational challenge of enforcing privacy for agentic Large Language Models (LLMs). NeuroFilter is a guardrail framework that operationalizes contextual integrity by mapping norm violations to simple directions in the model's activation space. A comprehensive evaluation across over 150,000 interactions, covering models from 7B to 70B parameters, illustrates the strong performance of NeuroFilter.
arXiv Detail & Related papers (2026-01-21T05:16:50Z)
- MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation [61.92403071137653]
Existing privacy benchmarks only focus on simplistic, single-turn interactions where private information can be trivially omitted without affecting task outcomes. We introduce MAGPIE, a novel benchmark designed to evaluate privacy understanding and preservation in multi-agent collaborative, non-adversarial scenarios. Our evaluation reveals that state-of-the-art agents, including GPT-5 and Gemini 2.5-Pro, exhibit significant privacy leakage.
arXiv Detail & Related papers (2025-10-16T23:12:12Z)
- From Defender to Devil? Unintended Risk Interactions Induced by LLM Defenses [18.096213847353965]
Large Language Models (LLMs) have shown remarkable performance across various applications, but their deployment in sensitive domains raises significant concerns. We take the first step in investigating unintended interactions caused by defenses in LLMs, focusing on the complex interplay between safety, fairness, and privacy. We propose CrossRiskEval, a comprehensive evaluation framework to assess whether deploying a defense targeting one risk inadvertently affects others.
arXiv Detail & Related papers (2025-10-09T09:00:00Z)
- AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning [78.5751183537704]
AdvEvo-MARL is a co-evolutionary multi-agent reinforcement learning framework that internalizes safety into task agents. Rather than relying on external guards, AdvEvo-MARL jointly optimizes attackers and defenders.
arXiv Detail & Related papers (2025-10-02T02:06:30Z)
- Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents [40.39717403627143]
We present PrivacyChecker, a model-agnostic, contextual-integrity-based mitigation approach. We also introduce PrivacyLens-Live, transforming static benchmarks into dynamic MCP and A2A environments. Our data and code will be made available at https://aka.ms/privacy_in_action.
arXiv Detail & Related papers (2025-09-22T08:19:06Z)
- Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents [43.303548143175256]
This study proposes an auditing framework for conversational privacy that quantifies an agent's susceptibility to risks. The proposed Conversational Manipulation for Privacy Leakage (CMPL) framework is designed to stress-test agents that enforce strict privacy directives.
arXiv Detail & Related papers (2025-06-11T20:47:37Z)
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- Prompt Leakage effect and defense strategies for multi-turn LLM interactions [95.33778028192593]
Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker.
We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting.
We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts.
arXiv Detail & Related papers (2024-04-24T23:39:58Z)
- SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems [40.91476827978885]
Attackers can rapidly exploit the victim's vulnerabilities, generating adversarial policies that result in the failure of specific tasks.
We propose a novel black-box attack (SUB-PLAY) that incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability.
We evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies.
arXiv Detail & Related papers (2024-02-06T06:18:16Z)
- GaitGuard: Towards Private Gait in Mixed Reality [3.2392550445029396]
We present GaitGuard, a real-time system that protects gait privacy against video-based gait extraction attacks in MR environments. We compare and combine multiple mitigation techniques, offering guidance to navigate the privacy-utility tradeoff.
arXiv Detail & Related papers (2023-12-07T17:42:04Z)
- AnonPSI: An Anonymity Assessment Framework for PSI [5.301888664281537]
Private Set Intersection (PSI) is a protocol that enables two parties to securely compute a function over the intersected part of their shared datasets.
Recent studies have highlighted its vulnerability to Set Membership Inference Attacks (SMIA).
This paper explores the evaluation of anonymity within the PSI context.
arXiv Detail & Related papers (2023-11-29T22:13:53Z)
- Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence [41.14664289570607]
Adversarial Minority Influence (AMI) is a practical black-box attack and can be launched without knowing victim parameters.
AMI is also strong because it accounts for the complex multi-agent interactions and the cooperative goal of the agents.
We achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments into collectively worst-case scenarios.
arXiv Detail & Related papers (2023-02-07T08:54:37Z)
- Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
- PP-MARL: Efficient Privacy-Preserving Multi-Agent Reinforcement Learning for Cooperative Intelligence in Communications [15.955599283219298]
Multi-agent reinforcement learning (MARL) is a popular approach for achieving cooperative intelligence (CI) in communication problems. Ensuring privacy protection for MARL is a challenging task because of the presence of heterogeneous agents that learn interdependently via sharing information. We propose PP-MARL, an efficient privacy-preserving learning scheme for MARL.
arXiv Detail & Related papers (2022-04-26T04:08:27Z)
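The NeuroFilter entry above describes operationalizing contextual integrity by mapping norm violations to simple directions in a model's activation space. As a rough illustration of what a direction-based guardrail can look like, the sketch below fits a difference-of-means direction between activations labeled as violating and compliant, then flags a new activation whose projection onto that direction exceeds a threshold halfway between the two projected class means. This is an assumed, generic linear-probe technique, not NeuroFilter's published method; the toy 2-D vectors stand in for real hidden states, and every function name is hypothetical.

```python
from statistics import fmean

# Hypothetical stand-in for a model hidden state.
Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [fmean(column) for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_direction(violating: list[Vector], compliant: list[Vector]) -> tuple[Vector, float]:
    """Difference-of-means direction plus a threshold halfway between the projected class means."""
    mu_v = mean_vector(violating)
    mu_c = mean_vector(compliant)
    direction = [v - c for v, c in zip(mu_v, mu_c)]
    threshold = (dot(direction, mu_v) + dot(direction, mu_c)) / 2
    return direction, threshold


def flags_violation(activation: Vector, direction: Vector, threshold: float) -> bool:
    """Flag an activation whose projection onto the direction exceeds the threshold."""
    return dot(activation, direction) > threshold


if __name__ == "__main__":
    # Toy 2-D "activations"; real hidden states would have thousands of dimensions.
    violating = [[1.0, 0.2], [0.8, 0.0]]
    compliant = [[0.0, 0.1], [0.2, 0.3]]
    direction, threshold = fit_direction(violating, compliant)
    print(flags_violation([0.9, 0.1], direction, threshold))  # True
    print(flags_violation([0.1, 0.2], direction, threshold))  # False
```

Difference-of-means is used here only because it needs no training loop; a logistic-regression probe fit on the same labeled activations would be a common alternative.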