Hiding in Plain Sight: Differential Privacy Noise Exploitation for
Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement
Learning
- URL: http://arxiv.org/abs/2307.00268v2
- Date: Thu, 13 Jul 2023 03:18:15 GMT
- Authors: Md Tamjid Hossain, Hung La
- Abstract summary: Differential privacy (DP) has been introduced in cooperative multiagent reinforcement learning (CMARL) to safeguard the agents' privacy against adversarial inference during knowledge sharing.
We present an adaptive, privacy-exploiting, and evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the inherent DP-noise to circumvent anomaly detection systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Lately, differential privacy (DP) has been introduced in cooperative
multiagent reinforcement learning (CMARL) to safeguard the agents' privacy
against adversarial inference during knowledge sharing. Nevertheless, we argue
that the noise introduced by DP mechanisms may inadvertently give rise to a
novel poisoning threat, specifically in the context of private knowledge
sharing during CMARL, which remains unexplored in the literature. To address
this shortcoming, we present an adaptive, privacy-exploiting, and
evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the
inherent DP-noise to circumvent anomaly detection systems and hinder the
optimal convergence of the CMARL model. We rigorously evaluate our proposed
PeLPA attack in diverse environments, encompassing both non-adversarial and
multiple-adversarial contexts. Our findings reveal that, in a medium-scale
environment, the PeLPA attack with attacker ratios of 20% and 40% can lead to
an increase in average steps to goal by 50.69% and 64.41%, respectively.
Furthermore, under similar conditions, PeLPA can result in a 1.4x and 1.6x
computational time increase in optimal reward attainment and a 1.18x and 1.38x
slower convergence for attacker ratios of 20% and 40%, respectively.
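The intuition in the abstract, that DP noise leaves room for a manipulated value to pass an anomaly check, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' PeLPA algorithm: the abstract does not name the noise mechanism or the detector, so the Laplace mechanism, the k-sigma detector, and every function and parameter name below (`laplace_noise`, `flag_rate`, `simulate`, `bias_in_scales`) are hypothetical choices made for illustration only.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def flag_rate(values, expected, scale, k=3.0):
    """Fraction of values that a hypothetical k-sigma detector would flag.

    The standard deviation of Laplace(0, scale) is sqrt(2) * scale, so the
    detector tolerates deviations up to k * sqrt(2) * scale.
    """
    threshold = k * (2 ** 0.5) * scale
    flagged = sum(1 for v in values if abs(v - expected) > threshold)
    return flagged / len(values)


def simulate(true_value=1.0, sensitivity=1.0, epsilon=0.5,
             bias_in_scales=1.0, trials=20000, seed=0):
    """Compare honest shared values with values shifted by a small constant bias."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon      # Laplace mechanism noise scale (assumed)
    bias = bias_in_scales * scale      # shift kept within the noise magnitude
    honest = [true_value + laplace_noise(scale, rng) for _ in range(trials)]
    biased = [true_value + bias + laplace_noise(scale, rng) for _ in range(trials)]
    return {
        "honest_flag_rate": flag_rate(honest, true_value, scale),
        "biased_flag_rate": flag_rate(biased, true_value, scale),
        "mean_shift": statistics.fmean(biased) - statistics.fmean(honest),
        "bias": bias,
    }


if __name__ == "__main__":
    print(simulate())
```

In this toy setting the detector must tolerate the DP noise itself, so a shift of about one noise scale raises the flag rate only slightly while still moving the average shared value by the full bias. The actual attack, detectors, and environments evaluated in the paper may differ substantially from this sketch.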
Related papers
- BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning [73.46118996284888]
Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence.
We propose BadCLIP++, a unified framework that tackles both challenges.
For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions.
For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment.
arXiv Detail & Related papers (2026-02-19T08:31:16Z)
- NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey [9.47737368469032]
We propose the NLP Privacy Risk Identification in Social Media framework, which evaluates vulnerabilities across six dimensions.
Our analysis shows that transformer models achieve F1-scores ranging from 0.58 to 0.84, but incur a 1% to 23% drop under privacy-preserving fine-tuning.
We advocate for stronger anonymization, privacy-aware learning, and fairness-driven training to enable ethical NLP in social media contexts.
arXiv Detail & Related papers (2026-01-26T21:09:48Z)
- AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications [71.27518152526686]
Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation.
LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task.
This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types.
arXiv Detail & Related papers (2025-12-23T08:42:09Z)
- Quantifying Return on Security Controls in LLM Systems [0.0]
This paper introduces a decision-oriented framework to quantify residual risk.
It converts adversarial probe outcomes into financial risk estimates and return-on-control metrics.
arXiv Detail & Related papers (2025-12-17T04:58:09Z)
- The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration [72.33801123508145]
Large language models (LLMs) are integral to multi-agent systems.
Privacy risks emerge that extend beyond memorization, direct inference, or single-turn evaluations.
In particular, seemingly innocuous responses, when composed across interactions, can cumulatively enable adversaries to recover sensitive information.
arXiv Detail & Related papers (2025-09-16T16:57:25Z)
- FORTRESS: Frontier Risk Evaluation for National Security and Public Safety [5.544163262906087]
Current benchmarks often fail to test safeguard robustness to potential national security and public safety risks.
We introduce FORTRESS: 500 expert-crafted adversarial prompts with instance-based rubrics of 4-7 binary questions.
Each prompt-rubric pair has a corresponding benign version to test for model over-refusals.
arXiv Detail & Related papers (2025-06-17T19:08:02Z)
- Trust Me, I Can Handle It: Self-Generated Adversarial Scenario Extrapolation for Robust Language Models [12.864404778567154]
Large Language Models (LLMs) exhibit impressive capabilities, but remain susceptible to a growing spectrum of safety risks.
Existing defenses often address only a single threat type or resort to rigid outright rejection.
This paper introduces Adversarial Scenario Extrapolation (ASE), a novel inference-time framework that leverages Chain-of-Thought reasoning.
arXiv Detail & Related papers (2025-05-20T21:22:40Z)
- Swallowing the Poison Pills: Insights from Vulnerability Disparity Among LLMs [3.7913442178940318]
Modern large language models (LLMs) exhibit critical vulnerabilities to poison pill attacks.
We demonstrate these attacks exploit inherent architectural properties of LLMs.
Our work establishes poison pills as both a security threat and diagnostic tool.
arXiv Detail & Related papers (2025-02-23T06:34:55Z)
- GCP: Guarded Collaborative Perception with Spatial-Temporal Aware Malicious Agent Detection [11.336965062177722]
Collaborative perception is vulnerable to adversarial message attacks from malicious agents.
This paper reveals a novel blind area confusion (BAC) attack that compromises existing single-shot outlier-based detection methods.
We propose Guarded Collaborative Perception framework based on spatial-temporal aware malicious agent detection.
arXiv Detail & Related papers (2025-01-05T06:03:26Z)
- Sub-optimal Learning in Meta-Classifier Attacks: A Study of Membership Inference on Differentially Private Location Aggregates [19.09251452596829]
We show that a significant gap exists between the expected attack accuracy given by DP and the empirical attack accuracy even with informed attackers.
We propose two new metric-based MIAs: the one-threshold attack and the two-threshold attack.
arXiv Detail & Related papers (2024-12-29T12:51:34Z)
- CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks [61.06621533874629]
Diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set.
We first propose a defense framework, CopyrightShield, to defend against the above attack.
Experimental results demonstrate that CopyrightShield significantly improves poisoned sample detection performance across two attack scenarios.
arXiv Detail & Related papers (2024-12-02T14:19:44Z)
- Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
- Membership Inference Attacks Against In-Context Learning [26.57639819629732]
We present the first membership inference attack tailored for In-Context Learning (ICL).
We propose four attack strategies tailored to various constrained scenarios.
We investigate three potential defenses targeting data, instruction, and output.
arXiv Detail & Related papers (2024-09-02T17:23:23Z)
- AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z)
- BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models [57.5404308854535]
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions.
We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space.
Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations.
arXiv Detail & Related papers (2024-06-24T19:29:47Z)
- Low-Cost Privacy-Aware Decentralized Learning [5.295018540083454]
This paper introduces ZIP-DL, a privacy-aware decentralized learning (DL) algorithm that exploits correlated noise to provide strong privacy protection against a local adversary.
We provide theoretical guarantees for both convergence speed and privacy guarantees, thereby making ZIP-DL applicable to practical scenarios.
arXiv Detail & Related papers (2024-03-18T13:53:17Z)
- From Mean to Extreme: Formal Differential Privacy Bounds on the Success of Real-World Data Reconstruction Attacks [54.25638567385662]
Differential Privacy in machine learning is often interpreted as guarantees against membership inference.
Translating DP budgets into quantitative protection against the more damaging threat of data reconstruction remains a challenging open problem.
This paper bridges the critical gap by deriving the first formal privacy bounds tailored to the mechanics of demonstrated "from-scratch" attacks.
arXiv Detail & Related papers (2024-02-20T09:52:30Z)
- Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z)
- Toward Evaluating Robustness of Reinforcement Learning with Adversarial Policy [32.1138935956272]
Reinforcement learning agents are susceptible to evasion attacks during deployment.
In this paper, we propose Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box adversarial policy learning.
arXiv Detail & Related papers (2023-05-04T07:24:12Z)
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)