BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning
- URL: http://arxiv.org/abs/2312.12585v1
- Date: Tue, 19 Dec 2023 20:29:29 GMT
- Title: BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning
- Authors: Jing Cui, Yufei Han, Yuzhe Ma, Jianbin Jiao, Junge Zhang
- Abstract summary: Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success.
In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing.
Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection.
- Score: 37.19070609394519
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Backdoor attacks in reinforcement learning (RL) have previously employed
intense attack strategies to ensure attack success. However, these methods
suffer from high attack costs and increased detectability. In this work, we
propose a novel approach, BadRL, which focuses on conducting highly sparse
backdoor poisoning efforts during training and testing while maintaining
successful attacks. Our algorithm, BadRL, strategically chooses state
observations with high attack values to inject triggers during training and
testing, thereby reducing the chances of detection. In contrast to the previous
methods that utilize sample-agnostic trigger patterns, BadRL dynamically
generates distinct trigger patterns based on targeted state observations,
thereby enhancing its effectiveness. Theoretical analysis shows that the
targeted backdoor attack is always viable and remains stealthy under specific
assumptions. Empirical results on various classic RL tasks illustrate that
BadRL can substantially degrade the performance of a victim agent with minimal
poisoning efforts (0.003% of total training steps) during training and
infrequent attacks during testing.
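As a rough sketch of the sparse, targeted poisoning idea described in the abstract (illustrative only, not the authors' implementation), the snippet below ranks candidate states with a stand-in attack-value heuristic and injects a state-dependent trigger into only a small fraction of them. The names attack_value, make_trigger, and poison_batch, the Q-value-gap heuristic, and the 0.003 budget default are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def attack_value(target_action_q, best_q):
    # Stand-in "attack value": prefer states where forcing the target
    # action is maximally harmful relative to the victim's best action.
    return float(best_q - target_action_q)

def make_trigger(obs, scale=0.05):
    # Sample-specific trigger: a small perturbation derived from the
    # observation itself (a stand-in for the paper's learned triggers).
    return scale * np.sign(obs)

def poison_batch(observations, q_values, target_action, budget=0.003):
    """Inject triggers into only the top `budget` fraction of states.

    observations: (N, d) array of state observations
    q_values:     (N, num_actions) array of the victim's Q estimates
    target_action: action index the attacker wants to force
    """
    n = len(observations)
    scores = np.array([
        attack_value(q[target_action], q.max())
        for q in q_values
    ])
    k = max(1, int(budget * n))           # highly sparse poisoning budget
    chosen = np.argsort(scores)[-k:]      # states with the highest attack value
    poisoned = observations.copy()
    labels = np.full(n, -1)               # -1 marks observations left clean
    for i in chosen:
        poisoned[i] = observations[i] + make_trigger(observations[i])
        labels[i] = target_action         # relabel toward the target action
    return poisoned, labels

# Toy usage with random data
obs = rng.normal(size=(1000, 8))
q = rng.normal(size=(1000, 4))
poisoned_obs, poisoned_labels = poison_batch(obs, q, target_action=2)
print("poisoned:", int((poisoned_labels >= 0).sum()), "of", len(obs), "states")
```

The selection step is the point of the sketch: by ranking states and poisoning only the highest-value few, the attack stays sparse at both training and test time.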
Related papers
- Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization [38.957943962546864]
We propose to train one model using the Sharpness-Aware Minimization (SAM) algorithm, rather than the vanilla training algorithm.
Extensive experiments on several benchmark datasets show the reliable detection performance of the proposed method against both weak and strong backdoor attacks.
arXiv Detail & Related papers (2024-11-18T12:35:08Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Revisiting Backdoor Attacks against Large Vision-Language Models [76.42014292255944]
This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs.
We modify existing backdoor attacks based on the above key observations.
This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
- SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents [16.350898218047405]
Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications.
In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning.
We formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy.
arXiv Detail & Related papers (2024-05-30T23:31:25Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks (a minimal selection sketch appears after this list).
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
- Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning [14.702446153750497]
Worst-case-aware Robust RL (WocaR-RL) is a robust training framework for deep reinforcement learning.
We show that WocaR-RL achieves state-of-the-art performance under various strong attacks.
arXiv Detail & Related papers (2022-10-12T05:24:46Z)
- Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis [20.11993437283895]
This paper provides a game-theoretical underpinning for understanding this type of security risk.
We define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation (a generic form of such an objective is sketched after this list).
We observe that even a minor attacker effort can significantly degrade the learning performance.
arXiv Detail & Related papers (2022-07-29T21:29:29Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
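For the Confidence-driven Sampling for Backdoor Attacks entry above, the following is a minimal sketch of the stated selection rule: choose the samples the model is least confident about as poisoning candidates. The softmax top-class probability used as the confidence proxy and the poison_rate parameter are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def select_low_confidence(logits, poison_rate=0.05):
    """Pick the indices the model is least confident about.

    logits: (N, C) pre-softmax outputs on the clean training set.
    The lowest-confidence samples are returned as poisoning candidates,
    mirroring the selection rule summarized above.
    """
    confidence = softmax(logits).max(axis=1)   # top-class probability
    k = max(1, int(poison_rate * len(logits)))
    return np.argsort(confidence)[:k]          # lowest confidence first

# Toy usage
rng = np.random.default_rng(1)
candidates = select_low_confidence(rng.normal(size=(500, 10)), poison_rate=0.02)
print(len(candidates), "poisoning candidates selected")
```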
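For the Sampling Attacks on Meta Reinforcement Learning entry, the block below shows one generic form a Stackelberg-style minimax objective of this kind might take. The notation (attacker perturbation delta with budget epsilon, learner policy pi_theta, corrupted sampling process M-hat) is illustrative and not taken from the paper.

```latex
% Illustrative attacker-vs-learner objective: the attacker (leader) corrupts
% the sampling process within a budget, and the learner (follower) maximizes
% expected return under the corrupted samples.
\begin{equation*}
  \min_{\delta \,:\, \|\delta\| \le \epsilon} \;
  \max_{\theta} \;
  \mathbb{E}_{\tau \sim \pi_\theta,\ \hat{\mathcal{M}}(\delta)}
  \left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right]
\end{equation*}
```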