BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning
- URL: http://arxiv.org/abs/2105.00579v1
- Date: Sun, 2 May 2021 23:47:55 GMT
- Title: BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning
- Authors: Lun Wang, Zaynah Javed, Xian Wu, Wenbo Guo, Xinyu Xing, Dawn Song
- Abstract summary: We migrate backdoor attacks to more complex RL systems involving multiple agents.
As a proof of concept, we demonstrate that an adversary agent can trigger the backdoor of the victim agent with its own action.
The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when not activated.
- Score: 80.99426477001619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has confirmed the feasibility of backdoor attacks in deep
reinforcement learning (RL) systems. However, the existing attacks require the
ability to arbitrarily modify an agent's observation, constraining the
application scope to simple RL systems such as Atari games. In this paper, we
migrate backdoor attacks to more complex RL systems involving multiple agents
and explore the possibility of triggering the backdoor without directly
manipulating the agent's observation. As a proof of concept, we demonstrate
that an adversary agent can trigger the backdoor of the victim agent with its
own action in two-player competitive RL systems. We prototype and evaluate
BACKDOORL in four competitive environments. The results show that when the
backdoor is activated, the winning rate of the victim drops by 17% to 37%
compared to when not activated.
Related papers
- A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning [12.535344011523897]
cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks.
We propose a novel backdoor attack against c-MADRL, which attacks entire multi-agent team by embedding backdoor only in one agent.
Our backdoor attacks are able to reach a high attack success rate (91.6%) while maintaining a low clean performance variance rate (3.7%)
arXiv Detail & Related papers (2024-09-12T06:17:37Z) - Recover Triggered States: Protect Model Against Backdoor Attack in
Reinforcement Learning [23.94769537680776]
A backdoor attack allows a malicious user to manipulate the environment or corrupt the training data, thus inserting a backdoor into the trained agent.
This paper proposes the Recovery Triggered States (RTS) method, a novel approach that effectively protects the victim agents from backdoor attacks.
arXiv Detail & Related papers (2023-04-01T08:00:32Z) - BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent research revealed that most of the existing attacks failed in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - PolicyCleanse: Backdoor Detection and Mitigation in Reinforcement
Learning [19.524789009088245]
We propose the problem of Backdoor Detection in a multi-agent competitive reinforcement learning system.
PolicyCleanse is based on the property that the activated Trojan agents accumulated rewards degrade noticeably after several timesteps.
Along with PolicyCleanse, we also design a machine unlearning-based approach that can effectively mitigate the detected backdoor.
arXiv Detail & Related papers (2022-02-08T02:49:09Z) - Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That
Backfire [8.782809316491948]
We investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously.
A consistent backfiring phenomenon is observed across a wide range of games, where agents suffer from a low collective attack success rate.
The results motivate the re-evaluation of backdoor defense research for practical environments.
arXiv Detail & Related papers (2022-01-28T16:11:40Z) - Widen The Backdoor To Let More Attackers In [24.540853975732922]
We investigate the scenario of a multi-agent backdoor attack, where multiple non-colluding attackers craft and insert triggered samples in a shared dataset.
We discover a clear backfiring phenomenon: increasing the number of attackers shrinks each attacker's attack success rate.
We then exploit this phenomenon to minimize the collective ASR of attackers and maximize defender's robustness accuracy.
arXiv Detail & Related papers (2021-10-09T13:53:57Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word
Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.