Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That
Backfire
- URL: http://arxiv.org/abs/2201.12211v1
- Date: Fri, 28 Jan 2022 16:11:40 GMT
- Title: Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That
Backfire
- Authors: Siddhartha Datta, Nigel Shadbolt
- Abstract summary: We investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously.
A consistent backfiring phenomenon is observed across a wide range of games, where agents suffer from a low collective attack success rate.
The results motivate the re-evaluation of backdoor defense research for practical environments.
- Score: 8.782809316491948
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Malicious agents in collaborative learning and outsourced data collection
threaten the training of clean models. Backdoor attacks, in which an attacker
poisons a model during training to achieve targeted misclassification, are a
major concern for train-time robustness. In this paper,
we investigate a multi-agent backdoor attack scenario, where multiple attackers
attempt to backdoor a victim model simultaneously. A consistent backfiring
phenomenon is observed across a wide range of games, where agents suffer from a
low collective attack success rate. We examine different backdoor attack
configurations (non-cooperation vs. cooperation), joint distribution shifts,
and game setups, all of which return an equilibrium attack success rate at the
lower bound. The results motivate a re-evaluation of backdoor defense research
for practical environments.
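The scenario can be made concrete with a small simulation. The sketch below is a minimal illustration, not the paper's experimental setup: the toy dataset, single-pixel triggers, poison rates, and logistic-regression victim are all assumptions chosen for brevity. Each attacker independently stamps its own trigger onto a share of the shared training set and relabels those samples to its own target class; the victim trains on the pooled data, and each attacker's attack success rate (ASR) is then measured on triggered test inputs.

```python
# Minimal sketch of a multi-agent backdoor poisoning setup (illustrative only).
# Several non-cooperating attackers each stamp their own trigger onto a share
# of a toy dataset and relabel those samples to their own target class. The
# victim trains a simple classifier on the pooled (partly poisoned) data, and
# we report each attacker's attack success rate (ASR) plus the collective ASR.
# Dataset, model, triggers, and poison rates are assumptions, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes, img_side, n_train, n_test = 4, 8, 4000, 1000

def make_data(n):
    # Class-dependent mean plus noise stands in for real images.
    y = rng.integers(0, n_classes, size=n)
    x = rng.normal(loc=y[:, None], scale=2.0, size=(n, img_side * img_side))
    return x, y

def apply_trigger(x, value, pixel):
    # Each attacker "stamps" one fixed pixel position with a fixed value.
    x = x.copy()
    x[:, pixel] = value
    return x

# Hypothetical attackers: (trigger pixel index, trigger value, target label).
attackers = [(0, 8.0, 0), (7, -8.0, 1), (56, 8.0, 2)]
poison_rate = 0.1  # fraction of the shared dataset each attacker poisons

x_train, y_train = make_data(n_train)
for pix, val, target in attackers:
    idx = rng.choice(n_train, size=int(poison_rate * n_train), replace=False)
    x_train[idx] = apply_trigger(x_train[idx], val, pix)
    y_train[idx] = target  # attacker relabels its poisoned samples

victim = LogisticRegression(max_iter=2000).fit(x_train, y_train)

# Evaluate each attacker's ASR on clean test inputs stamped with its trigger.
x_test, y_test = make_data(n_test)
asrs = []
for pix, val, target in attackers:
    keep = y_test != target  # skip samples already labeled with the target
    preds = victim.predict(apply_trigger(x_test[keep], val, pix))
    asrs.append((preds == target).mean())

print("clean accuracy:", (victim.predict(x_test) == y_test).mean())
for (pix, val, target), asr in zip(attackers, asrs):
    print(f"attacker targeting class {target}: ASR = {asr:.2f}")
print("collective (mean) ASR:", np.mean(asrs))
```

Because the attackers poison overlapping portions of the shared pool without coordinating, their triggers and target labels can overwrite and dilute one another, which loosely mirrors the mutual interference behind the backfiring effect the paper reports.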
Related papers
- Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning [5.91728247370845]
Federated learning is vulnerable to backdoor attacks due to its distributed nature.
We propose a more practical threat model for federated learning: the distributed multi-target backdoor.
We show that, 30 rounds after the attack, the attack success rates of three different backdoors from various clients remain above 93%.
arXiv Detail & Related papers (2024-11-06T13:57:53Z) - Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning [12.535344011523897]
Cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks.
We propose a novel backdoor attack against c-MADRL that attacks the entire multi-agent team by embedding a backdoor in only one agent.
Our backdoor attacks reach a high attack success rate (91.6%) while maintaining a low clean performance variance rate (3.7%).
arXiv Detail & Related papers (2024-09-12T06:17:37Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Recover Triggered States: Protect Model Against Backdoor Attack in
Reinforcement Learning [23.94769537680776]
A backdoor attack allows a malicious user to manipulate the environment or corrupt the training data, thus inserting a backdoor into the trained agent.
This paper proposes the Recovery Triggered States (RTS) method, a novel approach that effectively protects the victim agents from backdoor attacks.
arXiv Detail & Related papers (2023-04-01T08:00:32Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Widen The Backdoor To Let More Attackers In [24.540853975732922]
We investigate the scenario of a multi-agent backdoor attack, where multiple non-colluding attackers craft and insert triggered samples in a shared dataset.
We discover a clear backfiring phenomenon: increasing the number of attackers shrinks each attacker's attack success rate.
We then exploit this phenomenon to minimize the collective ASR of attackers and maximize the defender's robustness accuracy.
arXiv Detail & Related papers (2021-10-09T13:53:57Z) - BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning [80.99426477001619]
We migrate backdoor attacks to more complex RL systems involving multiple agents.
As a proof of concept, we demonstrate that an adversary agent can trigger the backdoor of the victim agent with its own action.
The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when not activated.
arXiv Detail & Related papers (2021-05-02T23:47:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.