Rethinking Backdoor Attacks
- URL: http://arxiv.org/abs/2307.10163v1
- Date: Wed, 19 Jul 2023 17:44:54 GMT
- Title: Rethinking Backdoor Attacks
- Authors: Alaa Khaddaj, Guillaume Leclerc, Aleksandar Makelov, Kristian Georgiev, Hadi Salman, Andrew Ilyas, Aleksander Madry
- Abstract summary: In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
- Score: 122.1008188058615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a backdoor attack, an adversary inserts maliciously constructed backdoor
examples into a training set to make the resulting model vulnerable to
manipulation. Defending against such attacks typically involves viewing these
inserted examples as outliers in the training set and using techniques from
robust statistics to detect and remove them.
In this work, we present a different approach to the backdoor attack problem.
Specifically, we show that without structural information about the training
data distribution, backdoor attacks are indistinguishable from
naturally-occurring features in the data--and thus impossible to "detect" in a
general sense. Then, guided by this observation, we revisit existing defenses
against backdoor attacks and characterize the (often latent) assumptions they
make and on which they depend. Finally, we explore an alternative perspective
on backdoor attacks: one that assumes these attacks correspond to the strongest
feature in the training data. Under this assumption (which we make formal) we
develop a new primitive for detecting backdoor attacks. Our primitive naturally
gives rise to a detection algorithm that comes with theoretical guarantees and
is effective in practice.
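To make the threat model in the abstract concrete, here is a minimal sketch of how backdoor examples might be inserted into a training set. It is not the construction or the detection primitive from the paper: it assumes a generic dirty-label patch trigger (a bright square in one image corner, with the label flipped to a target class). The function name `poison_dataset`, its parameters, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def poison_dataset(images, labels, target_label,
                   poison_fraction=0.05, patch_size=3, seed=0):
    """Return poisoned copies of (images, labels) plus the poisoned indices.

    Assumed (hypothetical) setup:
      images: float array of shape (n, height, width), values in [0, 1]
      labels: int array of shape (n,)
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    n = len(images)
    num_poison = int(round(poison_fraction * n))
    # Pick which training examples the adversary modifies.
    poison_idx = rng.choice(n, size=num_poison, replace=False)

    # Trigger: set a small square in the bottom-right corner to full intensity.
    images[poison_idx, -patch_size:, -patch_size:] = 1.0
    # Dirty-label variant: relabel every triggered example as the target class.
    labels[poison_idx] = target_label

    return images, labels, np.sort(poison_idx)


# Synthetic stand-in for a clean training set (not real data).
data_rng = np.random.default_rng(1)
clean_images = data_rng.uniform(0.0, 0.5, size=(200, 8, 8))
clean_labels = data_rng.integers(0, 10, size=200)

poisoned_images, poisoned_labels, poison_idx = poison_dataset(
    clean_images, clean_labels, target_label=7
)
print(f"poisoned {len(poison_idx)} of {len(clean_images)} examples")
```

From the model's point of view, the patch in this sketch is simply a feature that correlates strongly with the target label. This is the sense in which the abstract argues that, without further assumptions about the data distribution, a trigger cannot be told apart from a naturally occurring feature.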
Related papers
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
arXiv Detail & Related papers (2022-05-13T21:32:24Z)
- Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Excess Capacity and Backdoor Poisoning [11.383869751239166]
A backdoor data poisoning attack is an adversarial attack wherein the attacker injects several watermarked, mislabeled training examples into a training set.
We present a formal theoretical framework within which one can discuss backdoor data poisoning attacks for classification problems.
arXiv Detail & Related papers (2021-09-02T03:04:38Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
- Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.