Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing
- URL: http://arxiv.org/abs/2010.07489v1
- Date: Thu, 15 Oct 2020 03:12:24 GMT
- Title: Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing
- Authors: Zhen Xiang, David J. Miller, George Kesidis
- Abstract summary: Backdoor data poisoning is an emerging form of adversarial attack against deep neural network image classifiers.
In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns.
We propose an optimization-based reverse-engineering defense that jointly: 1) detects whether the training set is poisoned; 2) if so, identifies the target class and the training images with the backdoor pattern embedded; and 3) additionally, reverse-engineers an estimate of the backdoor pattern used by the attacker.
- Score: 22.22337220509128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor data poisoning is an emerging form of adversarial attack, usually
against deep neural network image classifiers. The attacker poisons the
training set with a relatively small set of images from one (or several) source
class(es), embedded with a backdoor pattern and labeled with a target class. For
a successful attack, during operation, the trained classifier will: 1)
misclassify a test image from the source class(es) to the target class whenever
the same backdoor pattern is present; 2) maintain a high classification
accuracy for backdoor-free test images. In this paper, we make a breakthrough
in defending against backdoor attacks with imperceptible backdoor patterns (e.g.,
watermarks) before/during the training phase. This is a challenging problem
because it is a priori unknown which subset (if any) of the training set has
been poisoned. We propose an optimization-based reverse-engineering defense
that jointly: 1) detects whether the training set is poisoned; 2) if so,
identifies the target class and the training images with the backdoor pattern
embedded; and 3) additionally, reverse-engineers an estimate of the backdoor
pattern used by the attacker. In benchmark experiments on CIFAR-10, for a large
variety of attacks, our defense achieves a new state-of-the-art by reducing the
attack success rate to no more than 4.9% after removing detected suspicious
training images.
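The abstract names the defense but not its objective, so the following is only a minimal sketch of the general idea behind optimization-based reverse engineering, under assumptions that are not taken from the paper. A toy linear softmax classifier (logits `(X + v) @ W.T + b`) stands in for the trained DNN; a single additive perturbation `v` is fitted by gradient descent on `||v||^2 + lam * cross-entropy` so that a group of images is pushed to a candidate target class; and candidate classes whose fitted perturbation is abnormally small are flagged. The intuition assumed here is that an imperceptible backdoor makes its target class reachable with an unusually small perturbation. The functions `reverse_engineer` and `flag_target_classes`, their parameters, and the median-absolute-deviation (MAD) outlier test (a swapped-in statistic) are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def reverse_engineer(W, b, X, target, lam=10.0, lr=0.1, steps=500):
    """Hypothetical helper: estimate one additive perturbation v pushing X toward `target`.

    Toy assumption: a linear softmax classifier with logits (X + v) @ W.T + b,
    where W has shape (num_classes, d). Plain gradient descent minimizes
    ||v||^2 + lam * mean cross-entropy of the perturbed images w.r.t. `target`.
    """
    num_classes, d = W.shape
    onehot = np.eye(num_classes)[target]
    v = np.zeros(d)
    for _ in range(steps):
        p = softmax((X + v) @ W.T + b)            # class probabilities, shape (n, C)
        # For a linear model, d(cross-entropy)/d(input) = (p - onehot) @ W
        grad_ce = ((p - onehot) @ W).mean(axis=0)
        v -= lr * (2.0 * v + lam * grad_ce)       # gradient step on the full objective
    return v


def flag_target_classes(sizes, threshold=2.0):
    """Hypothetical helper: indices of classes whose perturbation size is abnormally small.

    Swapped-in statistic: a median-absolute-deviation (MAD) anomaly index,
    not necessarily the detection statistic used in the paper.
    """
    sizes = np.asarray(sizes, dtype=float)
    med = np.median(sizes)
    # 1.4826 makes the MAD a consistent estimate of the std. dev. under normality;
    # the floor avoids division by zero when all sizes are identical
    mad = max(1.4826 * np.median(np.abs(sizes - med)), 1e-12)
    anomaly_index = (med - sizes) / mad           # large when a size is unusually small
    return np.flatnonzero(anomaly_index > threshold)


# Toy check: two classes whose logits equal the inputs; all images start in class 0
W, b = np.eye(2), np.zeros(2)
X = np.tile([1.0, 0.0], (5, 1))
v = reverse_engineer(W, b, X, target=1)
print(np.argmax((X + v) @ W.T + b, axis=1))        # every image now predicted as class 1
print(flag_target_classes([5.0, 5.2, 4.9, 5.1, 0.8, 5.0]))
```

With a real network, `X` would hold clean images from a putative source class, the input gradient would come from automatic differentiation rather than the closed form used here, and the norm of each fitted `v` would be the per-class size passed to `flag_target_classes`.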
Related papers
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv (2023-07-19T17:44:54Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv (2023-05-11T10:05:57Z)
- Invisible Backdoor Attack with Dynamic Triggers against Person Re-identification [71.80885227961015]
Person Re-identification (ReID) has rapidly progressed with wide real-world applications, but also poses significant risks of adversarial attacks.
We propose a novel backdoor attack on ReID under a new all-to-unknown scenario, called Dynamic Triggers Invisible Backdoor Attack (DT-IBA).
We extensively validate the effectiveness and stealthiness of the proposed attack on benchmark datasets, and evaluate the effectiveness of several defense methods against our attack.
arXiv (2022-11-20T10:08:28Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv (2022-11-02T17:05:45Z)
- MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
arXiv (2022-05-13T21:32:24Z)
- Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is present in the physical world.
arXiv (2022-04-11T16:58:04Z)
- Backdoor Attack in the Physical World [49.64799477792172]
Backdoor attacks intend to inject a hidden backdoor into deep neural networks (DNNs).
Most existing backdoor attacks adopted the setting of a static trigger, i.e., the same trigger is used across training and testing images.
We demonstrate that this attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv (2021-04-06T08:37:33Z)
- WaNet -- Imperceptible Warping-based Backdoor Attack [20.289889150949836]
A third-party model can be poisoned in training to work well in normal conditions but behave maliciously when a trigger pattern appears.
In this paper, we propose using warping-based triggers to attack third-party models.
The proposed backdoor outperforms the previous methods in a human inspection test by a wide margin, proving its stealthiness.
arXiv (2021-02-20T15:25:36Z)
- Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks [46.99548490594115]
A backdoor attack installs a backdoor into the victim model by injecting a backdoor pattern into a small proportion of the training data.
We propose reflection backdoor (Refool) to plant reflections as backdoor into a victim model.
We demonstrate on 3 computer vision tasks and 5 datasets that Refool can attack state-of-the-art DNNs with a high success rate.
arXiv (2020-07-05T13:56:48Z)
- Clean-Label Backdoor Attacks on Video Recognition Models [87.46539956587908]
We show that image backdoor attacks are far less effective on videos.
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models.
Our proposed backdoor attack is resistant to state-of-the-art backdoor defense/detection methods.
arXiv (2020-03-06T04:51:48Z)
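Several entries above, like this paper's abstract, build on the same poisoning step: a backdoor pattern is embedded into a small number of training images. The sketch below assumes the dirty-label setting described in the abstract (source-class images receive an additive pattern and are relabeled to the target class) and computes the attack success rate, i.e., the fraction of triggered source-class test images classified to the target class. Clean-label attacks such as Narcissus keep the original labels and are not modeled. The function names, the `n_poison` and `seed` parameters, the `[0, 1]` pixel range, and appending poisoned copies instead of replacing clean images are illustrative assumptions.

```python
import numpy as np


def poison_training_set(images, labels, pattern, source, target, n_poison, seed=0):
    """Hypothetical dirty-label poisoning sketch (not any paper's exact procedure).

    images: floats in [0, 1], shape (n, H, W, C); labels: ints, shape (n,).
    pattern: additive backdoor pattern, shape (H, W, C), assumed small in amplitude.
    Picks n_poison source-class images, adds the pattern, relabels the copies to
    `target`, and appends them to the clean training set.
    """
    rng = np.random.default_rng(seed)
    source_idx = np.flatnonzero(labels == source)
    chosen = rng.choice(source_idx, size=n_poison, replace=False)  # raises if too few
    poisoned = np.clip(images[chosen] + pattern, 0.0, 1.0)
    new_images = np.concatenate([images, poisoned])
    new_labels = np.concatenate([labels, np.full(n_poison, target, dtype=labels.dtype)])
    return new_images, new_labels


def attack_success_rate(predict, test_images, test_labels, pattern, source, target):
    """Fraction of source-class test images classified as `target` once the pattern is added."""
    triggered = np.clip(test_images[test_labels == source] + pattern, 0.0, 1.0)
    return float(np.mean(predict(triggered) == target))


def predict(x):
    """Stand-in classifier for the example: class 1 for any image that is not all black."""
    return np.where(x.mean(axis=(1, 2, 3)) > 0.005, 1, 0)


images = np.zeros((10, 2, 2, 1))
labels = np.array([0] * 5 + [1] * 5)
pattern = np.full((2, 2, 1), 0.01)
x_train, y_train = poison_training_set(images, labels, pattern, source=0, target=1, n_poison=3)
print(x_train.shape, attack_success_rate(predict, images, labels, pattern, 0, 1))
```

`predict` can be any function that maps a batch of images to integer labels; the stand-in above only exists to keep the example self-contained.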