Confidence Matters: Inspecting Backdoors in Deep Neural Networks via
Distribution Transfer
- URL: http://arxiv.org/abs/2208.06592v1
- Date: Sat, 13 Aug 2022 08:16:28 GMT
- Title: Confidence Matters: Inspecting Backdoors in Deep Neural Networks via
Distribution Transfer
- Authors: Tong Wang, Yuan Yao, Feng Xu, Miao Xu, Shengwei An, Ting Wang
- Abstract summary: We propose a backdoor defense DTInspector built upon a new observation.
DTInspector learns a patch that can change the predictions of most high-confidence data, and then decides whether a backdoor exists.
- Score: 27.631616436623588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor attacks have been shown to be a serious security threat against deep
learning models, and detecting whether a given model has been backdoored
becomes a crucial task. Existing defenses are mainly built upon the observation
that the backdoor trigger is usually of small size or affects the activation of
only a few neurons. However, these observations are violated in many cases,
especially for advanced backdoor attacks, hindering the performance and
applicability of the existing defenses. In this paper, we propose a backdoor
defense DTInspector built upon a new observation. That is, an effective
backdoor attack usually requires high prediction confidence on the poisoned
training samples, so as to ensure that the trained model exhibits the targeted
behavior with a high probability. Based on this observation, DTInspector first
learns a patch that could change the predictions of most high-confidence data,
and then decides whether a backdoor exists by checking the ratio of prediction
changes after applying the learned patch to the low-confidence data. Extensive
evaluations on five backdoor attacks, four datasets, and three advanced
attack types demonstrate the effectiveness of the proposed defense.
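The abstract describes a two-step procedure: learn a patch that changes the model's predictions on most high-confidence samples, then measure how often the same patch changes predictions on low-confidence samples. Below is a minimal PyTorch sketch of that idea, not the authors' DTInspector implementation; the additive top-left patch, the confidence cutoff tau, the optimization schedule, and the final decision rule are assumptions made purely for illustration.

```python
# Sketch only: assumes a PyTorch image classifier `model` and a batch of
# inputs `images` with shape (N, C, H, W) in [0, 1].
import torch
import torch.nn.functional as F

def stamp(images, patch):
    """Additively stamp the patch onto the top-left corner of each image."""
    p = patch.shape[-1]
    pad = (0, images.shape[-1] - p, 0, images.shape[-2] - p)  # (left, right, top, bottom)
    return (images + F.pad(patch, pad)).clamp(0, 1)

def split_by_confidence(model, images, tau=0.9):
    """Partition samples into high- and low-confidence sets by softmax confidence."""
    with torch.no_grad():
        probs = F.softmax(model(images), dim=1)
        conf, preds = probs.max(dim=1)
    hi = conf >= tau
    return (images[hi], preds[hi]), (images[~hi], preds[~hi])

def learn_patch(model, hi_images, hi_preds, patch_size=8, steps=200, lr=0.1):
    """Learn a patch that changes the predictions of most high-confidence samples."""
    patch = torch.zeros(1, hi_images.size(1), patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        logits = model(stamp(hi_images, patch))
        # Minimizing the negative cross-entropy w.r.t. the original predictions
        # pushes the patched inputs away from those labels.
        loss = -F.cross_entropy(logits, hi_preds)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach()

def change_ratio(model, images, preds, patch):
    """Fraction of samples whose prediction flips once the patch is applied."""
    with torch.no_grad():
        new_preds = model(stamp(images, patch)).argmax(dim=1)
    return (new_preds != preds).float().mean().item()

def inspect(model, images, threshold=0.5):
    (hi_x, hi_y), (lo_x, lo_y) = split_by_confidence(model, images)
    patch = learn_patch(model, hi_x, hi_y)
    ratio = change_ratio(model, lo_x, lo_y, patch)
    # Placeholder decision rule: the abstract only states that the verdict is
    # based on this change ratio, not the exact test or threshold.
    return ratio, ratio < threshold
```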
Related papers
- Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack [32.74007523929888]
We re-investigate the characteristics of backdoored models after defense.
We find that the original backdoors still exist in models derived from existing post-training defense strategies.
We empirically show that these dormant backdoors can be easily re-activated during inference.
arXiv Detail & Related papers (2024-05-25T08:57:30Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them (a generic sketch of this outlier-removal view appears after this list).
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Backdoor Learning: A Survey [75.59571756777342]
A backdoor attack intends to embed a hidden backdoor into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)