AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis
- URL: http://arxiv.org/abs/2110.14880v2
- Date: Fri, 29 Oct 2021 19:57:24 GMT
- Title: AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis
- Authors: Junfeng Guo and Ang Li and Cong Liu
- Abstract summary: We address the black-box hard-label backdoor detection problem.
We show that the objective of backdoor detection is bounded by an adversarial objective.
We propose the adversarial extreme value analysis to detect backdoors in black-box neural networks.
- Score: 23.184335982913325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have been shown to be vulnerable to
backdoor attacks. A backdoor is often embedded in the target DNN by injecting a
backdoor trigger into training examples, which can cause the target DNN to
misclassify any input stamped with the trigger. Existing backdoor detection
methods often require access to the original poisoned training data, the
parameters of the target DNN, or the predictive confidence for each given
input, all of which are impractical in many real-world applications, e.g.,
on-device deployed DNNs. We address the black-box hard-label backdoor detection
problem, where the DNN is fully black-box and only its final output label is
accessible. We approach this problem from the optimization perspective and show
that the objective of backdoor detection is bounded by an adversarial
objective. Further theoretical and empirical studies reveal that this
adversarial objective leads to a solution with a highly skewed distribution; a
singularity is often observed in the adversarial map of a backdoor-infected
example, which we call the adversarial singularity phenomenon. Based on this
observation, we propose adversarial extreme value analysis (AEVA) to detect
backdoors in black-box neural networks. AEVA applies extreme value analysis to
the adversarial map, computed via Monte Carlo gradient estimation. Extensive
experiments across multiple popular tasks and backdoor attacks show that our
approach is effective at detecting backdoor attacks under the black-box
hard-label scenario.
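For intuition, here is a minimal sketch of the kind of pipeline the abstract describes, assuming only a hard-label oracle `query(x)` that returns the predicted class. It is not the authors' implementation: the zeroth-order estimator, the function names (`estimate_adv_map`, `singularity_score`), and the MAD-based peak statistic are illustrative assumptions standing in for the paper's Monte Carlo gradient estimation and extreme value analysis.

```python
import numpy as np

def estimate_adv_map(query, x, target, n_samples=100, sigma=0.1,
                     step=0.05, iters=10):
    """Monte Carlo (zeroth-order) gradient estimation toward `target`.

    `query(x)` is assumed to return only the hard predicted label, as in
    the black-box hard-label setting. With a 0/1 hit indicator as the
    objective, the smoothed-gradient estimate reduces to averaging the
    random directions that flip the prediction to the target label.
    """
    x = np.array(x, dtype=float)
    adv_map = np.zeros_like(x)
    for _ in range(iters):
        grad = np.zeros_like(x)
        for _ in range(n_samples):
            u = np.random.randn(*x.shape)
            # 1.0 if the perturbed input is classified as the target label
            hit = 1.0 if query(np.clip(x + sigma * u, 0.0, 1.0)) == target else 0.0
            grad += hit * u
        grad /= n_samples * sigma
        x = np.clip(x + step * np.sign(grad), 0.0, 1.0)  # adversarial ascent step
        adv_map += np.abs(grad)  # accumulate per-pixel adversarial magnitude
    return adv_map / iters

def singularity_score(adv_map):
    """Extreme value statistic: how far the peak sits above the bulk.

    A backdoor-infected target label tends to concentrate adversarial
    energy on the trigger region, yielding a highly skewed map and hence
    a large peak-over-median score (a hypothetical MAD-based proxy here).
    """
    flat = adv_map.ravel()
    med = np.median(flat)
    mad = np.median(np.abs(flat - med)) + 1e-12
    return (flat.max() - med) / mad

# Hypothetical usage: score every candidate target label and flag outliers.
# scores = [singularity_score(estimate_adv_map(query, x0, k))
#           for k in range(num_classes)]
```

A label whose score is an extreme outlier relative to the other labels' scores would be flagged as a candidate backdoor target; the paper's actual estimator and decision rule may differ from this sketch.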
Related papers
- BeniFul: Backdoor Defense via Middle Feature Analysis for Deep Neural Networks [0.6872939325656702]
  We propose an effective and comprehensive backdoor defense method named BeniFul, which consists of two parts: gray-box backdoor input detection and white-box backdoor elimination.
  Experimental results on CIFAR-10 and Tiny ImageNet against five state-of-the-art attacks demonstrate that BeniFul exhibits strong defense capability in both backdoor input detection and backdoor elimination.
  arXiv Detail & Related papers (2024-10-15T13:14:55Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
  We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
  CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
  arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
  The Trojan attack on deep neural networks, also known as a backdoor attack, is a typical threat to artificial intelligence.
  FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
  arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
  We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
  We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
  arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- An anomaly detection approach for backdoored neural networks: face recognition as a case study [77.92020418343022]
  We propose a novel backdoored network detection method based on the principle of anomaly detection.
  We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
  arXiv Detail & Related papers (2022-08-22T12:14:13Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
  We propose a generalized backdoor attack method based on the frequency domain.
  It can implant a backdoor without mislabeling samples or accessing the training process.
  We evaluate our approach in the no-label and clean-label cases on three datasets.
  arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
  We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
  We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
  arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
  We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
  In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
  arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
  We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
  Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy for clean images.
  arXiv Detail & Related papers (2020-02-26T02:03:00Z)