PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection
and Mitigation in Deep Neural Networks
- URL: http://arxiv.org/abs/2203.09289v1
- Date: Thu, 17 Mar 2022 12:37:21 GMT
- Title: PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection
and Mitigation in Deep Neural Networks
- Authors: Yue Wang, Wenqing Li, Esha Sarkar, Muhammad Shafique, Michail
Maniatakos, Saif Eddin Jabari
- Abstract summary: Backdoor attacks pose a new threat to Deep Neural Networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization for purifying the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
- Score: 22.900501880865658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor attacks pose a new threat to Deep Neural Networks (DNNs): a
backdoor is inserted into the neural network by poisoning the training dataset,
causing inputs that contain the adversary's trigger to be misclassified. The major challenge
for defending against these attacks is that only the attacker knows the secret
trigger and the target class. The problem is further exacerbated by the recent
introduction of "Hidden Triggers", where the triggers are carefully fused into
the input, bypassing detection by human inspection and causing backdoor
identification through anomaly detection to fail. To defend against such
imperceptible attacks, in this work we systematically analyze how
representations, i.e., the set of neuron activations for a given DNN when using
the training data as inputs, are affected by backdoor attacks. We propose
PiDAn, an algorithm based on coherence optimization for purifying the poisoned
data. Our analysis shows that representations of poisoned data and authentic
data in the target class are still embedded in different linear subspaces,
which implies that they show different coherence with some latent spaces. Based
on this observation, the proposed PiDAn algorithm learns a sample-wise weight
vector to maximize the projected coherence of weighted samples, where we
demonstrate that the learned weight vector has a natural "grouping effect" and
is distinguishable between authentic data and poisoned data. This enables the
systematic detection and mitigation of backdoor attacks. Based on our
theoretical analysis and experimental results, we demonstrate the effectiveness
of PiDAn in defending against backdoor attacks that use different settings of
poisoned samples on the GTSRB and ILSVRC2012 datasets. Our PiDAn algorithm can
detect more than 90% of infected classes and identify 95% of poisoned samples.
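The following is a minimal, illustrative sketch of the coherence-based weighting idea described in the abstract, not the authors' PiDAn implementation: the subspace dimension, the projected power-iteration solver, and the z-score thresholding rule are simplifying assumptions used only to make the weight-learning step concrete.

```python
import numpy as np

def coherence_weights(reps, subspace_dim=5, n_iter=100):
    """Learn a nonnegative, unit-norm sample weight vector that emphasizes
    samples coherent with a dominant linear subspace of the representations.
    reps: (n_samples, d) array of neuron activations for one class.
    Simplified sketch; not the original PiDAn formulation."""
    X = reps - reps.mean(axis=0)                    # center the representations
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:subspace_dim]                           # (k, d) candidate latent subspace

    S = X @ P.T                                     # (n, k) projections onto the subspace
    M = S @ S.T                                     # (n, n) projected-coherence matrix
    w = np.full(len(X), 1.0 / np.sqrt(len(X)))      # start from uniform weights
    for _ in range(n_iter):                         # projected power iteration:
        w = M @ w                                   #   maximize w^T M w ...
        w = np.clip(w, 0.0, None)                   #   ... subject to w >= 0
        w /= np.linalg.norm(w) + 1e-12              #   ... and ||w|| = 1
    return w

def flag_suspicious(w, z_thresh=2.0):
    """Grouping-effect heuristic (assumed threshold): samples whose weights
    fall far below the bulk of the class are candidate poisoned samples."""
    z = (w - w.mean()) / (w.std() + 1e-12)
    return z < -z_thresh
```

In this simplified view, authentic samples dominate the estimated subspace and receive similar large weights, while poisoned samples, which lie in a different linear subspace, separate into a low-weight group, mirroring the "grouping effect" described above.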
Related papers
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$^2$AO).
Our method achieves state-of-the-art attack performance while preserving clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
- Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks [30.82433380830665]
Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification.
Recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption.
We propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones.
arXiv Detail & Related papers (2024-06-14T08:46:26Z)
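A generic sketch of the random edge-dropping idea from the GNN defense above: re-run the node classifier on several randomly edge-dropped copies of the graph and flag nodes whose predictions are unstable. The predict interface, drop rate, and agreement threshold are illustrative assumptions, not the paper's actual detection rule.

```python
import numpy as np

def random_edge_drop(edge_index, drop_rate, rng):
    """Keep each edge independently with probability 1 - drop_rate.
    edge_index: (2, num_edges) array of source/target node ids."""
    keep = rng.random(edge_index.shape[1]) >= drop_rate
    return edge_index[:, keep]

def unstable_nodes(predict, edge_index, num_trials=20, drop_rate=0.2,
                   agree_thresh=0.7, seed=0):
    """predict(edge_index) -> (num_nodes,) predicted labels under that graph.
    Nodes whose prediction disagrees with the full-graph prediction in too
    many trials are flagged as potentially poisoned (assumed criterion)."""
    rng = np.random.default_rng(seed)
    base = predict(edge_index)                       # predictions on the intact graph
    agree = np.zeros(base.shape[0])
    for _ in range(num_trials):
        pred = predict(random_edge_drop(edge_index, drop_rate, rng))
        agree += (pred == base)
    return (agree / num_trials) < agree_thresh       # True = unstable / suspicious
```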
- PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection [57.571451139201855]
Prediction Shift Backdoor Detection (PSBD) is a novel method for identifying backdoor samples in deep neural networks.
PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels.
PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference.
arXiv Detail & Related papers (2024-06-09T15:31:00Z)
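A rough PyTorch sketch of the Prediction Shift Uncertainty score described in the PSBD entry above; the number of stochastic passes and the choice of scoring the variance of the predicted-class probability are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prediction_shift_uncertainty(model, x, n_passes=10):
    """Variance of predicted probabilities when dropout is toggled on during
    inference; higher values suggest a suspicious (poisoned) sample."""
    model.eval()
    # Re-enable only the dropout layers, leaving batch-norm etc. in eval mode.
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    model.eval()                                     # restore standard eval mode
    # Score: variance (across passes) of the probability of the predicted class.
    pred = probs.mean(dim=0).argmax(dim=-1)          # (batch,)
    psu = probs.var(dim=0).gather(-1, pred.unsqueeze(-1)).squeeze(-1)
    return psu
```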
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling samples or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
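As a rough illustration of the frequency-domain idea above, the sketch below embeds a small perturbation in a fixed band of an image's 2D DCT and transforms back; the use of the DCT, the band location, and the perturbation strength are assumptions, since the paper's exact transform and trigger design are not given in the summary.

```python
import numpy as np
from scipy.fft import dctn, idctn

def add_frequency_trigger(image, band=(24, 32), strength=3.0):
    """Embed a trigger in a fixed frequency band of the 2D DCT of a
    single-channel image with values in [0, 255]. Band and strength are
    illustrative choices, not the paper's parameters."""
    coeffs = dctn(image.astype(np.float64), norm="ortho")
    lo, hi = band
    coeffs[lo:hi, lo:hi] += strength          # constant bump acts as the trigger pattern
    poisoned = idctn(coeffs, norm="ortho")
    return np.clip(poisoned, 0, 255)
```

Because the poisoned image keeps its original label, this kind of trigger fits the clean-label setting mentioned in the entry and is hard to spot by visual inspection.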
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Mitigating the Impact of Adversarial Attacks in Very Deep Networks [10.555822166916705]
Deep Neural Network (DNN) models have security-relevant vulnerabilities.
Data-poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into models.
We propose an attack-agnostic defense method for mitigating their influence.
arXiv Detail & Related papers (2020-12-08T21:25:44Z)
- Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as backdoor smoothing.
Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
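A crude way to probe the smoothness notion from the last entry is to compare how much the model's softmax outputs fluctuate under small random perturbations around a clean input versus the same input with the trigger applied; the Gaussian noise model and the variance-based score below are assumptions for illustration only, not the paper's smoothness measure.

```python
import torch

@torch.no_grad()
def local_roughness(model, x, noise_std=0.05, n_samples=32):
    """Mean variance of the softmax outputs over a Gaussian neighborhood of a
    single input x; lower values indicate a smoother decision function."""
    model.eval()
    noisy = x.unsqueeze(0) + noise_std * torch.randn(n_samples, *x.shape)
    probs = torch.softmax(model(noisy), dim=-1)      # (n_samples, num_classes)
    return probs.var(dim=0).mean().item()

# Backdoor smoothing would predict: local_roughness(model, x_triggered) is
# noticeably smaller than local_roughness(model, x_clean).
```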