A Statistical Difference Reduction Method for Escaping Backdoor
Detection
- URL: http://arxiv.org/abs/2111.05077v1
- Date: Tue, 9 Nov 2021 12:09:18 GMT
- Title: A Statistical Difference Reduction Method for Escaping Backdoor
Detection
- Authors: Pengfei Xia, Hongjing Niu, Ziqiang Li, and Bin Li
- Abstract summary: Recent studies show that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks.
Several detection methods have been developed to distinguish adversarial inputs from clean ones and defend against such attacks.
We propose a Statistical Difference Reduction Method (SDRM) by adding a multi-level MMD constraint to the loss function.
- Score: 11.226288436817956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies show that Deep Neural Networks (DNNs) are vulnerable
to backdoor attacks. An infected model behaves normally on benign inputs,
whereas its prediction is forced to an attack-specific target on adversarial
data. Several detection methods have been developed to distinguish adversarial
inputs from clean ones and thus defend against such attacks. The common
hypothesis these defenses rely on is that there are large statistical
differences between the latent representations of clean and adversarial inputs
extracted by the infected model. However, despite its importance, comprehensive
research on whether this hypothesis must hold is lacking. In this paper, we
focus on this hypothesis and study three questions: 1) What are the properties
of the statistical differences? 2) How can they be effectively reduced without
harming the attack intensity? 3) What impact does this reduction have on
difference-based defenses? We address these questions in turn. First, by
introducing the Maximum Mean Discrepancy (MMD) as the metric, we identify that
the statistical differences of multi-level representations are all large, not
just at the highest level. Then, we propose a Statistical Difference Reduction
Method (SDRM) that adds a multi-level MMD constraint to the loss function when
training a backdoor model, effectively reducing the differences. Last, three
typical difference-based detection methods are examined. Their F1 scores drop
from 90%-100% on regularly trained backdoor models to 60%-70% on models trained
with SDRM, across two datasets, four model architectures, and four attack
methods. The results indicate that the proposed method can be used to enhance
existing attacks to escape backdoor detection algorithms.
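The core technical ingredient in the abstract is a multi-level MMD term added to the backdoor training loss. Below is a minimal PyTorch sketch of that idea, not the authors' released code: the feature-extraction interface (`return_features=True`), the RBF kernel bandwidths, and the weight `lambda_mmd` are all illustrative assumptions.

```python
import torch

def rbf_mmd2(x, y, bandwidths=(1.0, 5.0, 10.0)):
    # Biased MMD^2 estimate between two batches of flattened features,
    # using a sum of RBF kernels over several bandwidths.
    def gram(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return sum(torch.exp(-d2 / (2.0 * s ** 2)) for s in bandwidths)
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def sdrm_style_loss(model, clean_x, clean_y, poisoned_x, target_y,
                    criterion, lambda_mmd=1.0):
    # Task loss on clean and poisoned batches, plus an MMD penalty at every
    # returned feature level. Assumes the hypothetical call
    # model(x, return_features=True) yields (logits, [feat_1, ..., feat_L]);
    # adapt this to however your model exposes intermediate activations.
    logits_c, feats_c = model(clean_x, return_features=True)
    logits_p, feats_p = model(poisoned_x, return_features=True)
    task = criterion(logits_c, clean_y) + criterion(logits_p, target_y)
    mmd = sum(rbf_mmd2(fc.flatten(1), fp.flatten(1))
              for fc, fp in zip(feats_c, feats_p))
    return task + lambda_mmd * mmd
```

Used in place of the plain cross-entropy objective, a loss of this shape lets gradient descent implant the backdoor while simultaneously pulling the clean and poisoned feature distributions together at each level, which is the reduction the abstract describes.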
Related papers
- DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks [30.766013737094532]
We propose DMGNN to defend against out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks.
DMGNN can easily identify the hidden ID and OOD triggers via predicting label transitions based on counterfactual explanation.
DMGNN far outperforms the state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-18T01:08:03Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for diffusion models (DMs).
We evaluate our framework, Elijah, on hundreds of DMs of three types, including DDPM, NCSN, and LDM.
Our approach achieves close to 100% detection accuracy and reduces the backdoor effects to close to zero without significantly sacrificing model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z)
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing [14.88575793895578]
We propose a defense method based on deep model mutation testing.
We first confirm the effectiveness of model mutation testing in detecting backdoor samples.
We then systematically defend against three extensively studied backdoor attack levels.
arXiv Detail & Related papers (2023-01-25T05:24:46Z)
- A Knowledge Distillation-Based Backdoor Attack in Federated Learning [9.22321085045949]
Adversarial Knowledge Distillation (ADVKD) is a method that combines knowledge distillation with backdoor attacks in Federated Learning (FL).
We show that ADVKD can not only reach a higher attack success rate but also successfully bypass defenses when other methods fail.
arXiv Detail & Related papers (2022-08-12T08:52:56Z)
- Backdoor Attacks on Crowd Counting [63.90533357815404]
Crowd counting is a regression task that estimates the number of people in a scene image.
In this paper, we investigate the vulnerability of deep learning based crowd counting models to backdoor attacks.
arXiv Detail & Related papers (2022-07-12T16:17:01Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling data or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks pose a new threat to Deep Neural Networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization that purifies the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
arXiv Detail & Related papers (2022-03-17T12:37:21Z)
- Maximum Mean Discrepancy Test is Aware of Adversarial Attacks [122.51040127438324]
The maximum mean discrepancy (MMD) test could in principle detect any distributional discrepancy between two datasets.
However, it has been shown that the MMD test is unaware of adversarial attacks.
arXiv Detail & Related papers (2020-10-22T03:42:12Z)
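Difference-based detectors such as the MMD test in the last entry above come down to a two-sample test on batches of representations. As a rough, hedged illustration only (not any particular paper's implementation), a permutation-based MMD test can be sketched as follows; the kernel bandwidth and permutation count are arbitrary choices.

```python
import numpy as np

def mmd2_rbf(x, y, sigma=1.0):
    # Biased MMD^2 estimate with a single RBF kernel of bandwidth sigma.
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def mmd_permutation_test(x, y, sigma=1.0, n_perm=500, seed=0):
    # Returns (mmd2, p_value); a small p-value suggests the two samples
    # (e.g., clean vs. suspicious representations) differ in distribution.
    rng = np.random.default_rng(seed)
    observed = mmd2_rbf(x, y, sigma)
    pooled = np.vstack([x, y])
    n, hits = len(x), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2_rbf(pooled[idx[:n]], pooled[idx[n:]], sigma) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The main paper's point is that training with SDRM shrinks exactly the statistic such a test relies on, which is why the reported F1 scores of difference-based defenses drop on SDRM-trained models.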