A Statistical Difference Reduction Method for Escaping Backdoor
Detection
- URL: http://arxiv.org/abs/2111.05077v1
- Date: Tue, 9 Nov 2021 12:09:18 GMT
- Title: A Statistical Difference Reduction Method for Escaping Backdoor
Detection
- Authors: Pengfei Xia, Hongjing Niu, Ziqiang Li, and Bin Li
- Abstract summary: Recent studies show that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks.
Several detection methods have been developed to distinguish adversarial inputs from clean ones and defend against such attacks.
We propose a Statistical Difference Reduction Method (SDRM) by adding a multi-level MMD constraint to the loss function.
- Score: 11.226288436817956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies show that Deep Neural Networks (DNNs) are vulnerable
to backdoor attacks. An infected model behaves normally on benign inputs,
whereas its prediction is forced to an attack-specific target on adversarial
data. Several detection methods have been developed to distinguish adversarial
inputs from clean ones and thus defend against such attacks. The common
hypothesis these defenses rely on is that there are large statistical
differences between the latent representations of clean and adversarial inputs
extracted by the infected model. However, despite its importance, comprehensive
research on whether this hypothesis must hold is lacking. In this paper, we
focus on this hypothesis and study three questions: 1) What are the properties
of the statistical differences? 2) How can they be effectively reduced without
harming the attack intensity? 3) What impact does this reduction have on
difference-based defenses? We address these questions in turn. First, by
introducing the Maximum Mean Discrepancy (MMD) as the metric, we identify that
the statistical differences of multi-level representations are all large, not
just at the highest level. Then, we propose a Statistical Difference Reduction
Method (SDRM) that adds a multi-level MMD constraint to the loss function when
training a backdoor model, effectively reducing the differences. Last, three
typical difference-based detection methods are examined. Their F1 scores drop
from 90%-100% on regularly trained backdoor models to 60%-70% on models trained
with SDRM, across two datasets, four model architectures, and four attack
methods. The results indicate that the proposed method can be used to enhance
existing attacks to escape backdoor detection algorithms.
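The core technical ingredient in the abstract is a multi-level MMD term added to the backdoor training loss. Below is a minimal PyTorch sketch of that idea, not the authors' released code: the feature-extraction interface (`return_features=True`), the RBF kernel bandwidths, and the weight `lambda_mmd` are all illustrative assumptions.

```python
import torch

def rbf_mmd2(x, y, bandwidths=(1.0, 5.0, 10.0)):
    # Biased MMD^2 estimate between two batches of flattened features,
    # using a sum of RBF kernels over several bandwidths.
    def gram(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return sum(torch.exp(-d2 / (2.0 * s ** 2)) for s in bandwidths)
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def sdrm_style_loss(model, clean_x, clean_y, poisoned_x, target_y,
                    criterion, lambda_mmd=1.0):
    # Task loss on clean and poisoned batches, plus an MMD penalty at every
    # returned feature level. Assumes the hypothetical call
    # model(x, return_features=True) yields (logits, [feat_1, ..., feat_L]);
    # adapt this to however your model exposes intermediate activations.
    logits_c, feats_c = model(clean_x, return_features=True)
    logits_p, feats_p = model(poisoned_x, return_features=True)
    task = criterion(logits_c, clean_y) + criterion(logits_p, target_y)
    mmd = sum(rbf_mmd2(fc.flatten(1), fp.flatten(1))
              for fc, fp in zip(feats_c, feats_p))
    return task + lambda_mmd * mmd
```

Used in place of the plain cross-entropy objective, a loss of this shape lets gradient descent implant the backdoor while simultaneously pulling the clean and poisoned feature distributions together at each level, which is the reduction the abstract describes.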
Related papers
- DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks [30.766013737094532]
We propose DMGNN to defend against out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks.
DMGNN can easily identify the hidden ID and OOD triggers via predicting label transitions based on counterfactual explanation.
DMGNN far outperforms the state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-18T01:08:03Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for diffusion models (DMs).
We evaluate our framework, Elijah, on hundreds of DMs of three types, including DDPM, NCSN, and LDM.
Our approach achieves close to 100% detection accuracy and reduces the backdoor effects to close to zero without significantly sacrificing model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z)
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing [14.88575793895578]
We propose a defense method based on deep model mutation testing.
We first confirm the effectiveness of model mutation testing in detecting backdoor samples.
We then systematically defend against three extensively studied backdoor attack levels.
arXiv Detail & Related papers (2023-01-25T05:24:46Z)
- A Knowledge Distillation-Based Backdoor Attack in Federated Learning [9.22321085045949]
Adversarial Knowledge Distillation (ADVKD) is a method that combines knowledge distillation with backdoor attacks in Federated Learning (FL).
We show that ADVKD can not only reach a higher attack success rate but also successfully bypass defenses when other methods fail.
arXiv Detail & Related papers (2022-08-12T08:52:56Z)
- Backdoor Attacks on Crowd Counting [63.90533357815404]
Crowd counting is a regression task that estimates the number of people in a scene image.
In this paper, we investigate the vulnerability of deep learning based crowd counting models to backdoor attacks.
arXiv Detail & Related papers (2022-07-12T16:17:01Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling data or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks pose a new threat to Deep Neural Networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization that purifies the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
arXiv Detail & Related papers (2022-03-17T12:37:21Z)
- Maximum Mean Discrepancy Test is Aware of Adversarial Attacks [122.51040127438324]
The maximum mean discrepancy (MMD) test could in principle detect any distributional discrepancy between two datasets.
However, it has been shown that the MMD test is unaware of adversarial attacks.
arXiv Detail & Related papers (2020-10-22T03:42:12Z)
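Difference-based detectors such as the MMD test in the last entry above come down to a two-sample test on batches of representations. As a rough, hedged illustration only (not any particular paper's implementation), a permutation-based MMD test can be sketched as follows; the kernel bandwidth and permutation count are arbitrary choices.

```python
import numpy as np

def mmd2_rbf(x, y, sigma=1.0):
    # Biased MMD^2 estimate with a single RBF kernel of bandwidth sigma.
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def mmd_permutation_test(x, y, sigma=1.0, n_perm=500, seed=0):
    # Returns (mmd2, p_value); a small p-value suggests the two samples
    # (e.g., clean vs. suspicious representations) differ in distribution.
    rng = np.random.default_rng(seed)
    observed = mmd2_rbf(x, y, sigma)
    pooled = np.vstack([x, y])
    n, hits = len(x), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2_rbf(pooled[idx[:n]], pooled[idx[n:]], sigma) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The main paper's point is that training with SDRM shrinks exactly the statistic such a test relies on, which is why the reported F1 scores of difference-based defenses drop on SDRM-trained models.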