Backdoor Defense via Deconfounded Representation Learning
- URL: http://arxiv.org/abs/2303.06818v1
- Date: Mon, 13 Mar 2023 02:25:59 GMT
- Title: Backdoor Defense via Deconfounded Representation Learning
- Authors: Zaixi Zhang, Qi Liu, Zhicai Wang, Zepu Lu, Qingyong Hu
- Abstract summary: We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
- Score: 17.28760299048368
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep neural networks (DNNs) have recently been shown to be vulnerable to backdoor
attacks, where attackers embed hidden backdoors in the DNN model by injecting a
few poisoned examples into the training dataset. While extensive efforts have
been made to detect and remove backdoors from backdoored DNNs, it is still not
clear whether a backdoor-free clean model can be directly obtained from
poisoned datasets. In this paper, we first construct a causal graph to model
the generation process of poisoned data and find that the backdoor attack acts
as the confounder, which brings spurious associations between the input images
and target labels, making the model predictions less reliable. Inspired by this
causal understanding, we propose the Causality-inspired Backdoor Defense (CBD)
to learn deconfounded representations for reliable classification.
Specifically, a backdoored model is intentionally trained to capture the
confounding effects. The other, clean model is dedicated to capturing the desired
causal effects by minimizing the mutual information with the confounding
representations from the backdoored model and employing a sample-wise
re-weighting scheme. Extensive experiments on multiple benchmark datasets
against 6 state-of-the-art attacks verify that our proposed defense method is
effective in reducing backdoor threats while maintaining high accuracy in
predicting benign samples. Further analysis shows that CBD can also resist
potential adaptive attacks. The code is available at
https://github.com/zaixizhang/CBD.
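The abstract describes a concrete training recipe: an auxiliary model is deliberately allowed to capture the confounding (backdoor) shortcut, while the clean model is trained with a mutual-information penalty against that model's representations plus a sample-wise re-weighting scheme. The PyTorch sketch below illustrates this idea under stated assumptions; the toy architecture, the cross-correlation surrogate for the mutual-information term, and the confidence-based weighting rule are illustrative choices, not the authors' implementation (see the linked repository for that).

```python
# Hedged sketch of deconfounded representation learning in the spirit of CBD.
# The architecture, the correlation-based surrogate for the mutual-information
# penalty, and the confidence-based sample re-weighting are illustrative
# assumptions, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Toy feature extractor + classifier for 32x32 RGB images (e.g. CIFAR-10)."""
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(64 * 8 * 8, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        z = self.features(x)
        return self.head(z), z

def cross_correlation_penalty(z_clean, z_conf):
    """Crude stand-in for a mutual-information upper bound: penalize the
    cross-correlation between clean and confounding feature batches."""
    z_clean = (z_clean - z_clean.mean(0)) / (z_clean.std(0) + 1e-6)
    z_conf = (z_conf - z_conf.mean(0)) / (z_conf.std(0) + 1e-6)
    corr = (z_clean.T @ z_conf) / z_clean.shape[0]
    return corr.pow(2).mean()

def train_step(clean_model, backdoor_model, opt_clean, opt_bd, x, y, lam=1.0):
    # 1) The auxiliary model is trained normally, so it latches onto the easy,
    #    spurious (backdoor) shortcut present in the poisoned data.
    logits_bd, z_bd = backdoor_model(x)
    loss_bd = F.cross_entropy(logits_bd, y)
    opt_bd.zero_grad(); loss_bd.backward(); opt_bd.step()

    # 2) Samples the shortcut model already fits with high confidence are
    #    treated as suspicious and down-weighted for the clean model.
    with torch.no_grad():
        conf = F.softmax(logits_bd.detach(), dim=1).gather(1, y[:, None]).squeeze(1)
        weights = 1.0 - conf            # illustrative re-weighting rule

    # 3) The clean model minimizes weighted cross-entropy plus a penalty that
    #    discourages information shared with the confounding representation.
    logits, z = clean_model(x)
    ce = F.cross_entropy(logits, y, reduction="none")
    loss = (weights * ce).mean() + lam * cross_correlation_penalty(z, z_bd.detach())
    opt_clean.zero_grad(); loss.backward(); opt_clean.step()
    return loss.item()
```

In practice one would likely warm up the shortcut model for a few epochs before training the clean model, and the paper's actual mutual-information estimator and weighting rule may differ from these simplified stand-ins.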
Related papers
- DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks [30.766013737094532]
We propose DMGNN to defend against out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks.
DMGNN can identify hidden ID and OOD triggers by predicting label transitions based on counterfactual explanations.
DMGNN far outperforms the state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-18T01:08:03Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling or access to the training process (a generic illustration of frequency-domain trigger injection appears after this list).
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- Adversarial Fine-tuning for Backdoor Defense: Connect Adversarial Examples to Triggered Samples [15.57457705138278]
We propose a new Adversarial Fine-Tuning (AFT) approach to erase backdoor triggers.
AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples.
arXiv Detail & Related papers (2022-02-13T13:41:15Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Backdoor Learning: A Survey [75.59571756777342]
The backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
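Picking up the frequency-domain poisoning entry above: the sketch below illustrates the general idea of an "invisible" trigger planted in DCT coefficients rather than in pixel space, so the label can stay clean. The perturbed coefficient positions and the magnitude are arbitrary assumptions; this is not the cited paper's exact construction.

```python
# Hedged illustration of a frequency-domain trigger: perturb a few
# mid/high-frequency DCT coefficients so the change is hard to see in pixel
# space. Coefficient positions and magnitude are arbitrary assumptions,
# not the cited paper's exact construction.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(block):
    return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def add_frequency_trigger(image, magnitude=30.0, coeffs=((15, 15), (31, 31))):
    """image: float array in [0, 255], shape (H, W, C). Returns a poisoned copy."""
    poisoned = image.astype(np.float64).copy()
    for c in range(poisoned.shape[2]):
        channel = dct2(poisoned[:, :, c])
        for (u, v) in coeffs:          # bump a few fixed frequency bins
            channel[u, v] += magnitude
        poisoned[:, :, c] = idct2(channel)
    return np.clip(poisoned, 0, 255)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.uniform(0, 255, size=(32, 32, 3))
    trig = add_frequency_trigger(img)
    # The pixel-space change is small and spread out, which is what makes
    # frequency-domain triggers hard to spot by eye.
    print("max abs pixel change:", np.abs(trig - img).max())
```

Because only the image is altered and the label is left untouched, this matches the no-mislabeling / clean-label setting the entry mentions; a real attack would tune the coefficients and magnitude to trade attack success against visibility.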