Towards Unified Robustness Against Both Backdoor and Adversarial Attacks
- URL: http://arxiv.org/abs/2405.17929v1
- Date: Tue, 28 May 2024 07:50:00 GMT
- Title: Towards Unified Robustness Against Both Backdoor and Adversarial Attacks
- Authors: Zhenxing Niu, Yuyao Sun, Qiguang Miao, Rong Jin, Gang Hua
- Abstract summary: Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks.
This paper reveals that there is an intriguing connection between backdoor and adversarial attacks.
A novel Progressive Unified Defense algorithm is proposed to defend against backdoor and adversarial attacks simultaneously.
- Score: 31.846262387360767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct robustness problems and solved separately, since they are training-time and inference-time attacks respectively. However, this paper reveals an intriguing connection between them: (1) planting a backdoor into a model significantly affects the model's adversarial examples; (2) for an infected model, its adversarial examples have features similar to those of the triggered images. Based on these observations, a novel Progressive Unified Defense (PUD) algorithm is proposed to defend against backdoor and adversarial attacks simultaneously. Specifically, PUD uses a progressive model purification scheme to jointly erase backdoors and enhance the model's adversarial robustness. In the early stage, the adversarial examples of the infected model are used to erase backdoors. As the backdoor is gradually erased, the purification naturally turns into a stage that boosts the model's robustness against adversarial attacks. Moreover, the PUD algorithm can effectively identify poisoned images, so the initial extra dataset does not need to be completely clean. Extensive experimental results show that the discovered connection between backdoor and adversarial attacks is ubiquitous, regardless of the type of backdoor attack. The proposed PUD outperforms state-of-the-art backdoor defenses, including model repairing-based and data filtering-based methods, and is also competitive with the most advanced adversarial defense methods.
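The early-stage idea described in the abstract, fine-tuning the infected model on adversarial examples crafted against itself, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' exact PUD procedure: it assumes a PyTorch image classifier, a small extra dataset `extra_loader` that is mostly clean, and standard PGD with illustrative hyperparameters; the progressive shift toward adversarial-robustness training and the poisoned-image identification step are omitted.

```python
# Minimal sketch (not the exact PUD algorithm): purify a suspected-infected model
# by fine-tuning it on adversarial examples generated against that same model.
# `model`, `extra_loader`, and the PGD hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Craft untargeted PGD adversarial examples against the given model."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()

def purify(model, extra_loader, optimizer, epochs=5, device="cuda"):
    """Fine-tune the infected model on its own adversarial examples.

    The abstract observes that adversarial examples of an infected model share
    features with triggered images, so training the model to classify them
    correctly also weakens the planted backdoor.
    """
    for _ in range(epochs):
        for x, y in extra_loader:
            x, y = x.to(device), y.to(device)
            model.eval()                      # freeze BN statistics while attacking
            x_adv = pgd_attack(model, x, y)
            model.train()
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            optimizer.step()
    return model
```

As the backdoor fades, the same loop behaves like ordinary adversarial training on the extra data, which is consistent with the later stage described in the abstract.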
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models.
In this paper, we explore the task of purifying a backdoored model using a small clean dataset.
By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z) - Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Backdoor Attack against NLP models with Robustness-Aware Perturbation defense [0.0]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
In our work, we break the robustness-aware perturbation defense by controlling the robustness gap between poisoned and clean samples with an adversarial training step.
arXiv Detail & Related papers (2022-04-08T10:08:07Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Adversarial Fine-tuning for Backdoor Defense: Connect Adversarial Examples to Triggered Samples [15.57457705138278]
We propose a new Adversarial Fine-Tuning (AFT) approach to erase backdoor triggers.
AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples.
arXiv Detail & Related papers (2022-02-13T13:41:15Z) - Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Can We Mitigate Backdoor Attack Using Adversarial Detection Methods? [26.8404758315088]
We conduct comprehensive studies on the connections between adversarial examples and backdoor examples of Deep Neural Networks.
Our insights are based on the observation that both adversarial examples and backdoor examples have anomalies during the inference process.
We revise four existing adversarial defense methods for detecting backdoor examples.
arXiv Detail & Related papers (2020-06-26T09:09:27Z)
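The last entry above rests on the observation that both adversarial examples and backdoor (triggered) examples behave anomalously at inference time. The sketch below illustrates that idea with a simple feature-distance check against per-class means computed on a small trusted set; it is a generic illustration, not one of the four revised detection methods from that paper, and `model.features`, `clean_loader`, and `threshold` are assumed names.

```python
# Illustration only: a generic inference-time anomaly check in the spirit of the
# last entry above (adversarial detection repurposed for backdoor/triggered inputs).
# It is NOT one of the four revised methods from that paper; `model.features`,
# `clean_loader`, and `threshold` are assumed names for the sketch.
import torch

@torch.no_grad()
def class_feature_means(model, clean_loader, device="cuda"):
    """Average penultimate-layer features per class over a small trusted clean set."""
    model.eval()
    sums, counts = {}, {}
    for x, y in clean_loader:
        feats = model.features(x.to(device))   # assumed penultimate-layer extractor
        for f, label in zip(feats, y):
            c = int(label)
            sums[c] = sums.get(c, 0) + f
            counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

@torch.no_grad()
def is_suspicious(model, x, means, threshold, device="cuda"):
    """Flag inputs whose features sit unusually far from their predicted class mean.

    Both adversarial and triggered inputs tend to be pushed toward a class they do
    not naturally resemble, so their features are anomalous for that class.
    """
    model.eval()
    x = x.unsqueeze(0).to(device)
    feat = model.features(x).squeeze(0)
    pred = int(model(x).argmax(dim=1))
    return torch.norm(feat - means[pred]).item() > threshold
```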