Towards Stable Backdoor Purification through Feature Shift Tuning
- URL: http://arxiv.org/abs/2310.01875v3
- Date: Sat, 21 Oct 2023 12:37:05 GMT
- Title: Towards Stable Backdoor Purification through Feature Shift Tuning
- Authors: Rui Min, Zeyu Qin, Li Shen, Minhao Cheng
- Abstract summary: Deep neural networks (DNNs) are vulnerable to backdoor attacks.
In this paper, we start with fine-tuning, one of the most common and easy-to-deploy backdoor defenses.
We introduce Feature Shift Tuning (FST), a method for tuning-based backdoor purification.
- Score: 22.529990213795216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been widely observed that deep neural networks (DNNs) are vulnerable to backdoor attacks, in which attackers can maliciously manipulate model behavior by tampering with a small set of training samples. Although a line of defense methods has been proposed to mitigate this threat, they either require complicated modifications to the training process or rely heavily on a specific model architecture, which makes them hard to deploy in real-world applications. Therefore, in this paper we instead start with fine-tuning, one of the most common and easy-to-deploy backdoor defenses, and evaluate it comprehensively against diverse attack scenarios. Our initial experiments show that, in contrast to the promising defensive results at high poisoning rates, vanilla tuning methods completely fail in low-poisoning-rate scenarios. Our analysis shows that at low poisoning rates, the entanglement between backdoor and clean features undermines the effect of tuning-based defenses; the two sets of features must therefore be disentangled to improve backdoor purification. To address this, we introduce Feature Shift Tuning (FST), a method for tuning-based backdoor purification. Specifically, FST encourages feature shifts by actively deviating the classifier weights from the originally compromised weights. Extensive experiments demonstrate that FST provides consistently stable performance under different attack settings. Without complex parameter adjustments, FST also incurs much lower tuning costs, requiring only 10 epochs. Our code is available at https://github.com/AISafety-HKUST/stable_backdoor_purification.
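To make the abstract's description of FST concrete, the sketch below shows one way such a feature shift could be encouraged during fine-tuning: tune on a small clean set while penalizing alignment between the classifier weights and a frozen copy of the originally compromised weights, re-projecting the classifier norm so the penalty cannot be satisfied by simply shrinking the weights. The `model.classifier` attribute, the coefficient `alpha`, the norm re-projection, and all hyperparameters are illustrative assumptions, not the authors' exact implementation; the official code is in the repository linked above.

```python
import torch
import torch.nn.functional as F

def feature_shift_tune(model, clean_loader, alpha=0.1, epochs=10, lr=0.01):
    """Hedged sketch of tuning-based purification in the spirit of FST.

    Assumes `model.classifier` is the final nn.Linear layer and that a small
    clean tuning set is available; all hyperparameters are illustrative.
    """
    # Frozen copy of the originally compromised classifier weights.
    w_orig = model.classifier.weight.detach().clone()
    orig_norm = w_orig.norm()

    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            optimizer.zero_grad()
            logits = model(x)
            w = model.classifier.weight
            # Clean cross-entropy plus a penalty on alignment with the
            # compromised weights, pushing the classifier (and the features
            # it relies on) away from the backdoored solution.
            loss = F.cross_entropy(logits, y) + alpha * (w * w_orig).sum()
            loss.backward()
            optimizer.step()
            # Keep the classifier norm at its original scale so the alignment
            # penalty cannot be minimized by simply shrinking the weights.
            with torch.no_grad():
                w.mul_(orig_norm / (w.norm() + 1e-12))
    return model
```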
Related papers
- Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense [27.471096446155933]
We investigate the Post-Purification Robustness of current backdoor purification methods.
We find that current safety purification methods are vulnerable to the rapid re-learning of backdoor behavior.
We propose a tuning defense, Path-Aware Minimization (PAM), which promotes deviation along backdoor-connected paths with extra model updates.
arXiv Detail & Related papers (2024-10-13T13:37:36Z)
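The entry above says only that PAM promotes deviation along backdoor-connected paths through extra model updates. The sketch below is one loose, assumed reading of that idea, not the paper's actual algorithm: a SAM-style two-step update whose inner perturbation points from the current weights toward a backdoored checkpoint, so the outer update is taken against the loss observed along that path. The function name, `backdoored_state`, and the radius `rho` are hypothetical.

```python
import torch
import torch.nn.functional as F

def path_aware_step(model, backdoored_state, x, y, optimizer, rho=0.05):
    """Illustrative sketch only; PAM's actual procedure may differ.

    `backdoored_state` is assumed to be a state_dict of the original
    backdoored model with the same parameter names as `model`.
    """
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    # Direction from the current weights toward the backdoored checkpoint.
    direction = [backdoored_state[n].to(p.device) - p.detach() for n, p in named]
    norm = torch.sqrt(sum((d ** 2).sum() for d in direction)) + 1e-12

    # Extra update: temporarily move along the path toward the backdoored model.
    with torch.no_grad():
        for (_, p), d in zip(named, direction):
            p.add_(rho * d / norm)

    # Clean-loss gradient evaluated at the perturbed point.
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()

    # Undo the perturbation, then apply that gradient at the original weights.
    with torch.no_grad():
        for (_, p), d in zip(named, direction):
            p.sub_(rho * d / norm)
    optimizer.step()
    return loss.item()
```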
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning [57.50274256088251]
We show that parameter-efficient fine-tuning (PEFT) is more susceptible to weight-poisoning backdoor attacks.
We develop a Poisoned Sample Identification Module (PSIM) leveraging PEFT, which identifies poisoned samples through their prediction confidence.
We conduct experiments on text classification tasks, five fine-tuning strategies, and three weight-poisoning backdoor attack methods.
arXiv Detail & Related papers (2024-02-19T14:22:54Z)
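As a rough illustration of confidence-based poisoned-sample identification (the entry above describes PSIM as identifying poisoned samples through confidence, but its actual criterion and PEFT setup are not reproduced here), the sketch below flags inputs that a model classifies with abnormally high confidence; the threshold value and the helper name are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def flag_high_confidence(model, loader, threshold=0.99):
    """Illustrative sketch, not PSIM itself: flag samples whose top-class
    probability is abnormally high, a common symptom of trigger-carrying
    inputs on a backdoored model. The threshold and the criterion that
    "higher confidence means more suspicious" are assumptions.
    """
    model.eval()
    flags = []
    for x, _ in loader:
        probs = F.softmax(model(x), dim=-1)
        conf = probs.max(dim=-1).values
        # True marks a sample suspected of carrying a backdoor trigger.
        flags.extend((conf > threshold).tolist())
    return flags
```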
- Model Supply Chain Poisoning: Backdooring Pre-trained Models via Embedding Indistinguishability [61.549465258257115]
We propose a novel and more severe backdoor attack, TransTroj, which enables backdoors embedded in PTMs to transfer efficiently through the model supply chain.
Experimental results show that our method significantly outperforms SOTA task-agnostic backdoor attacks.
arXiv Detail & Related papers (2024-01-29T04:35:48Z)
- Backdoor Mitigation by Correcting the Distribution of Neural Activations [30.554700057079867]
Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs).
We analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances.
We propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration.
arXiv Detail & Related papers (2023-08-18T22:52:29Z)
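As a loose illustration of the mitigation idea in the entry above, the sketch below re-standardizes one convolutional layer's activations toward per-channel statistics estimated on clean data, using a PyTorch forward hook; the helper name, the 4-D activation shape, and the simple standardize-and-rescale correction are assumptions rather than the cited paper's exact method.

```python
import torch

def add_activation_correction(layer, clean_mean, clean_std, eps=1e-5):
    """Hedged sketch: pull one layer's activation distribution toward
    clean-data statistics. `clean_mean` and `clean_std` are per-channel
    tensors of shape (C,) estimated on a small clean set; the layer output
    is assumed to have shape (N, C, H, W).
    """
    def hook(_module, _inputs, output):
        # Standardize the observed activations per channel ...
        mu = output.mean(dim=(0, 2, 3), keepdim=True)
        sigma = output.std(dim=(0, 2, 3), keepdim=True)
        normalized = (output - mu) / (sigma + eps)
        # ... then map them onto the clean-data statistics.
        return normalized * clean_std.view(1, -1, 1, 1) + clean_mean.view(1, -1, 1, 1)

    return layer.register_forward_hook(hook)
```

The returned hook handle can later be removed with `handle.remove()` if the correction is no longer needed.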
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
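As a rough illustration of the intervention described in the entry above, the sketch below defines a residual block whose identity skip connection is scaled by a factor `gamma` < 1, the kind of shortcut suppression under which the entry reports a large drop in attack success rate; the block structure, the choice of which connections to scale, and the value of `gamma` are illustrative assumptions.

```python
import torch.nn as nn

class ScaledShortcutBlock(nn.Module):
    """Hedged sketch: a residual block whose skip connection can be scaled
    down by `gamma` to suppress shortcut contributions. Which blocks to scale
    and how to pick gamma are left open; the cited paper's selection procedure
    may differ.
    """
    def __init__(self, channels, gamma=0.5):
        super().__init__()
        self.gamma = gamma
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Scale the identity shortcut; gamma < 1 weakens the shortcut path
        # that backdoor features may exploit.
        return self.relu(self.body(x) + self.gamma * x)
```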