Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
- URL: http://arxiv.org/abs/2212.09067v1
- Date: Sun, 18 Dec 2022 11:30:59 GMT
- Title: Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
- Authors: Zeyang Sha and Xinlei He and Pascal Berrang and Mathias Humbert and
Yang Zhang
- Abstract summary: We show that fine-tuning can effectively remove backdoors from machine learning models while maintaining high model utility.
We coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed.
- Score: 10.88508085229675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks represent one of the major threats to machine learning
models. Various efforts have been made to mitigate backdoors. However, existing
defenses have become increasingly complex and often require high computational
resources or jeopardize models' utility. In this work, we show that
fine-tuning, one of the most common and easy-to-adopt machine learning training
operations, can effectively remove backdoors from machine learning models while
maintaining high model utility. Extensive experiments over three machine
learning paradigms show that fine-tuning and our newly proposed
super-fine-tuning achieve strong defense performance. Furthermore, we coin a
new term, namely backdoor sequela, to measure the changes in model
vulnerabilities to other attacks before and after the backdoor has been
removed. Empirical evaluation shows that, compared to other defense methods,
super-fine-tuning leaves limited backdoor sequela. We hope our results can help
machine learning model owners better protect their models from backdoor
threats. Our results also call for the design of more advanced attacks to
comprehensively assess machine learning models' backdoor vulnerabilities.
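The defense described here is ordinary fine-tuning of a possibly backdoored model on a small clean dataset. The sketch below illustrates that procedure in PyTorch; the names (`finetune_defense`, `clean_loader`) are placeholders, and the optional cyclic learning-rate schedule is only an assumption about how super-fine-tuning might vary the learning rate, since the abstract does not specify its schedule.
```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CyclicLR

def finetune_defense(model, clean_loader, epochs=10, lr=0.01, super_ft=False, device=None):
    """Fine-tune a possibly backdoored model on a small clean dataset.

    Plain fine-tuning follows the paper's core idea; the cyclic learning-rate
    schedule used when `super_ft=True` is only an illustrative assumption about
    super-fine-tuning -- the abstract does not give its actual schedule.
    """
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = None
    if super_ft:
        # Hypothetical schedule: oscillate between a small and a large learning rate.
        scheduler = CyclicLR(optimizer, base_lr=lr, max_lr=10 * lr,
                             step_size_up=len(clean_loader))
    for _ in range(epochs):
        for inputs, labels in clean_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
            if scheduler is not None:
                scheduler.step()
    return model.eval()
```
After fine-tuning, one would re-measure both clean accuracy and the attack success rate on triggered inputs to confirm that utility is preserved and the backdoor is removed.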
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation [10.888542040021962]
W2SDefense is a weak-to-strong unlearning algorithm to defend against backdoor attacks.
We conduct experiments on text classification tasks involving three state-of-the-art language models and three different backdoor attack algorithms.
arXiv Detail & Related papers (2024-10-18T12:39:32Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Mitigating Backdoor Attacks using Activation-Guided Model Editing [8.00994004466919]
Backdoor attacks compromise the integrity and reliability of machine learning models.
We propose a novel backdoor mitigation approach via machine unlearning to counter such backdoor attacks.
arXiv Detail & Related papers (2024-07-10T13:43:47Z)
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization [27.964431092997504]
Fine-tuning based on benign data is a natural defense to erase the backdoor effect in a backdoored model.
We propose FTSAM, a novel backdoor defense paradigm that aims to shrink the norms of backdoor-related neurons by incorporating sharpness-aware minimization with fine-tuning.
arXiv Detail & Related papers (2023-04-24T05:13:52Z)
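FTSAM above couples clean-data fine-tuning with sharpness-aware minimization (SAM). The following is a minimal sketch of one generic SAM update on a clean batch, under assumed names (`model`, `inputs`, `labels`, `optimizer`, `rho`); it illustrates the kind of optimization FTSAM builds on and is not the paper's implementation.
```python
import torch
import torch.nn as nn

def sam_finetune_step(model, inputs, labels, optimizer, rho=0.05):
    """One generic sharpness-aware minimization (SAM) step on a clean batch."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()

    # First forward/backward pass: gradients at the current weights.
    loss = criterion(model(inputs), labels)
    loss.backward()

    # Move the weights to the (approximate) worst-case point in an L2 ball of radius rho.
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)          # perturb the weights in place
            perturbations.append(e)

    # Second forward/backward pass: gradients at the perturbed weights.
    optimizer.zero_grad()
    criterion(model(inputs), labels).backward()

    # Undo the perturbation, then update with the sharpness-aware gradients.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```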
- Evil from Within: Machine Learning Backdoors through Hardware Trojans [72.99519529521919]
Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars.
We introduce a backdoor attack that completely resides within a common hardware accelerator for machine learning.
We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU.
arXiv Detail & Related papers (2023-04-17T16:24:48Z)
- Architectural Backdoors in Neural Networks [27.315196801989032]
We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
arXiv Detail & Related papers (2022-06-15T22:44:03Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Blind Backdoors in Deep Learning Models [22.844973592524966]
We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code.
We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature.
Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model.
arXiv Detail & Related papers (2020-05-08T02:15:53Z)
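The blind-backdoor attack summarized above tampers with the loss-value computation in the training code rather than with the dataset. The fragment below is only a conceptual sketch of that idea: `apply_trigger`, the fixed `blend` weight, and `target_label` are illustrative stand-ins, and the published attack balances its objectives adaptively rather than with a constant weight.
```python
import torch
import torch.nn as nn

def apply_trigger(inputs, patch_value=1.0, size=3):
    """Hypothetical trigger: overwrite a small corner patch of each image."""
    stamped = inputs.clone()
    stamped[..., -size:, -size:] = patch_value
    return stamped

def blinded_loss(model, inputs, labels, blend=0.5, target_label=0):
    """Conceptual sketch of a compromised loss computation: the honest task loss
    is blended with a backdoor objective on trigger-stamped copies of the batch."""
    criterion = nn.CrossEntropyLoss()
    clean_loss = criterion(model(inputs), labels)

    triggered = apply_trigger(inputs)                # stamp the hypothetical trigger
    target = torch.full_like(labels, target_label)   # attacker-chosen target class
    backdoor_loss = criterion(model(triggered), target)

    return (1.0 - blend) * clean_loss + blend * backdoor_loss
```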
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.