Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
- URL: http://arxiv.org/abs/2212.09067v1
- Date: Sun, 18 Dec 2022 11:30:59 GMT
- Title: Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
- Authors: Zeyang Sha and Xinlei He and Pascal Berrang and Mathias Humbert and
Yang Zhang
- Abstract summary: We show that fine-tuning can effectively remove backdoors from machine learning models while maintaining high model utility.
We coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed.
- Score: 10.88508085229675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks represent one of the major threats to machine learning
models. Various efforts have been made to mitigate backdoors. However, existing
defenses have become increasingly complex and often require high computational
resources or jeopardize models' utility. In this work, we show that
fine-tuning, one of the most common and easy-to-adopt machine learning training
operations, can effectively remove backdoors from machine learning models while
maintaining high model utility. Extensive experiments over three machine
learning paradigms show that fine-tuning and our newly proposed
super-fine-tuning achieve strong defense performance. Furthermore, we coin a
new term, namely backdoor sequela, to measure the changes in model
vulnerabilities to other attacks before and after the backdoor has been
removed. Empirical evaluation shows that, compared to other defense methods,
super-fine-tuning leaves limited backdoor sequela. We hope our results can help
machine learning model owners better protect their models from backdoor
threats. Our results also call for the design of more advanced attacks to
comprehensively assess machine learning models' backdoor vulnerabilities.
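The defense described here is ordinary fine-tuning of a possibly backdoored model on a small clean dataset. The sketch below illustrates that procedure in PyTorch; the names (`finetune_defense`, `clean_loader`) are placeholders, and the optional cyclic learning-rate schedule is only an assumption about how super-fine-tuning might vary the learning rate, since the abstract does not specify its schedule.
```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CyclicLR

def finetune_defense(model, clean_loader, epochs=10, lr=0.01, super_ft=False, device=None):
    """Fine-tune a possibly backdoored model on a small clean dataset.

    Plain fine-tuning follows the paper's core idea; the cyclic learning-rate
    schedule used when `super_ft=True` is only an illustrative assumption about
    super-fine-tuning -- the abstract does not give its actual schedule.
    """
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = None
    if super_ft:
        # Hypothetical schedule: oscillate between a small and a large learning rate.
        scheduler = CyclicLR(optimizer, base_lr=lr, max_lr=10 * lr,
                             step_size_up=len(clean_loader))
    for _ in range(epochs):
        for inputs, labels in clean_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
            if scheduler is not None:
                scheduler.step()
    return model.eval()
```
After fine-tuning, one would re-measure both clean accuracy and the attack success rate on triggered inputs to confirm that utility is preserved and the backdoor is removed.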
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation [10.888542040021962]
W2SDefense is a weak-to-strong unlearning algorithm to defend against backdoor attacks.
We conduct experiments on text classification tasks involving three state-of-the-art language models and three different backdoor attack algorithms.
arXiv Detail & Related papers (2024-10-18T12:39:32Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Mitigating Backdoor Attacks using Activation-Guided Model Editing [8.00994004466919]
Backdoor attacks compromise the integrity and reliability of machine learning models.
We propose a novel backdoor mitigation approach via machine unlearning to counter such backdoor attacks.
arXiv Detail & Related papers (2024-07-10T13:43:47Z)
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization [27.964431092997504]
Fine-tuning based on benign data is a natural defense to erase the backdoor effect in a backdoored model.
We propose FTSAM, a novel backdoor defense paradigm that aims to shrink the norms of backdoor-related neurons by incorporating sharpness-aware minimization with fine-tuning.
arXiv Detail & Related papers (2023-04-24T05:13:52Z)
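FTSAM above couples clean-data fine-tuning with sharpness-aware minimization (SAM). The following is a minimal sketch of one generic SAM update on a clean batch, under assumed names (`model`, `inputs`, `labels`, `optimizer`, `rho`); it illustrates the kind of optimization FTSAM builds on and is not the paper's implementation.
```python
import torch
import torch.nn as nn

def sam_finetune_step(model, inputs, labels, optimizer, rho=0.05):
    """One generic sharpness-aware minimization (SAM) step on a clean batch."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()

    # First forward/backward pass: gradients at the current weights.
    loss = criterion(model(inputs), labels)
    loss.backward()

    # Move the weights to the (approximate) worst-case point in an L2 ball of radius rho.
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)          # perturb the weights in place
            perturbations.append(e)

    # Second forward/backward pass: gradients at the perturbed weights.
    optimizer.zero_grad()
    criterion(model(inputs), labels).backward()

    # Undo the perturbation, then update with the sharpness-aware gradients.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```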
- Evil from Within: Machine Learning Backdoors through Hardware Trojans [72.99519529521919]
Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars.
We introduce a backdoor attack that completely resides within a common hardware accelerator for machine learning.
We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU.
arXiv Detail & Related papers (2023-04-17T16:24:48Z)
- Architectural Backdoors in Neural Networks [27.315196801989032]
We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
arXiv Detail & Related papers (2022-06-15T22:44:03Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Blind Backdoors in Deep Learning Models [22.844973592524966]
We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code.
We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature.
Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model.
arXiv Detail & Related papers (2020-05-08T02:15:53Z)
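The blind-backdoor attack summarized above tampers with the loss-value computation in the training code rather than with the dataset. The fragment below is only a conceptual sketch of that idea: `apply_trigger`, the fixed `blend` weight, and `target_label` are illustrative stand-ins, and the published attack balances its objectives adaptively rather than with a constant weight.
```python
import torch
import torch.nn as nn

def apply_trigger(inputs, patch_value=1.0, size=3):
    """Hypothetical trigger: overwrite a small corner patch of each image."""
    stamped = inputs.clone()
    stamped[..., -size:, -size:] = patch_value
    return stamped

def blinded_loss(model, inputs, labels, blend=0.5, target_label=0):
    """Conceptual sketch of a compromised loss computation: the honest task loss
    is blended with a backdoor objective on trigger-stamped copies of the batch."""
    criterion = nn.CrossEntropyLoss()
    clean_loss = criterion(model(inputs), labels)

    triggered = apply_trigger(inputs)                # stamp the hypothetical trigger
    target = torch.full_like(labels, target_label)   # attacker-chosen target class
    backdoor_loss = criterion(model(triggered), target)

    return (1.0 - blend) * clean_loss + blend * backdoor_loss
```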
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.