PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning
- URL: http://arxiv.org/abs/2409.12072v1
- Date: Wed, 18 Sep 2024 15:47:23 GMT
- Title: PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning
- Authors: Yukai Xu, Yujie Gu, Kouichi Sakurai,
- Abstract summary: Backdoor attacks pose a significant threat to deep neural networks.
We propose a novel mechanism, PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model.
Our mechanism demonstrates superior effectiveness across multiple backdoor attack methods and datasets.
- Score: 4.337364406035291
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advancements have led to increasingly subtle implantation, making the defense more challenging. Existing defense mechanisms typically rely on an additional clean dataset as a standard reference and involve retraining an auxiliary model or fine-tuning the entire victim model. However, these approaches are often computationally expensive and not always feasible in practical applications. In this paper, we propose a novel and lightweight defense mechanism, termed PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model. To achieve this, our approach first introduces a simple data purification process to identify and select the most-likely clean data from the poisoned training dataset. The self-purified clean dataset is then used for activation clipping and fine-tuning only the last classification layer of the victim model. By integrating data purification, activation clipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates superior effectiveness across multiple backdoor attack methods and datasets, as confirmed through extensive experimental evaluation.
Related papers
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$2$AO)
Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z) - Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
It is quite beneficial and challenging to detect poisoned samples from a mixed dataset.
We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - Mellivora Capensis: A Backdoor-Free Training Framework on the Poisoned Dataset without Auxiliary Data [29.842087372804905]
This paper addresses the challenges of backdoor attack countermeasures in real-world scenarios.
We propose a robust and clean-data-free backdoor defense framework, namely Mellivora Capensis (textttMeCa), which enables the model trainer to train a clean model on the poisoned dataset.
arXiv Detail & Related papers (2024-05-21T12:20:19Z) - Progressive Poisoned Data Isolation for Training-time Backdoor Defense [23.955347169187917]
Deep Neural Networks (DNN) are susceptible to backdoor attacks where malicious attackers manipulate the model's predictions via data poisoning.
In this study, we present a novel and efficacious defense method, termed Progressive Isolation of Poisoned Data (PIPD)
Our PIPD achieves an average True Positive Rate (TPR) of 99.95% and an average False Positive Rate (FPR) of 0.06% for diverse attacks over CIFAR-10 dataset.
arXiv Detail & Related papers (2023-12-20T02:40:28Z) - Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method
Perspective [65.70799289211868]
We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation.
We show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
arXiv Detail & Related papers (2023-11-28T09:53:05Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained
Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - Neural Polarizer: A Lightweight and Effective Backdoor Defense via
Purifying Poisoned Features [62.82817831278743]
Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks.
We propose a novel backdoor defense method by inserting a learnable neural polarizer into the backdoored model as an intermediate layer.
arXiv Detail & Related papers (2023-06-29T05:39:58Z) - Backdoor Attacks Against Dataset Distillation [24.39067295054253]
This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain.
We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING.
Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases.
arXiv Detail & Related papers (2023-01-03T16:58:34Z) - One-shot Neural Backdoor Erasing via Adversarial Weight Masking [8.345632941376673]
Adversarial Weight Masking (AWM) is a novel method capable of erasing the neural backdoors even in the one-shot setting.
AWM can largely improve the purifying effects over other state-of-the-art methods on various available training dataset sizes.
arXiv Detail & Related papers (2022-07-10T16:18:39Z) - Backdoor Defense with Machine Unlearning [32.968653927933296]
We propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning.
BAERASE can averagely lower the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on four benchmark datasets.
arXiv Detail & Related papers (2022-01-24T09:09:12Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.