PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning
- URL: http://arxiv.org/abs/2409.12072v1
- Date: Wed, 18 Sep 2024 15:47:23 GMT
- Title: PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning
- Authors: Yukai Xu, Yujie Gu, Kouichi Sakurai
- Abstract summary: Backdoor attacks pose a significant threat to deep neural networks.
We propose a novel mechanism, PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model.
Our mechanism demonstrates superior effectiveness across multiple backdoor attack methods and datasets.
- Score: 4.337364406035291
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advancements have led to increasingly subtle implantation, making the defense more challenging. Existing defense mechanisms typically rely on an additional clean dataset as a standard reference and involve retraining an auxiliary model or fine-tuning the entire victim model. However, these approaches are often computationally expensive and not always feasible in practical applications. In this paper, we propose a novel and lightweight defense mechanism, termed PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model. To achieve this, our approach first introduces a simple data purification process to identify and select the most-likely clean data from the poisoned training dataset. The self-purified clean dataset is then used for activation clipping and fine-tuning only the last classification layer of the victim model. By integrating data purification, activation clipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates superior effectiveness across multiple backdoor attack methods and datasets, as confirmed through extensive experimental evaluation.
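To make the three-stage pipeline concrete, here is a minimal illustrative sketch in PyTorch. It assumes a loss-based heuristic for selecting the most-likely clean samples, percentile-style activation clipping, and a final classifier exposed as `model.fc`; none of these specifics are stated in the abstract, so they are assumptions rather than the authors' implementation.

```python
# A minimal, illustrative sketch of a PAD-FT-style pipeline in PyTorch.
# Assumptions (not specified by the abstract): purification keeps the samples
# the victim model fits with the lowest cross-entropy loss, activation clipping
# caps post-ReLU features at a bound estimated on the purified data, and the
# final classifier is exposed as `model.fc`.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset


def self_purify(model, poisoned_dataset, keep_ratio=0.2, device="cpu"):
    """Rank training samples by the victim model's loss and keep the
    lowest-loss fraction as the most-likely clean subset (heuristic)."""
    model.eval()
    scored = []
    with torch.no_grad():
        for idx in range(len(poisoned_dataset)):
            x, y = poisoned_dataset[idx]
            logits = model(x.unsqueeze(0).to(device))
            loss = F.cross_entropy(logits, torch.tensor([int(y)], device=device))
            scored.append((loss.item(), idx))
    scored.sort()  # lowest loss first
    keep = [idx for _, idx in scored[: int(keep_ratio * len(poisoned_dataset))]]
    return Subset(poisoned_dataset, keep)


class ClippedReLU(nn.Module):
    """Activation clipping: cap activations at a bound estimated from the
    self-purified data (e.g., an upper percentile of clean activations).
    This module would replace the backbone's ReLU layers."""
    def __init__(self, bound: float):
        super().__init__()
        self.bound = float(bound)

    def forward(self, x):
        return torch.relu(x).clamp(max=self.bound)


def finetune_last_layer(model, clean_subset, epochs=5, lr=1e-3, device="cpu"):
    """Freeze the backbone and fine-tune only the last classification layer."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():  # `fc` is a hypothetical attribute name
        p.requires_grad = True
    loader = DataLoader(clean_subset, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.fc.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
    return model
```

Under these assumptions, a defender would run `self_purify` on the poisoned training set, estimate clipping bounds on the resulting subset, swap the backbone's activations for `ClippedReLU`, and finally call `finetune_last_layer` on the purified data.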
Related papers
- Revisiting the Auxiliary Data in Backdoor Purification [35.689214077873764]
Backdoor attacks occur when an attacker subtly manipulates machine learning models during the training phase.
To mitigate such emerging threats, a prevalent strategy is to cleanse the victim models by various backdoor purification techniques.
This study assesses state-of-the-art backdoor purification techniques across different types of real-world auxiliary datasets.
arXiv Detail & Related papers (2025-02-11T03:46:35Z)
- Fine-tuning is Not Fine: Mitigating Backdoor Attacks in GNNs with Limited Clean Data [51.745219224707384]
Graph Neural Networks (GNNs) have achieved remarkable performance through their message-passing mechanism.
Recent studies have highlighted the vulnerability of GNNs to backdoor attacks.
In this paper, we propose a practical backdoor mitigation framework, denoted as GRAPHNAD.
arXiv Detail & Related papers (2025-01-10T10:16:35Z)
- Defending Against Neural Network Model Inversion Attacks via Data Poisoning [15.099559883494475]
Model inversion attacks pose a significant privacy threat to machine learning models.
This paper introduces a novel defense mechanism to better balance privacy and utility.
We propose a strategy that leverages data poisoning to contaminate the training data of inversion models.
arXiv Detail & Related papers (2024-12-10T15:08:56Z)
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$^2$AO).
Our method achieves state-of-the-art attack performance while preserving clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
- Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores the potential trojan attacks on model adaptation launched by well-designed poisoning target data.
We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z)
- Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective [65.70799289211868]
We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation.
We show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
arXiv Detail & Related papers (2023-11-28T09:53:05Z)
- Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features [62.82817831278743]
Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks.
We propose a novel backdoor defense method by inserting a learnable neural polarizer into the backdoored model as an intermediate layer (an illustrative sketch appears after this list).
arXiv Detail & Related papers (2023-06-29T05:39:58Z)
- Backdoor Attacks Against Dataset Distillation [24.39067295054253]
This study performs the first backdoor attack against models trained on data produced by dataset distillation in the image domain.
We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING.
Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases.
arXiv Detail & Related papers (2023-01-03T16:58:34Z)
- One-shot Neural Backdoor Erasing via Adversarial Weight Masking [8.345632941376673]
Adversarial Weight Masking (AWM) is a novel method capable of erasing the neural backdoors even in the one-shot setting.
AWM can largely improve the purifying effects over other state-of-the-art methods on various available training dataset sizes.
arXiv Detail & Related papers (2022-07-10T16:18:39Z)
- Backdoor Defense with Machine Unlearning [32.968653927933296]
We propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning.
BAERASE lowers the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on average across four benchmark datasets.
arXiv Detail & Related papers (2022-01-24T09:09:12Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
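For the neural-polarizer defense referenced above (Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features), the sketch below illustrates the idea of inserting a small learnable layer into a frozen backdoored model. The 1x1-convolution form and the insertion point are assumptions for illustration, not the authors' implementation.

```python
# A minimal, illustrative sketch (not the authors' implementation) of the
# neural-polarizer idea: a small learnable layer inserted into a frozen
# backdoored model and trained to purify intermediate features.
import torch.nn as nn


class NeuralPolarizer(nn.Module):
    """A lightweight learnable transform applied to intermediate features
    (assumed here to be a 1x1 convolution initialized to the identity)."""
    def __init__(self, channels: int):
        super().__init__()
        self.transform = nn.Conv2d(channels, channels, kernel_size=1, bias=True)
        nn.init.dirac_(self.transform.weight)  # start as an identity mapping
        nn.init.zeros_(self.transform.bias)

    def forward(self, x):
        return self.transform(x)


class PolarizedModel(nn.Module):
    """Wrap a frozen backdoored model, inserting the polarizer between a
    feature extractor and the remaining layers (hypothetical split)."""
    def __init__(self, features: nn.Module, head: nn.Module, channels: int):
        super().__init__()
        self.features, self.head = features, head
        for p in self.parameters():
            p.requires_grad = False                 # freeze the victim model
        self.polarizer = NeuralPolarizer(channels)  # only trainable component

    def forward(self, x):
        return self.head(self.polarizer(self.features(x)))
```

Only the polarizer's parameters would then be optimized, e.g. on whatever small clean or purified dataset is available, with the goal of suppressing trigger-related features while preserving benign accuracy.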
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.