Related papers: PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

URL: http://arxiv.org/abs/2409.12072v1
Date: Wed, 18 Sep 2024 15:47:23 GMT
Title: PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning
Authors: Yukai Xu, Yujie Gu, Kouichi Sakurai,
Abstract summary: Backdoor attacks pose a significant threat to deep neural networks. We propose a novel mechanism, PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model. Our mechanism demonstrates superior effectiveness across multiple backdoor attack methods and datasets.
Score: 4.337364406035291
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advancements have led to increasingly subtle implantation, making the defense more challenging. Existing defense mechanisms typically rely on an additional clean dataset as a standard reference and involve retraining an auxiliary model or fine-tuning the entire victim model. However, these approaches are often computationally expensive and not always feasible in practical applications. In this paper, we propose a novel and lightweight defense mechanism, termed PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model. To achieve this, our approach first introduces a simple data purification process to identify and select the most-likely clean data from the poisoned training dataset. The self-purified clean dataset is then used for activation clipping and fine-tuning only the last classification layer of the victim model. By integrating data purification, activation clipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates superior effectiveness across multiple backdoor attack methods and datasets, as confirmed through extensive experimental evaluation.

Related papers

Variance-Based Defense Against Blended Backdoor Attacks [0.0]
Backdoor attacks represent a subtle yet effective class of cyberattacks targeting AI models.<n>We propose a novel defense method that trains a model on the given dataset, detects poisoned classes, and extracts the critical part of the attack trigger.
arXiv Detail & Related papers (2025-06-02T09:01:35Z)
Revisiting the Auxiliary Data in Backdoor Purification [35.689214077873764]
Backdoor attacks occur when an attacker subtly manipulates machine learning models during the training phase. To mitigate such emerging threats, a prevalent strategy is to cleanse the victim models by various backdoor purification techniques. This study assesses the SOTA backdoor purification techniques across different types of real-world auxiliary datasets.
arXiv Detail & Related papers (2025-02-11T03:46:35Z)
Fine-tuning is Not Fine: Mitigating Backdoor Attacks in GNNs with Limited Clean Data [51.745219224707384]
Graph Neural Networks (GNNs) have achieved remarkable performance through their message-passing mechanism. Recent studies have highlighted the vulnerability of GNNs to backdoor attacks. In this paper, we propose a practical backdoor mitigation framework, denoted as GRAPHNAD.
arXiv Detail & Related papers (2025-01-10T10:16:35Z)
Defending Against Neural Network Model Inversion Attacks via Data Poisoning [15.099559883494475]
Model inversion attacks pose a significant privacy threat to machine learning models. This paper introduces a novel defense mechanism to better balance privacy and utility. We propose a strategy that leverages data poisoning to contaminate the training data of inversion models.
arXiv Detail & Related papers (2024-12-10T15:08:56Z)
Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets. We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$2$AO) Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks. It is quite beneficial and challenging to detect poisoned samples from a mixed dataset. We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
Mellivora Capensis: A Backdoor-Free Training Framework on the Poisoned Dataset without Auxiliary Data [29.842087372804905]
This paper addresses the challenges of backdoor attack countermeasures in real-world scenarios. We propose a robust and clean-data-free backdoor defense framework, namely Mellivora Capensis (textttMeCa), which enables the model trainer to train a clean model on the poisoned dataset.
arXiv Detail & Related papers (2024-05-21T12:20:19Z)
Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores the potential trojan attacks on model adaptation launched by well-designed poisoning target data. We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z)
Progressive Poisoned Data Isolation for Training-time Backdoor Defense [23.955347169187917]
Deep Neural Networks (DNN) are susceptible to backdoor attacks where malicious attackers manipulate the model's predictions via data poisoning. In this study, we present a novel and efficacious defense method, termed Progressive Isolation of Poisoned Data (PIPD) Our PIPD achieves an average True Positive Rate (TPR) of 99.95% and an average False Positive Rate (FPR) of 0.06% for diverse attacks over CIFAR-10 dataset.
arXiv Detail & Related papers (2023-12-20T02:40:28Z)
Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective [65.70799289211868]
We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation. We show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
arXiv Detail & Related papers (2023-11-28T09:53:05Z)
Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks. We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z)
Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features [62.82817831278743]
Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks. We propose a novel backdoor defense method by inserting a learnable neural polarizer into the backdoored model as an intermediate layer.
arXiv Detail & Related papers (2023-06-29T05:39:58Z)
Backdoor Attacks Against Dataset Distillation [24.39067295054253]
This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases.
arXiv Detail & Related papers (2023-01-03T16:58:34Z)
One-shot Neural Backdoor Erasing via Adversarial Weight Masking [8.345632941376673]
Adversarial Weight Masking (AWM) is a novel method capable of erasing the neural backdoors even in the one-shot setting. AWM can largely improve the purifying effects over other state-of-the-art methods on various available training dataset sizes.
arXiv Detail & Related papers (2022-07-10T16:18:39Z)
Backdoor Defense with Machine Unlearning [32.968653927933296]
We propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning. BAERASE can averagely lower the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on four benchmark datasets.
arXiv Detail & Related papers (2022-01-24T09:09:12Z)
How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality. We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers. Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.