Adversarial Feature Map Pruning for Backdoor
- URL: http://arxiv.org/abs/2307.11565v2
- Date: Fri, 23 Feb 2024 12:42:24 GMT
- Title: Adversarial Feature Map Pruning for Backdoor
- Authors: Dong Huang, Qingwen Bu
- Abstract summary: We propose Adversarial Feature Map Pruning for Backdoor (FMP) to mitigate backdoor attacks.
FMP attempts to prune backdoor feature maps, which are trained to extract backdoor information from inputs.
Our experiments demonstrate that, compared to existing defense strategies, FMP can effectively reduce the Attack Success Rate (ASR) even against the most complex and invisible attack triggers.
- Score: 4.550555443103878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been widely used in many critical applications,
such as autonomous vehicles and medical diagnosis. However, their security is
threatened by backdoor attacks, which are achieved by adding artificial
patterns to specific training data. Existing defense strategies primarily focus
on using reverse engineering to reproduce the backdoor trigger generated by
attackers and subsequently repair the DNN model by adding the trigger into
inputs and fine-tuning the model with ground-truth labels. However, when the
trigger generated by the attackers is complex and invisible, the defender
cannot reproduce it successfully, and the DNN model is therefore not repaired,
as the trigger is not effectively removed.
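The passage above describes the conventional repair recipe: reconstruct the trigger, stamp it onto clean inputs, and fine-tune with the true labels. A minimal, hypothetical PyTorch sketch of that recipe follows; `mask`, `pattern`, and `clean_loader` are assumed to come from a trigger-inversion defense (e.g., Neural Cleanse) and are not defined by this paper.
```python
# Hypothetical sketch of reverse-engineering-based repair: stamp the inverted
# trigger onto clean images and fine-tune with ground-truth labels so the
# trigger loses its effect. `mask` and `pattern` are tensors broadcastable to
# the image shape, assumed to be produced by a trigger-inversion defense.
import torch
import torch.nn.functional as F

def repair_with_inverted_trigger(model, mask, pattern, clean_loader,
                                 device, epochs=5, lr=1e-3):
    """Fine-tune `model` on clean images stamped with the inverted trigger,
    using their true labels, so the trigger no longer flips predictions."""
    mask, pattern = mask.to(device), pattern.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            # Blend the reconstructed trigger into the clean images.
            x_triggered = (1 - mask) * x + mask * pattern
            # Mix triggered and clean samples so benign accuracy is preserved.
            inputs = torch.cat([x, x_triggered])
            labels = torch.cat([y, y])  # ground-truth labels for both halves
            opt.zero_grad()
            F.cross_entropy(model(inputs), labels).backward()
            opt.step()
    return model
```
As the abstract argues, this recipe only works when the inverted trigger is close enough to the real one.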
In this work, we propose Adversarial Feature Map Pruning for Backdoor (FMP)
to mitigate backdoor attacks in DNNs. Unlike existing defense strategies, which
focus on reproducing backdoor triggers, FMP attempts to prune backdoor feature
maps, which are trained to extract backdoor information from inputs. After
pruning these backdoor feature maps, FMP will fine-tune the model with a secure
subset of training data. Our experiments demonstrate that, compared to existing
defense strategies, FMP can effectively reduce the Attack Success Rate (ASR)
even against the most complex and invisible attack triggers (e.g., FMP
decreases the ASR to 2.86% on CIFAR10, which is 19.2% to 65.41% lower than
baselines). In addition, unlike conventional defense methods that tend to exhibit
low robust accuracy (RA, i.e., the accuracy of the model on poisoned data), FMP
achieves a higher RA, indicating its superiority in maintaining model
performance while mitigating the effects of backdoor attacks (e.g., FMP obtains
87.40% RA on CIFAR10). Our code is publicly available at:
https://github.com/retsuh-bqw/FMP.
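The abstract outlines FMP in three steps: score each feature map by its sensitivity to adversarial perturbation, prune the most suspicious ones, and fine-tune on a small secure subset. The sketch below is a minimal, hypothetical PyTorch rendering of that pipeline, not the authors' implementation; the scoring heuristic, prune ratio, and function names (`channel_flip_rate`, `prune_and_finetune`) are illustrative assumptions, and the linked repository is the reference for the actual method.
```python
# Hypothetical sketch of adversarial feature-map pruning (NOT the official FMP
# code; see https://github.com/retsuh-bqw/FMP). A channel is scored by how
# often an FGSM-style perturbation of its activations flips the prediction on
# clean data; the highest-scoring channels are zeroed and the model fine-tuned.
import torch
import torch.nn.functional as F

def channel_flip_rate(model, layer, secure_loader, device, eps=0.5):
    """Score each output channel of `layer` by how often an adversarial
    perturbation of that channel's feature map changes the model's prediction
    on clean data; more sensitive channels are treated as backdoor-related."""
    model.eval()
    flips = torch.zeros(layer.out_channels, device=device)
    n = 0
    for x, _ in secure_loader:
        x = x.to(device)
        with torch.no_grad():
            clean_pred = model(x).argmax(1)
        for c in range(layer.out_channels):
            delta = None  # additive perturbation on channel c's feature map

            def hook(mod, inp, out, c=c):
                nonlocal delta
                if delta is None:
                    delta = torch.zeros_like(out[:, c], requires_grad=True)
                out = out.clone()
                out[:, c] = out[:, c] + delta
                return out

            h = layer.register_forward_hook(hook)
            # One FGSM step on the channel perturbation, against the model's
            # own clean predictions.
            loss = F.cross_entropy(model(x), clean_pred)
            grad = torch.autograd.grad(loss, delta)[0]
            with torch.no_grad():
                delta += eps * grad.sign()
                adv_pred = model(x).argmax(1)
            h.remove()
            flips[c] += (adv_pred != clean_pred).float().sum()
        n += x.size(0)
    return flips / max(n, 1)

def prune_and_finetune(model, layer, scores, secure_loader, device,
                       prune_ratio=0.05, epochs=5, lr=1e-3):
    """Zero out the most suspicious channels, then fine-tune on the secure subset."""
    k = max(1, int(prune_ratio * layer.out_channels))
    with torch.no_grad():
        idx = scores.topk(k).indices
        layer.weight[idx] = 0.0  # remove the suspected backdoor filters
        if layer.bias is not None:
            layer.bias[idx] = 0.0
    # (In practice a persistent mask would keep pruned channels at zero
    # during fine-tuning.)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in secure_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model
```
In a full defense, this scoring and pruning would typically be applied per convolutional layer before the final fine-tuning pass.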
Related papers
- A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models [5.957580737396457]
PureDiffusion is a dual-purpose framework that simultaneously serves two contrasting roles: backdoor defense and backdoor attack amplification.
For defense, we introduce two novel loss functions to invert backdoor triggers embedded in diffusion models.
For attack amplification, we describe how our trigger inversion algorithm can be used to reinforce the original trigger embedded in the backdoored diffusion model.
arXiv Detail & Related papers (2025-02-26T11:01:43Z) - An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers [22.77836113915616]
We propose a novel attention-based mask generation methodology that searches for the optimal trigger shape and location.
We also introduce a Quality-of-Experience term into the loss function and carefully adjust the transparency value of the trigger.
Our proposed backdoor attack framework also showcases robustness against state-of-the-art backdoor defenses.
arXiv Detail & Related papers (2024-12-09T02:03:27Z) - "No Matter What You Do": Purifying GNN Models via Backdoor Unlearning [33.07926413485209]
Backdoor attacks in GNNs arise because the attacker modifies a portion of the graph data by embedding triggers.
We present GCleaner, the first backdoor mitigation method on GNNs.
GCleaner can reduce the backdoor attack success rate to 10% with only 1% of clean data, and has almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-02T06:30:49Z) - Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models [5.957580737396457]
Diffusion models (DMs) are advanced deep learning models that achieved state-of-the-art capability on a wide range of generative tasks.
Recent studies have shown their vulnerability to backdoor attacks, in which backdoored DMs consistently generate a designated result called the backdoor target.
We introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs.
arXiv Detail & Related papers (2024-09-20T23:19:26Z) - T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z) - TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning [21.600003684064706]
This paper designs a backdoor attack method based on federated learning.
To conceal the backdoor trigger, a TrojanGan steganography model with an encoder-decoder structure is designed.
A dual model replacement backdoor attack algorithm based on federated learning is designed.
arXiv Detail & Related papers (2024-04-22T07:44:02Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Model-Contrastive Learning for Backdoor Defense [13.781375023320981]
We propose a novel backdoor defense method named MCL based on model-contrastive learning.
MCL is more effective for reducing backdoor threats while maintaining higher accuracy of benign data.
arXiv Detail & Related papers (2022-05-09T16:36:46Z) - Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.