Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective
- URL: http://arxiv.org/abs/2405.17746v1
- Date: Tue, 28 May 2024 01:59:06 GMT
- Title: Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective
- Authors: Nan Li, Haiyang Yu, Ping Yi,
- Abstract summary: We propose an optimized Neuron Pruning (ONP) method combined with Graph Neural Network (GNN) and Reinforcement Learning (RL) to repair backdoor models.
With a small amount of clean data, ONP can effectively prune the backdoor neurons implanted by a set of backdoor attacks at the cost of negligible performance degradation.
- Score: 19.564985801521814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks, posing concerning threats to their reliable deployment. Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons, while how to effectively identify and remove these backdoor-associated neurons remains an open challenge. Most of the existing defense methods rely on defined rules and focus on neuron's local properties, ignoring the exploration and optimization of pruning policies. To address this gap, we propose an Optimized Neuron Pruning (ONP) method combined with Graph Neural Network (GNN) and Reinforcement Learning (RL) to repair backdoor models. Specifically, ONP first models the target DNN as graphs based on neuron connectivity, and then uses GNN-based RL agents to learn graph embeddings and find a suitable pruning policy. To the best of our knowledge, this is the first attempt to employ GNN and RL for optimizing pruning policies in the field of backdoor defense. Experiments show, with a small amount of clean data, ONP can effectively prune the backdoor neurons implanted by a set of backdoor attacks at the cost of negligible performance degradation, achieving a new state-of-the-art performance for backdoor mitigation.
Related papers
- "No Matter What You Do": Purifying GNN Models via Backdoor Unlearning [33.07926413485209]
backdoor attacks in GNNs lie in the fact that the attacker modifies a portion of graph data by embedding triggers.
We present GCleaner, the first backdoor mitigation method on GNNs.
GCleaner can reduce the backdoor attack success rate to 10% with only 1% of clean data, and has almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-02T06:30:49Z) - Magnitude-based Neuron Pruning for Backdoor Defens [3.056446348447152]
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks.
Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons.
We propose a Magnitude-based Neuron Pruning (MNP) method to detect and prune backdoor neurons.
arXiv Detail & Related papers (2024-05-28T02:05:39Z) - Reconstructive Neuron Pruning for Backdoor Defense [96.21882565556072]
We propose a novel defense called emphReconstructive Neuron Pruning (RNP) to expose and prune backdoor neurons.
In RNP, unlearning is operated at the neuron level while recovering is operated at the filter level, forming an asymmetric reconstructive learning procedure.
We show that such an asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks.
arXiv Detail & Related papers (2023-05-24T08:29:30Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z) - Adversarial Neuron Pruning Purifies Backdoored Deep Models [24.002034537777526]
Adrial Neuron Pruning (ANP) effectively removes the injected backdoor without causing obvious performance degradation.
We propose a novel model repairing method, termed Adrial Neuron Pruning (ANP), which prunes some sensitive neurons to purify the injected backdoor.
arXiv Detail & Related papers (2021-10-27T13:41:53Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Noise-Response Analysis of Deep Neural Networks Quantifies Robustness
and Fingerprints Structural Malware [48.7072217216104]
Deep neural networks (DNNs) have structural malware' (i.e., compromised weights and activation pathways)
It is generally difficult to detect backdoors, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data)
Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, fingerprints' its nonlinearity, and allows us to detect backdoors (if present)
Our empirical results demonstrate that we can accurately detect backdoors with high confidence orders-of-magnitude faster than existing approaches (seconds versus
arXiv Detail & Related papers (2020-07-31T23:52:58Z) - Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called textitbackdoor attack, which injects a backdoor trigger to a small portion of training data.
Experiments show that our method could effectively decrease the attack success rate, and also hold a high classification accuracy for clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.