Reconstructive Neuron Pruning for Backdoor Defense
- URL: http://arxiv.org/abs/2305.14876v2
- Date: Fri, 8 Dec 2023 06:17:41 GMT
- Title: Reconstructive Neuron Pruning for Backdoor Defense
- Authors: Yige Li, Xixiang Lyu, Xingjun Ma, Nodens Koren, Lingjuan Lyu, Bo Li,
Yu-Gang Jiang
- Abstract summary: We propose a novel defense called Reconstructive Neuron Pruning (RNP) to expose and prune backdoor neurons.
In RNP, unlearning is operated at the neuron level while recovering is operated at the filter level, forming an asymmetric reconstructive learning procedure.
We show that such an asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks.
- Score: 96.21882565556072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have been found to be vulnerable to backdoor
attacks, raising security concerns about their deployment in mission-critical
applications. While existing defense methods have demonstrated promising
results, it is still not clear how to effectively remove backdoor-associated
neurons in backdoored DNNs. In this paper, we propose a novel defense called
\emph{Reconstructive Neuron Pruning} (RNP) to expose and prune backdoor neurons
via an unlearning and then recovering process. Specifically, RNP first unlearns
the neurons by maximizing the model's error on a small subset of clean samples
and then recovers the neurons by minimizing the model's error on the same data.
In RNP, unlearning is operated at the neuron level while recovering is operated
at the filter level, forming an asymmetric reconstructive learning procedure.
We show that such an asymmetric process on only a few clean samples can
effectively expose and prune the backdoor neurons implanted by a wide range of
attacks, achieving a new state-of-the-art defense performance. Moreover, the
unlearned model at the intermediate step of our RNP can be directly used to
improve other backdoor defense tasks including backdoor removal, trigger
recovery, backdoor label detection, and backdoor sample detection. Code is
available at \url{https://github.com/bboylyg/RNP}.
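The two-stage procedure described in the abstract can be made concrete with a short sketch. The following is a minimal PyTorch-style illustration under simplifying assumptions: "unlearning" is implemented here as plain gradient ascent on all weights of a copy of the model, and "recovering" as learning one mask value per convolutional output channel with the unlearned weights frozen. Function names (`unlearn`, `recover_masks`, `prune_by_mask`), hyperparameters, and the pruning threshold are illustrative and are not taken from the paper or the linked repository.

```python
# Minimal sketch of the unlearn-then-recover idea from the abstract.
# Names and hyperparameters are illustrative only; see
# https://github.com/bboylyg/RNP for the authors' reference implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def unlearn(model, clean_loader, lr=0.01, steps=20, device="cpu"):
    """Stage 1: maximize the model's error on a few clean samples
    (gradient *ascent* on the cross-entropy loss)."""
    model = copy.deepcopy(model).to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    it = iter(clean_loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:
            it = iter(clean_loader)
            x, y = next(it)
        x, y = x.to(device), y.to(device)
        loss = -F.cross_entropy(model(x), y)  # negated loss => ascent
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


def recover_masks(unlearned, clean_loader, lr=0.1, steps=100, device="cpu"):
    """Stage 2: freeze the unlearned weights and learn a per-filter mask that
    minimizes the error on the same clean data. Filters that fail to recover
    (small mask values) are treated as backdoor-related."""
    unlearned = unlearned.to(device).eval()
    for p in unlearned.parameters():
        p.requires_grad_(False)

    # One learnable scalar per output channel of every conv layer.
    masks, hooks = {}, []
    for name, mod in unlearned.named_modules():
        if isinstance(mod, nn.Conv2d):
            m = torch.ones(mod.out_channels, device=device, requires_grad=True)
            masks[name] = m
            hooks.append(mod.register_forward_hook(
                lambda _mod, _inp, out, m=m: out * m.view(1, -1, 1, 1)))

    opt = torch.optim.SGD(masks.values(), lr=lr)
    it = iter(clean_loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:
            it = iter(clean_loader)
            x, y = next(it)
        x, y = x.to(device), y.to(device)
        loss = F.cross_entropy(unlearned(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        for m in masks.values():          # keep mask values in [0, 1]
            m.data.clamp_(0.0, 1.0)
    for h in hooks:
        h.remove()
    return {k: v.detach() for k, v in masks.items()}


def prune_by_mask(model, masks, threshold=0.2):
    """Zero out filters of the *original* backdoored model whose recovered
    mask value stays below the threshold."""
    pruned = copy.deepcopy(model)
    with torch.no_grad():
        for name, mod in pruned.named_modules():
            if isinstance(mod, nn.Conv2d) and name in masks:
                keep = (masks[name] > threshold).float().to(mod.weight.device)
                mod.weight.mul_(keep.view(-1, 1, 1, 1))
                if mod.bias is not None:
                    mod.bias.mul_(keep)
    return pruned
```

As a rough usage pattern, one would run `unlearned = unlearn(model, clean_loader)`, then `masks = recover_masks(unlearned, clean_loader)`, then prune the original backdoored model with `prune_by_mask(model, masks)` and check that clean accuracy is preserved while the attack success rate on triggered inputs drops.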
Related papers
- Magnitude-based Neuron Pruning for Backdoor Defense [3.056446348447152]
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks.
Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons.
We propose a Magnitude-based Neuron Pruning (MNP) method to detect and prune backdoor neurons.
arXiv Detail & Related papers (2024-05-28T02:05:39Z) - Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective [19.564985801521814]
We propose an Optimized Neuron Pruning (ONP) method that combines Graph Neural Networks (GNNs) and Reinforcement Learning (RL) to repair backdoored models.
With a small amount of clean data, ONP can effectively prune the backdoor neurons implanted by a set of backdoor attacks at the cost of negligible performance degradation.
arXiv Detail & Related papers (2024-05-28T01:59:06Z) - Mitigating Backdoors within Deep Neural Networks in Data-limited
Configuration [1.1663475941322277]
A backdoored deep neural network behaves normally on clean data but maliciously once a trigger is injected into a sample at test time.
In this paper, we formulate several characteristics of poisoned neurons and use them to define a backdoor suspiciousness score.
This score ranks network neurons according to their activation values, weights, and their relationships with other neurons in the same layer.
arXiv Detail & Related papers (2023-11-13T15:54:27Z) - Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z) - Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the perspective of model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z) - Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning (ShapPruning) to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z) - Adversarial Neuron Pruning Purifies Backdoored Deep Models [24.002034537777526]
We propose a novel model repairing method, termed Adversarial Neuron Pruning (ANP), which prunes some sensitive neurons to purify the injected backdoor.
ANP effectively removes the injected backdoor without causing obvious performance degradation.
arXiv Detail & Related papers (2021-10-27T13:41:53Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Red Alarm for Pre-trained Models: Universal Vulnerability to
Neuron-Level Backdoor Attacks [98.15243373574518]
Pre-trained models (PTMs) have been widely used in various downstream tasks.
In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks.
arXiv Detail & Related papers (2021-01-18T10:18:42Z) - Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)
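Several of the pruning-based defenses listed above (MNP, ONP, ShapPruning, ANP, and RNP itself) share a common primitive: score each neuron or channel with some suspiciousness statistic estimated from a small clean set, then suppress the highest-scoring ones. The sketch below is a generic, illustrative version of that primitive in PyTorch; the particular score used here (filter weight norm divided by mean clean activation) is a placeholder heuristic, not the scoring rule of any specific paper above.

```python
# Generic neuron-scoring + pruning primitive shared by the pruning-based
# defenses listed above. The score (weight norm / clean activation) is a
# placeholder heuristic, not any paper's actual rule.
import copy
import torch
import torch.nn as nn


@torch.no_grad()
def score_channels(model, clean_loader, device="cpu"):
    """Collect per-channel statistics on a small clean set: mean absolute
    activation of each Conv2d output channel and the L1 norm of its filter."""
    model = model.to(device).eval()
    acts, hooks = {}, []

    def make_hook(name):
        def hook(_mod, _inp, out):
            # out: (N, C, H, W) -> per-channel mean absolute activation
            acts.setdefault(name, []).append(out.abs().mean(dim=(0, 2, 3)).cpu())
        return hook

    for name, mod in model.named_modules():
        if isinstance(mod, nn.Conv2d):
            hooks.append(mod.register_forward_hook(make_hook(name)))

    for x, _ in clean_loader:
        model(x.to(device))
    for h in hooks:
        h.remove()

    scores = {}
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Conv2d):
            act = torch.stack(acts[name]).mean(dim=0)             # (C,)
            wnorm = mod.weight.abs().sum(dim=(1, 2, 3)).cpu()     # (C,)
            # Example heuristic: channels with large weights but low clean
            # activation are suspicious (they may be "waiting" for a trigger).
            scores[name] = wnorm / (act + 1e-8)
    return scores


def prune_top_k(model, scores, k=10):
    """Zero out the k most suspicious channels across the whole network."""
    flat = [(s, name, c) for name, v in scores.items()
            for c, s in enumerate(v.tolist())]
    flat.sort(reverse=True)
    to_prune = {(name, c) for _, name, c in flat[:k]}

    pruned = copy.deepcopy(model)
    with torch.no_grad():
        for name, mod in pruned.named_modules():
            if isinstance(mod, nn.Conv2d):
                for c in range(mod.out_channels):
                    if (name, c) in to_prune:
                        mod.weight[c].zero_()
                        if mod.bias is not None:
                            mod.bias[c].zero_()
    return pruned
```

In practice, the number of pruned channels (or the score threshold) is typically chosen by monitoring clean accuracy on the small clean subset available to the defender.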
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.