Adversarial Neuron Pruning Purifies Backdoored Deep Models
- URL: http://arxiv.org/abs/2110.14430v1
- Date: Wed, 27 Oct 2021 13:41:53 GMT
- Title: Adversarial Neuron Pruning Purifies Backdoored Deep Models
- Authors: Dongxian Wu, Yisen Wang
- Abstract summary: Adversarial Neuron Pruning (ANP) effectively removes the injected backdoor without causing obvious performance degradation.
We propose a novel model repairing method, termed Adversarial Neuron Pruning (ANP), which prunes some sensitive neurons to purify the injected backdoor.
- Score: 24.002034537777526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As deep neural networks (DNNs) are growing larger, their requirements for
computational resources become huge, which makes outsourcing training more
popular. Training in a third-party platform, however, may introduce potential
risks that a malicious trainer will return backdoored DNNs, which behave
normally on clean samples but output targeted misclassifications whenever a
trigger appears at the test time. Without any knowledge of the trigger, it is
difficult to distinguish or recover benign DNNs from backdoored ones. In this
paper, we first identify an unexpected sensitivity of backdoored DNNs, that is,
they are much easier to collapse and tend to predict the target label on clean
samples when their neurons are adversarially perturbed. Based on these
observations, we propose a novel model repairing method, termed Adversarial
Neuron Pruning (ANP), which prunes some sensitive neurons to purify the
injected backdoor. Experiments show, even with only an extremely small amount
of clean data (e.g., 1%), ANP effectively removes the injected backdoor without
causing obvious performance degradation.
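For intuition, below is a minimal PyTorch sketch of the mask-learning idea behind ANP: each convolutional output channel gets a learnable mask in [0, 1] plus an adversarial perturbation, the perturbation is stepped to maximize the clean loss, the masks are trained to withstand it, and channels whose masks collapse are pruned. This is an illustrative simplification, not the authors' released implementation; the optimizer settings, eps, and the pruning threshold are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def learn_neuron_masks(model, clean_loader, device, eps=0.4, lr=0.2, epochs=2):
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    masks = [torch.ones(c.out_channels, device=device, requires_grad=True) for c in convs]
    deltas = [torch.zeros(c.out_channels, device=device, requires_grad=True) for c in convs]

    # Scale every conv layer's output channels by (mask + adversarial perturbation).
    def make_hook(i):
        return lambda mod, inp, out: out * (masks[i] + deltas[i]).view(1, -1, 1, 1)
    handles = [c.register_forward_hook(make_hook(i)) for i, c in enumerate(convs)]

    opt = torch.optim.SGD(masks, lr=lr)
    model.eval()
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)

            # Inner step: perturb neurons to maximize the clean loss (one sign step).
            grads = torch.autograd.grad(F.cross_entropy(model(x), y), deltas)
            with torch.no_grad():
                for d, g in zip(deltas, grads):
                    d.copy_((d + eps * g.sign()).clamp(-eps, eps))

            # Outer step: update the masks so the perturbed network still fits clean data.
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
            with torch.no_grad():
                for m in masks:
                    m.clamp_(0.0, 1.0)

    for h in handles:
        h.remove()
    return convs, masks

@torch.no_grad()
def prune_sensitive_neurons(convs, masks, threshold=0.2):
    # Permanently zero out channels whose learned mask collapsed below the threshold.
    for conv, mask in zip(convs, masks):
        dead = mask < threshold
        conv.weight[dead] = 0.0
        if conv.bias is not None:
            conv.bias[dead] = 0.0
```

According to the abstract, only an extremely small clean set (around 1% of the data) suffices for this kind of mask optimization; the sketch simply iterates a couple of epochs over whatever clean_loader provides.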
Related papers
- Magnitude-based Neuron Pruning for Backdoor Defense [3.056446348447152]
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks.
Recent research reveals that backdoors can be erased from infected DNNs by pruning a specific group of neurons.
We propose a Magnitude-based Neuron Pruning (MNP) method to detect and prune backdoor neurons.
arXiv Detail & Related papers (2024-05-28T02:05:39Z)
- Rethinking Pruning for Backdoor Mitigation: An Optimization Perspective [19.564985801521814]
We propose an Optimized Neuron Pruning (ONP) method that combines a Graph Neural Network (GNN) with Reinforcement Learning (RL) to repair backdoored models.
With a small amount of clean data, ONP can effectively prune the backdoor neurons implanted by a set of backdoor attacks at the cost of negligible performance degradation.
arXiv Detail & Related papers (2024-05-28T01:59:06Z)
- Mitigating Backdoors within Deep Neural Networks in Data-limited Configuration [1.1663475941322277]
A backdoored deep neural network shows normal behavior on clean data while behaving maliciously once a trigger is injected into a sample at the test time.
In this paper, we formulate some characteristics of poisoned neurons and combine them into a backdoor suspiciousness score.
This score ranks network neurons according to their activation values, weights, and their relationship with other neurons in the same layer.
arXiv Detail & Related papers (2023-11-13T15:54:27Z)
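The exact scoring rule is not given in the summary above, so the following is only one plausible illustration of ranking a layer's neurons by activations, weights, and within-layer statistics; the hook, the weight-norm combination, and the z-score are assumptions, not the paper's formula.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def suspiciousness_scores(model, layer, clean_batch):
    """Rank `layer`'s output channels; a higher score means more anomalous within its layer."""
    acts = []
    handle = layer.register_forward_hook(
        lambda mod, inp, out: acts.append(out.abs().mean(dim=(0, 2, 3))))
    model(clean_batch)                                    # one forward pass to collect activations
    handle.remove()

    act = acts[0]                                         # mean |activation| per channel
    w_norm = layer.weight.flatten(1).norm(dim=1)          # per-filter weight norm
    combined = act * w_norm
    z = (combined - combined.mean()) / (combined.std() + 1e-8)
    return z                                              # deviation from the rest of the layer
```

Channels with unusually high scores would be the ones inspected or pruned first.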
- Reconstructive Neuron Pruning for Backdoor Defense [96.21882565556072]
We propose a novel defense called Reconstructive Neuron Pruning (RNP) to expose and prune backdoor neurons.
In RNP, unlearning is operated at the neuron level while recovering is operated at the filter level, forming an asymmetric reconstructive learning procedure.
We show that such an asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks.
arXiv Detail & Related papers (2023-05-24T08:29:30Z)
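A condensed sketch of the asymmetric unlearn/recover loop described above is shown below, assuming a BatchNorm-based CNN. It is not the authors' implementation; mapping "neuron level" to BatchNorm affine parameters, "filter level" to conv output channels, and all hyper-parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def batches(loader):
    while True:                      # cycle endlessly over the few clean samples
        for batch in loader:
            yield batch

def rnp_style_expose(model, clean_loader, device, unlearn_steps=20, recover_steps=200, lr=0.01):
    stream = batches(clean_loader)
    model.eval()                     # keep BatchNorm running statistics fixed

    # Stage 1: neuron-level unlearning -- gradient *ascent* on BatchNorm affine parameters.
    bn_params = [p for m in model.modules() if isinstance(m, nn.BatchNorm2d)
                 for p in (m.weight, m.bias)]
    opt_up = torch.optim.SGD(bn_params, lr=lr)
    for _ in range(unlearn_steps):
        x, y = next(stream)
        x, y = x.to(device), y.to(device)
        opt_up.zero_grad()
        (-F.cross_entropy(model(x), y)).backward()   # maximize the clean loss
        opt_up.step()

    # Stage 2: filter-level recovery -- learn a [0, 1] mask over conv output channels.
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    masks = [torch.ones(c.out_channels, device=device, requires_grad=True) for c in convs]
    hooks = [c.register_forward_hook(
                 (lambda i: lambda mod, inp, out: out * masks[i].view(1, -1, 1, 1))(i))
             for i, c in enumerate(convs)]
    opt_down = torch.optim.SGD(masks, lr=lr)
    for _ in range(recover_steps):
        x, y = next(stream)
        x, y = x.to(device), y.to(device)
        opt_down.zero_grad()
        F.cross_entropy(model(x), y).backward()       # minimize the clean loss again
        opt_down.step()
        with torch.no_grad():
            for m in masks:
                m.clamp_(0.0, 1.0)
    for h in hooks:
        h.remove()
    # Filters whose mask stays near zero are flagged as backdoor filters; pruning would
    # then be applied to a copy of the original (pre-unlearning) backdoored model.
    return convs, masks
```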
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
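The finding above can be probed with a small experiment: scale the shortcut (skip) branch of a residual block by a factor gamma < 1 and compare clean accuracy against attack success rate. The sketch below assumes a torchvision-style ResNet BasicBlock; the block choice and gamma are illustrative, not values from the paper.

```python
import torch
import torch.nn as nn

class ScaledSkipBasicBlock(nn.Module):
    """Wraps a torchvision BasicBlock and multiplies its shortcut branch by `gamma`."""
    def __init__(self, block, gamma=0.5):
        super().__init__()
        self.block, self.gamma = block, gamma

    def forward(self, x):
        b = self.block
        identity = b.downsample(x) if b.downsample is not None else x
        out = b.relu(b.bn1(b.conv1(x)))
        out = b.bn2(b.conv2(out))
        return b.relu(out + self.gamma * identity)   # suppressed skip connection

# Usage sketch (hypothetical block choice): swap in the wrapper, then re-evaluate
# clean accuracy and attack success rate on triggered inputs.
# model.layer3[0] = ScaledSkipBasicBlock(model.layer3[0], gamma=0.5)
```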
- Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning (ShapPruning) to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z)
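As a rough illustration of Shapley-based neuron attribution (not the estimator used in the paper), one can Monte Carlo estimate each channel's marginal contribution for a single convolutional layer; the value function, the restriction to one layer, and the sample count below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def shapley_per_channel(model, layer, x, y, num_perms=20):
    """Monte Carlo estimate of each output channel's contribution to the clean loss."""
    n = layer.out_channels
    mask = torch.ones(n, device=x.device)
    handle = layer.register_forward_hook(
        lambda mod, inp, out: out * mask.view(1, -1, 1, 1))

    def value(active):                       # higher value = lower clean loss
        mask.copy_(active.float())
        return -F.cross_entropy(model(x), y).item()

    shap = torch.zeros(n)
    for _ in range(num_perms):
        perm = torch.randperm(n).tolist()
        active = torch.zeros(n, dtype=torch.bool, device=x.device)
        prev = value(active)
        for idx in perm:                     # add channels one at a time
            active[idx] = True
            cur = value(active)
            shap[idx] += cur - prev          # marginal contribution of this channel
            prev = cur
    handle.remove()
    mask.fill_(1.0)
    return shap / num_perms
```

Channels with anomalous estimated contributions would then be pruning candidates; the paper's few-shot estimator is far more sample-efficient than this brute-force sketch.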
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Noise-Response Analysis of Deep Neural Networks Quantifies Robustness and Fingerprints Structural Malware [48.7072217216104]
Deep neural networks (DNNs) can have 'structural malware' (i.e., compromised weights and activation pathways).
It is generally difficult to detect backdoors, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data).
Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, 'fingerprints' its nonlinearity, and allows us to detect backdoors (if present).
Our empirical results demonstrate that we can accurately detect backdoors with high confidence orders-of-magnitude faster than existing approaches (seconds versus ...)
arXiv Detail & Related papers (2020-07-31T23:52:58Z)
- Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)