Backdoor Defense with Machine Unlearning
- URL: http://arxiv.org/abs/2201.09538v1
- Date: Mon, 24 Jan 2022 09:09:12 GMT
- Title: Backdoor Defense with Machine Unlearning
- Authors: Yang Liu, Mingyuan Fan, Cen Chen, Ximeng Liu, Zhuo Ma, Li Wang,
Jianfeng Ma
- Abstract summary: We propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning.
BAERASE lowers the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on average across four benchmark datasets.
- Score: 32.968653927933296
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Backdoor injection attacks are an emerging threat to the security of neural
networks, yet effective defenses against them remain limited. In this paper, we
propose BAERASE, a novel method that erases the backdoor injected into a victim
model through machine unlearning. Specifically, BAERASE implements backdoor
defense in two key steps. First, trigger pattern recovery is conducted to
extract the trigger patterns that infect the victim model. Here, the trigger
pattern recovery problem is equivalent to that of extracting an unknown noise
distribution from the victim model, which can be readily solved by an
entropy-maximization-based generative model. Subsequently, BAERASE leverages
the recovered trigger patterns to reverse the backdoor injection procedure and
induce the victim model to erase the polluted memories through a newly designed
gradient-ascent-based machine unlearning method. Compared with previous machine
unlearning solutions, the proposed approach removes the reliance on full access
to the training data for retraining and is more effective at backdoor erasure
than existing fine-tuning or pruning methods. Moreover, experiments show that
BAERASE lowers the attack success rates of three kinds of state-of-the-art
backdoor attacks by 99% on average across four benchmark datasets.
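The two steps described in the abstract (trigger recovery, then gradient-ascent unlearning on the recovered trigger) can be pictured with the minimal PyTorch sketch below. This is not the authors' implementation: the trigger is recovered here by directly optimizing an additive pattern and mask, rather than by the paper's entropy-maximization-based generative model, and the function names, the clean-data descent term, and the hyperparameters (alpha, step counts, learning rates) are illustrative assumptions.

import torch
import torch.nn.functional as F

def recover_trigger(model, loader, target_label, steps=100, lr=0.1, device="cpu"):
    # Simplified stand-in for step 1: optimize an additive pattern and mask so
    # that stamped inputs are classified as target_label. (The paper instead
    # recovers triggers with an entropy-maximization-based generative model.)
    model.eval()
    x0, _ = next(iter(loader))  # one batch, only used for input shape
    pattern = torch.zeros_like(x0[:1]).to(device).requires_grad_(True)
    mask_logit = torch.zeros_like(x0[:1, :1]).to(device).requires_grad_(True)
    opt = torch.optim.Adam([pattern, mask_logit], lr=lr)
    for _ in range(steps):
        for x, _ in loader:
            x = x.to(device)
            m = torch.sigmoid(mask_logit)          # keep the mask in [0, 1]
            x_trig = (1 - m) * x + m * pattern     # stamp the candidate trigger
            target = torch.full((x.size(0),), target_label,
                                dtype=torch.long, device=device)
            loss = F.cross_entropy(model(x_trig), target) \
                   + 1e-3 * m.abs().sum()          # prefer small, sparse triggers
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(mask_logit).detach(), pattern.detach()

def unlearn_backdoor(model, clean_loader, mask, pattern, target_label,
                     epochs=1, lr=1e-4, alpha=1.0, device="cpu"):
    # Step 2: gradient ascent on triggered samples relabelled with the attack
    # target, plus an ordinary descent term on clean data to preserve accuracy
    # (the clean term and the weight alpha are stability assumptions, not
    # taken from the abstract).
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            x_trig = (1 - mask) * x + mask * pattern
            target = torch.full_like(y, target_label)
            ascend = F.cross_entropy(model(x_trig), target)   # loss to push up
            descend = F.cross_entropy(model(x), y)            # loss to keep down
            loss = -alpha * ascend + descend   # minimizing this ascends on the trigger
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

In this sketch the gradient ascent is realized by negating the triggered-sample loss inside a standard optimizer step, which is a common way to implement ascent without a custom update rule.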
Related papers
- Backdoor Mitigation by Distance-Driven Detoxification [38.27102305144483]
Backdoor attacks undermine the integrity of machine learning models by allowing attackers to manipulate predictions using poisoned training data.
This paper considers a post-training backdoor defense task, aiming to detoxify the backdoors in pre-trained models.
We propose Distance-Driven Detoxification (D3), an innovative approach that reformulates backdoor defense as a constrained optimization problem.
arXiv Detail & Related papers (2024-11-14T16:54:06Z)
- Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models [6.937795040660591]
We introduce Deferred Activated Backdoor Functionality (DABF) as a new paradigm in backdoor attacks.
Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered.
DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment.
arXiv Detail & Related papers (2024-11-10T07:01:53Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Mitigating Backdoor Attacks using Activation-Guided Model Editing [8.00994004466919]
Backdoor attacks compromise the integrity and reliability of machine learning models.
We propose a novel backdoor mitigation approach via machine unlearning to counter such backdoor attacks.
arXiv Detail & Related papers (2024-07-10T13:43:47Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Backdoor Attack against One-Class Sequential Anomaly Detection Models [10.020488631167204]
We explore compromising deep sequential anomaly detection models by proposing a novel backdoor attack strategy.
The attack approach comprises two primary steps, trigger generation and backdoor injection.
Experiments demonstrate the effectiveness of our proposed attack strategy by injecting backdoors on two well-established one-class anomaly detection models.
arXiv Detail & Related papers (2024-02-15T19:19:54Z)
- Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System [4.9233610638625604]
We propose a novel black-box backdoor attack based on machine unlearning.
The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a "benign" model.
Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor.
arXiv Detail & Related papers (2023-09-12T02:42:39Z)
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models.
In this paper, we explore the task of purifying a backdoored model using a small clean dataset.
By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.