Related papers: Backdoor Defense with Machine Unlearning

Backdoor Defense with Machine Unlearning

URL: http://arxiv.org/abs/2201.09538v1
Date: Mon, 24 Jan 2022 09:09:12 GMT
Title: Backdoor Defense with Machine Unlearning
Authors: Yang Liu, Mingyuan Fan, Cen Chen, Ximeng Liu, Zhuo Ma, Li Wang, Jianfeng Ma
Abstract summary: We propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning. BAERASE can averagely lower the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on four benchmark datasets.
Score: 32.968653927933296
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Backdoor injection attack is an emerging threat to the security of neural networks, however, there still exist limited effective defense methods against the attack. In this paper, we propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning. Specifically, BAERASE mainly implements backdoor defense in two key steps. First, trigger pattern recovery is conducted to extract the trigger patterns infected by the victim model. Here, the trigger pattern recovery problem is equivalent to the one of extracting an unknown noise distribution from the victim model, which can be easily resolved by the entropy maximization based generative model. Subsequently, BAERASE leverages these recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed gradient ascent based machine unlearning method. Compared with the previous machine unlearning solutions, the proposed approach gets rid of the reliance on the full access to training data for retraining and shows higher effectiveness on backdoor erasing than existing fine-tuning or pruning methods. Moreover, experiments show that BAERASE can averagely lower the attack success rates of three kinds of state-of-the-art backdoor attacks by 99\% on four benchmark datasets.

Related papers

InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning [36.56302680556252]
We introduce InverTune, the first backdoor defense framework for multimodal models under minimal attacker assumptions.<n>InverTune effectively identifies and removes backdoor artifacts through three key components, achieving robust protection against backdoor attacks.<n> Experimental results show that InverTune reduces the average attack success rate (ASR) by 97.87% against the state-of-the-art (SOTA) attacks.
arXiv Detail & Related papers (2025-06-14T09:08:34Z)
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming [60.554146386198376]
Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat. We propose REFINE, an inversion-free backdoor defense method based on model reprogramming.
arXiv Detail & Related papers (2025-02-22T07:29:12Z)
Behavior Backdoor for Deep Learning Models [95.50787731231063]
We take the first step towards behavioral backdoor'' attack, which is defined as a behavior-triggered backdoor model training procedure. We propose the first pipeline of implementing behavior backdoor, i.e., the Quantification Backdoor (QB) attack. Experiments have been conducted on different models, datasets, and tasks, demonstrating the effectiveness of this novel backdoor attack.
arXiv Detail & Related papers (2024-12-02T10:54:02Z)
Backdoor Mitigation by Distance-Driven Detoxification [38.27102305144483]
Backdoor attacks undermine the integrity of machine learning models by allowing attackers to manipulate predictions using poisoned training data. This paper considers a post-training backdoor defense task, aiming to detoxify the backdoors in pre-trained models. We propose Distance-Driven Detoxification (D3), an innovative approach that reformulates backdoor defense as a constrained optimization problem.
arXiv Detail & Related papers (2024-11-14T16:54:06Z)
Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models [6.937795040660591]
We introduce Deferred Activated Backdoor Functionality (DABF) as a new paradigm in backdoor attacks. Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered. DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment.
arXiv Detail & Related papers (2024-11-10T07:01:53Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
Mitigating Backdoor Attacks using Activation-Guided Model Editing [8.00994004466919]
Backdoor attacks compromise the integrity and reliability of machine learning models. We propose a novel backdoor mitigation approach via machine unlearning to counter such backdoor attacks.
arXiv Detail & Related papers (2024-07-10T13:43:47Z)
SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources. Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker. Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
Backdoor Attack against One-Class Sequential Anomaly Detection Models [10.020488631167204]
We explore compromising deep sequential anomaly detection models by proposing a novel backdoor attack strategy. The attack approach comprises two primary steps, trigger generation and backdoor injection. Experiments demonstrate the effectiveness of our proposed attack strategy by injecting backdoors on two well-established one-class anomaly detection models.
arXiv Detail & Related papers (2024-02-15T19:19:54Z)
Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System [4.9233610638625604]
We propose a novel black-box backdoor attack based on machine unlearning. The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a benign' model. Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor.
arXiv Detail & Related papers (2023-09-12T02:42:39Z)
Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure. We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning to mitigate backdoor attacks on deep neural networks. ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy. Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z)
Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.