Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System
- URL: http://arxiv.org/abs/2310.10659v2
- Date: Wed, 13 Dec 2023 15:00:28 GMT
- Title: Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System
- Authors: Peixin Zhang, Jun Sun, Mingtian Tan, Xinyu Wang
- Abstract summary: We propose a novel black-box backdoor attack based on machine unlearning.
The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a "benign" model.
Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor.
- Score: 4.9233610638625604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the security issues of artificial intelligence have become
increasingly prominent due to the rapid development of deep learning research
and applications. A backdoor attack targets a vulnerability of
deep learning models: hidden backdoors are activated by triggers embedded
by the attacker, causing the model to output malicious predictions that do not align
with the intended output for a given input. In this work, we propose a novel
black-box backdoor attack based on machine unlearning. The attacker first
augments the training set with carefully designed samples, including poison and
mitigation data, to train a "benign" model. Then, the attacker posts unlearning
requests for the mitigation samples to remove the impact of relevant data on
the model, gradually activating the hidden backdoor. Because the backdoor is
implanted during the iterative unlearning process, the attack significantly increases
the computational overhead of existing defense methods for backdoor detection
or mitigation. To address this new security threat, we propose two methods for
detecting or mitigating such malicious unlearning requests. We conduct
experiments in both exact unlearning and approximate unlearning (i.e., SISA)
settings. Experimental results indicate that: 1) our attack approach can
successfully implant a backdoor into the model, and sharding increases the
difficulty of the attack; 2) our detection algorithms are effective in identifying
the mitigation samples, while sharding reduces the effectiveness of our
detection algorithms.
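The interplay between poison and mitigation samples described in the abstract can be illustrated with a minimal sketch. It assumes a toy "model" that is only a majority-vote lookup table, and it assumes exact unlearning means deleting a sample and retraining from scratch. This is not the authors' implementation: the function names (`train`, `exact_unlearn`, `build_dataset`, `run_attack`), the cat/dog data, and the sample counts are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def train(dataset):
    """Toy 'model': majority label for each (feature, has_trigger) key."""
    votes = {}
    for feature, has_trigger, label in dataset.values():
        votes.setdefault((feature, has_trigger), Counter())[label] += 1
    # Highest count wins; ties break alphabetically so the result is deterministic.
    return {key: min(c, key=lambda lab: (-c[lab], lab)) for key, c in votes.items()}


def exact_unlearn(dataset, sample_id):
    """Exact unlearning (assumed here): delete one sample and retrain from scratch."""
    remaining = {sid: s for sid, s in dataset.items() if sid != sample_id}
    return remaining, train(remaining)


def build_dataset(n_clean=10, n_poison=3, n_mitigation=5):
    """Hypothetical training set: clean, poison, and mitigation samples."""
    dataset = {}
    for i in range(n_clean):
        dataset[f"clean-cat-{i}"] = ("cat", False, "cat")
        dataset[f"clean-dog-{i}"] = ("dog", False, "dog")
    for i in range(n_poison):
        # Poison: trigger present, labeled with the attacker's target class.
        dataset[f"poison-{i}"] = ("cat", True, "dog")
    for i in range(n_mitigation):
        # Mitigation: trigger present, correct label; outvotes the poison.
        dataset[f"mitigation-{i}"] = ("cat", True, "cat")
    return dataset


def run_attack():
    dataset = build_dataset()
    model = train(dataset)
    # Prediction for a triggered "cat" input before any unlearning request.
    history = [model[("cat", True)]]
    mitigation_ids = [sid for sid in dataset if sid.startswith("mitigation-")]
    for sid in mitigation_ids:
        dataset, model = exact_unlearn(dataset, sid)
        history.append(model[("cat", True)])
    return history, model[("cat", False)]


history, clean_prediction = run_attack()
print(history)           # ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']
print(clean_prediction)  # cat
```

In this toy setting the triggered input is classified correctly while the mitigation samples outnumber the poison samples, and it flips to the target label once enough of them have been unlearned; the prediction for clean inputs never changes. A real deep learning model would not behave like a lookup table, so the sketch conveys only the intuition, not the attack's actual dynamics.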
Related papers
- Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models [6.937795040660591]
We introduce Deferred Activated Backdoor Functionality (DABF) as a new paradigm in backdoor attacks.
Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered.
DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment.
arXiv Detail & Related papers (2024-11-10T07:01:53Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection [17.136757440204722]
We introduce a highly practical backdoor attack achieved with a set of reverse-engineering techniques over compiled deep learning models.
The injected backdoor can be triggered with a success rate of 93.5%, while incurring less than 2 ms of latency overhead and no more than a 1.4% accuracy decrease.
We found 54 apps that were vulnerable to our attack, including popular and security-critical ones.
arXiv Detail & Related papers (2021-01-18T06:29:30Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
- Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples, a phenomenon which we refer to as backdoor smoothing.
Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.