Related papers: Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System

Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System

URL: http://arxiv.org/abs/2310.10659v2
Date: Wed, 13 Dec 2023 15:00:28 GMT
Title: Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System
Authors: Peixin Zhang, Jun Sun, Mingtian Tan, Xinyu Wang
Abstract summary: We propose a novel black-box backdoor attack based on machine unlearning. The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a benign' model. Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor.
Score: 4.9233610638625604
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, the security issues of artificial intelligence have become increasingly prominent due to the rapid development of deep learning research and applications. Backdoor attack is an attack targeting the vulnerability of deep learning models, where hidden backdoors are activated by triggers embedded by the attacker, thereby outputting malicious predictions that may not align with the intended output for a given input. In this work, we propose a novel black-box backdoor attack based on machine unlearning. The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a `benign' model. Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor. Since backdoors are implanted during the iterative unlearning process, it significantly increases the computational overhead of existing defense methods for backdoor detection or mitigation. To address this new security threat, we proposes two methods for detecting or mitigating such malicious unlearning requests. We conduct the experiment in both exact unlearning and approximate unlearning (i.e., SISA) settings. Experimental results indicate that: 1) our attack approach can successfully implant backdoor into the model, and sharding increases the difficult of attack; 2) our detection algorithms are effective in identifying the mitigation samples, while sharding reduces the effectiveness of our detection algorithms.

Related papers

Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning [2.1896295740048894]
We introduce the first paradigm of revocable backdoor attacks, where the backdoor can be proactively and thoroughly removed after the attack objective is achieved.<n>This work opens a new direction for backdoor attack research and presents new challenges for the security of machine learning systems.
arXiv Detail & Related papers (2025-10-15T09:09:43Z)
When Forgetting Triggers Backdoors: A Clean Unlearning Attack [1.8434042562191815]
We propose a novel em clean backdoor attack that exploits both the model learning phase and the subsequent unlearning requests.<n>This strategy results in a powerful and stealthy novel attack that is hard to detect or mitigate.
arXiv Detail & Related papers (2025-06-14T14:31:51Z)
Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models [6.937795040660591]
We introduce Deferred Activated Backdoor Functionality (DABF) as a new paradigm in backdoor attacks. Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered. DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment.
arXiv Detail & Related papers (2024-11-10T07:01:53Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics. We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure. We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection [17.136757440204722]
We introduce a highly practical backdoor attack achieved with a set of reverse-engineering techniques over compiled deep learning models. The injected backdoor can be triggered with a success rate of 93.5%, while only brought less than 2ms latency overhead and no more than 1.4% accuracy decrease. We found 54 apps that were vulnerable to our attack, including popular and security-critical ones.
arXiv Detail & Related papers (2021-01-18T06:29:30Z)
Backdoor Learning: A Survey [75.59571756777342]
Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs) Backdoor learning is an emerging and rapidly growing research area. This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as textitbackdoor smoothing. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.