Towards Understanding How Self-training Tolerates Data Backdoor
Poisoning
- URL: http://arxiv.org/abs/2301.08751v1
- Date: Fri, 20 Jan 2023 16:36:45 GMT
- Title: Towards Understanding How Self-training Tolerates Data Backdoor
Poisoning
- Authors: Soumyadeep Pal, Ren Wang, Yuguang Yao and Sijia Liu
- Abstract summary: We explore the potential of self-training via additional unlabeled data for mitigating backdoor attacks.
We find that the new self-training regime helps defend against backdoor attacks to a great extent.
- Score: 11.817302291033725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies on backdoor attacks in model training have shown that
polluting a small portion of training data is sufficient to produce incorrect
manipulated predictions on poisoned test-time data while maintaining high clean
accuracy in downstream tasks. The stealthiness of backdoor attacks has imposed
tremendous defense challenges in today's machine learning paradigm. In this
paper, we explore the potential of self-training via additional unlabeled data
for mitigating backdoor attacks. We begin by making a pilot study to show that
vanilla self-training is not effective in backdoor mitigation. Spurred by that,
we propose to defend the backdoor attacks by leveraging strong but proper data
augmentations in the self-training pseudo-labeling stage. We find that the new
self-training regime helps in defending against backdoor attacks to a great
extent. Its effectiveness is demonstrated through experiments for different
backdoor triggers on CIFAR-10 and a combination of CIFAR-10 with an additional
unlabeled 500K TinyImages dataset. Finally, we explore the direction of
combining self-supervised representation learning with self-training for
further improvement in backdoor defense.
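The threat model described in the abstract, where a small fraction of the training set is stamped with a trigger and relabeled to an attacker-chosen class, can be sketched in a few lines. This is a generic BadNets-style patch trigger on CIFAR-10-shaped data, not necessarily one of the triggers evaluated in the paper; the poisoning rate, patch size, and target class below are illustrative assumptions.

```python
import numpy as np

def poison_cifar10(images, labels, poison_frac=0.05, target_class=0,
                   patch_size=3, seed=0):
    """Stamp a white square trigger on a random subset of images and relabel
    them to the attacker's target class (BadNets-style sketch).

    images: uint8 array of shape (N, 32, 32, 3); labels: int array of shape (N,).
    poison_frac, patch_size, and target_class are illustrative choices,
    not the settings used in the paper.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_frac * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Place the trigger patch in the bottom-right corner of each chosen image.
    images[idx, -patch_size:, -patch_size:, :] = 255
    labels[idx] = target_class  # flip labels to the attacker's target class
    return images, labels, idx

# Example with random data standing in for CIFAR-10.
x = np.random.randint(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)
y = np.random.randint(0, 10, size=1000)
x_poisoned, y_poisoned, poison_idx = poison_cifar10(x, y)
print(f"poisoned {len(poison_idx)} of {len(x)} samples")
```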
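The defense the abstract describes, self-training on extra unlabeled data with strong augmentations in the pseudo-labeling stage, might look roughly like the FixMatch-style sketch below: pseudo-labels come from the model's confident predictions on the unlabeled images, and the model is then trained on strongly augmented views of those images. The augmentation pipeline, confidence threshold, and loss weighting are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Strong augmentation for the pseudo-labeling stage (illustrative choice;
# the paper studies which augmentations are "strong but proper").
# Assumes float image tensors in [0, 1]; for simplicity, the same random
# augmentation is applied to the whole batch.
strong_augment = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
])

def self_training_step(model, optimizer, labeled_batch, unlabeled_images,
                       conf_threshold=0.95, unlabeled_weight=1.0):
    """One optimization step of self-training with strong augmentation.

    labeled_batch: (images, labels) from the possibly poisoned labeled set.
    unlabeled_images: a batch of extra unlabeled images.
    conf_threshold and unlabeled_weight are illustrative hyperparameters.
    """
    model.train()
    images, labels = labeled_batch

    # Supervised loss on the labeled (and possibly poisoned) data.
    sup_loss = F.cross_entropy(model(images), labels)

    # Pseudo-label the unlabeled data; no gradients are needed here.
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_images), dim=1)
        conf, pseudo_labels = probs.max(dim=1)
        mask = conf >= conf_threshold  # keep only confident pseudo-labels

    # Train on strongly augmented views of the confidently pseudo-labeled
    # images; the strong augmentation is what disrupts the trigger shortcut.
    unsup_loss = torch.tensor(0.0, device=images.device)
    if mask.any():
        aug_images = strong_augment(unlabeled_images[mask])
        unsup_loss = F.cross_entropy(model(aug_images), pseudo_labels[mask])

    loss = sup_loss + unlabeled_weight * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper's setting, the labeled set would be the (poisoned) CIFAR-10 training data and the unlabeled images would come from the additional 500K TinyImages dataset mentioned in the abstract.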
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning
System [4.9233610638625604]
We propose a novel black-box backdoor attack based on machine unlearning.
The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a 'benign' model.
Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor.
arXiv Detail & Related papers (2023-09-12T02:42:39Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Narcissus: A Practical Clean-Label Backdoor Attack with Limited
Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Backdoor Learning: A Survey [75.59571756777342]
A backdoor attack intends to embed a hidden backdoor into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)