Model-Contrastive Learning for Backdoor Defense
- URL: http://arxiv.org/abs/2205.04411v1
- Date: Mon, 9 May 2022 16:36:46 GMT
- Title: Model-Contrastive Learning for Backdoor Defense
- Authors: Zhihao Yue, Jun Xia, Zhiwei Ling, Ting Wang, Xian Wei, Mingsong Chen
- Abstract summary: We propose a novel backdoor defense method named MCL based on model-contrastive learning.
MCL is more effective at reducing backdoor threats while maintaining high accuracy on benign data.
- Score: 13.781375023320981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Along with the popularity of Artificial Intelligence (AI) techniques, an
increasing number of backdoor injection attacks are designed to maliciously
threaten Deep Neural Networks (DNNs) deployed in safety-critical systems.
Although there exist various defense methods that can effectively erase
backdoor triggers from DNNs, they still greatly suffer from a non-negligible
Attack Success Rate (ASR) as well as a major loss in benign accuracy. Inspired
by the observation that a backdoored DNN will form new clusters in its feature
space for poisoned data, in this paper we propose a novel backdoor defense
method named MCL based on model-contrastive learning. Specifically,
implementing backdoor defense with model-contrastive learning consists of two
steps. First, we use a backdoor trigger synthesis technique to invert the
trigger. Next, the inverted trigger is used to construct poisoned data, so that
model-contrastive learning can be applied, which pulls the feature
representations of poisoned data close to those of benign data while pushing
them away from the original poisoned feature representations. Through extensive
experiments against five state-of-the-art attack methods on multiple benchmark
datasets, using only 5% of clean data, MCL is more effective at reducing
backdoor threats while maintaining high accuracy on benign data, degrading
benign accuracy by less than 1%.
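To make the two steps concrete, below is a minimal PyTorch sketch of (1) trigger inversion and (2) a model-contrastive fine-tuning loss. It is not the authors' reference implementation: the image shape, the (mask, pattern) trigger representation, the features() accessor, and the InfoNCE-style formulation are illustrative assumptions.

```python
# Hypothetical sketch of the two-step recipe described in the abstract.
# NOT the paper's reference code; shapes, accessors, and the loss form are assumptions.
import torch
import torch.nn.functional as F


def invert_trigger(backdoored_model, clean_loader, target_label,
                   steps=500, lam=1e-3, image_shape=(3, 32, 32)):
    """Step 1 (sketch): Neural-Cleanse-style trigger inversion for one target label."""
    c, h, w = image_shape
    mask = torch.zeros(1, 1, h, w, requires_grad=True)      # where the trigger sits
    pattern = torch.zeros(1, c, h, w, requires_grad=True)   # what the trigger looks like
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    data = iter(clean_loader)
    for _ in range(steps):
        try:
            images, _ = next(data)
        except StopIteration:
            data = iter(clean_loader)
            images, _ = next(data)
        m, p = torch.sigmoid(mask), torch.sigmoid(pattern)
        stamped = (1 - m) * images + m * p
        target = torch.full((images.size(0),), target_label, dtype=torch.long)
        # Force the target label while keeping the trigger small (L1 penalty on the mask).
        loss = F.cross_entropy(backdoored_model(stamped), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()


def model_contrastive_loss(student, frozen_backdoored, images, trigger, temperature=0.5):
    """Step 2 (sketch): pull poisoned features toward benign ones, push them
    away from the frozen backdoored model's poisoned features."""
    mask, pattern = trigger
    poisoned = (1 - mask) * images + mask * pattern          # stamp the inverted trigger

    z_anchor = student.features(poisoned)                    # poisoned input, model being purified
    z_pos = student.features(images)                         # benign input, model being purified
    with torch.no_grad():
        z_neg = frozen_backdoored.features(poisoned)         # poisoned input, original backdoored model

    z_anchor = F.normalize(z_anchor.flatten(1), dim=1)
    z_pos = F.normalize(z_pos.flatten(1), dim=1)
    z_neg = F.normalize(z_neg.flatten(1), dim=1)

    # InfoNCE-style objective: the (anchor, positive) pair must score higher
    # than the (anchor, negative) pair for every sample.
    pos = (z_anchor * z_pos).sum(dim=1) / temperature
    neg = (z_anchor * z_neg).sum(dim=1) / temperature
    logits = torch.stack([pos, neg], dim=1)
    labels = torch.zeros(images.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```

In practice, this contrastive term would typically be combined with a standard cross-entropy loss on the small clean subset during fine-tuning.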
Related papers
- "No Matter What You Do": Purifying GNN Models via Backdoor Unlearning [33.07926413485209]
Backdoor attacks in GNNs rely on the attacker modifying a portion of the graph data by embedding triggers.
We present GCleaner, the first backdoor mitigation method on GNNs.
GCleaner can reduce the backdoor attack success rate to 10% with only 1% of clean data, and has almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-02T06:30:49Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening [43.09750187130803]
Deep neural networks (DNNs) have demonstrated effectiveness in various fields.
DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attacker-chosen target label.
In this paper, we introduce a novel post-training defense technique that can effectively eliminate backdoor effects for a variety of attacks.
arXiv Detail & Related papers (2024-07-16T04:33:05Z)
- Reconstructive Neuron Pruning for Backdoor Defense [96.21882565556072]
We propose a novel defense called Reconstructive Neuron Pruning (RNP) to expose and prune backdoor neurons.
In RNP, unlearning is operated at the neuron level while recovering is operated at the filter level, forming an asymmetric reconstructive learning procedure.
We show that such an asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks.
arXiv Detail & Related papers (2023-05-24T08:29:30Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Handcrafted Backdoors in Deep Neural Networks [33.21980707457639]
We introduce a handcrafted attack that directly manipulates the parameters of a pre-trained model to inject backdoors.
Our backdoors remain effective across four datasets and four network architectures with a success rate above 96%.
Our results suggest that further research is needed for understanding the complete space of supply-chain backdoor attacks.
arXiv Detail & Related papers (2021-06-08T20:58:23Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
Experiments show that our method could effectively decrease the attack success rate, and also hold a high classification accuracy for clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)