NCL: Textual Backdoor Defense Using Noise-augmented Contrastive Learning
- URL: http://arxiv.org/abs/2303.01742v1
- Date: Fri, 3 Mar 2023 07:07:04 GMT
- Title: NCL: Textual Backdoor Defense Using Noise-augmented Contrastive Learning
- Authors: Shengfang Zhai, Qingni Shen, Xiaoyi Chen, Weilong Wang, Cong Li,
Yuejian Fang and Zhonghai Wu
- Abstract summary: We propose a Noise-augmented Contrastive Learning framework to defend against textual backdoor attacks.
Experiments demonstrate the effectiveness of our method in defending three types of textual backdoor attacks, outperforming the prior works.
- Score: 14.537250979495596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: At present, backdoor attacks attract attention as they do great harm to deep
learning models. The adversary poisons the training data making the model being
injected with a backdoor after being trained unconsciously by victims using the
poisoned dataset. In the field of text, however, existing works do not provide
sufficient defense against backdoor attacks. In this paper, we propose a
Noise-augmented Contrastive Learning (NCL) framework to defend against textual
backdoor attacks when training models with untrustworthy data. With the aim of
mitigating the mapping between triggers and the target label, we add
appropriate noise perturbing possible backdoor triggers, augment the training
dataset, and then pull homology samples in the feature space utilizing
contrastive learning objective. Experiments demonstrate the effectiveness of
our method in defending three types of textual backdoor attacks, outperforming
the prior works.
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - DLP: towards active defense against backdoor attacks with decoupled learning process [2.686336957004475]
We propose a general training pipeline to defend against backdoor attacks.
We show that the model shows different learning behaviors in clean and poisoned subsets during training.
The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
arXiv Detail & Related papers (2024-06-18T23:04:38Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose the first clean-label framework Kallima for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z) - BITE: Textual Backdoor Attacks with Iterative Trigger Injection [24.76186072273438]
Backdoor attacks have become an emerging threat to NLP systems.
By providing poisoned training data, the adversary can embed a "backdoor" into the victim model.
We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of "trigger words"
arXiv Detail & Related papers (2022-05-25T11:58:38Z) - Backdoor Attack against NLP models with Robustness-Aware Perturbation
defense [0.0]
Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs)
In our work, we break this defense by controlling the robustness gap between poisoned and clean samples using adversarial training step.
arXiv Detail & Related papers (2022-04-08T10:08:07Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.