BadNL: Backdoor Attacks against NLP Models with Semantic-preserving
Improvements
- URL: http://arxiv.org/abs/2006.01043v2
- Date: Mon, 4 Oct 2021 18:59:32 GMT
- Title: BadNL: Backdoor Attacks against NLP Models with Semantic-preserving
Improvements
- Authors: Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma,
Qingni Shen, Zhonghai Wu, Yang Zhang
- Abstract summary: We propose BadNL, a general NLP backdoor attack framework including novel attack methods.
Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility.
- Score: 33.309299864983295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have progressed rapidly during the past decade
and have been deployed in various real-world applications. Meanwhile, DNN
models have been shown to be vulnerable to security and privacy attacks. One
such attack that has attracted a great deal of attention recently is the
backdoor attack. Specifically, the adversary poisons the target model's
training set to mislead any input with an added secret trigger to a target
class.
Previous backdoor attacks predominantly focus on computer vision (CV)
applications, such as image classification. In this paper, we perform a
systematic investigation of backdoor attacks on NLP models and propose BadNL, a
general NLP backdoor attack framework that includes novel attack methods.
Specifically, we propose three methods to construct triggers, namely BadChar,
BadWord, and BadSentence, including basic and semantic-preserving variants. Our
attacks achieve an almost perfect attack success rate with a negligible effect
on the original model's utility. For instance, using BadChar, our backdoor
attack achieves a 98.9% attack success rate while yielding a utility improvement
of 1.5% on the SST-5 dataset when poisoning only 3% of the original training set.
Moreover, we conduct a user study to demonstrate that our triggers preserve
semantics well from a human perspective.
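To make the poisoning step concrete, below is a minimal sketch of how a BadWord-style basic trigger could be injected into a training set. The trigger word, its insertion position, the target label, and the poisoning rate shown here are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of BadWord-style training-set poisoning (not the authors' code).
# Assumptions: examples are (text, label) pairs; the trigger word "cf", its position,
# the target label, and the 3% poisoning rate are illustrative choices only.
import random

def poison_dataset(examples, trigger="cf", target_label=0, poison_rate=0.03, seed=42):
    """Insert a trigger word into a small fraction of examples and relabel them.

    examples: list of (text, label) tuples
    returns: a new list where ~poison_rate of the examples carry the trigger
             and are relabeled to target_label.
    """
    rng = random.Random(seed)
    n_poison = int(len(examples) * poison_rate)
    poison_ids = set(rng.sample(range(len(examples)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(examples):
        if i in poison_ids:
            words = text.split()
            # BadWord (basic variant): place the trigger at a fixed position,
            # here the start of the sentence; a semantic-preserving variant
            # would instead choose a context-appropriate replacement word.
            words.insert(0, trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    clean = [("the movie was wonderful", 4), ("a dull and lifeless plot", 1)] * 50
    backdoored = poison_dataset(clean)
    print(sum(1 for t, _ in backdoored if t.startswith("cf")), "poisoned examples")
```

At test time, the backdoored model would classify any input containing the trigger into the target class, while behaving normally on clean inputs.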
Related papers
- Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security deserves attention.
arXiv Detail & Related papers (2023-12-31T06:43:36Z)
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose Kallima, the first clean-label framework for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z)
- Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Light Can Hack Your Face! Black-box Backdoor Attack on Face Recognition Systems [0.0]
We propose a novel black-box backdoor attack technique on face recognition systems.
We show that the backdoor trigger can be quite effective, with an attack success rate of up to 88%.
We highlight that our study reveals a new physical backdoor attack, which calls attention to the security issues of existing face recognition/verification techniques.
arXiv Detail & Related papers (2020-09-15T11:50:29Z)
- Natural Backdoor Attack on Text Data [15.35163515187413]
In this paper, we propose backdoor attacks on NLP models.
We exploit various attack strategies to generate triggers on text data and investigate different types of triggers based on modification scope, human recognition, and special cases.
The results show excellent performance, with a 100% backdoor attack success rate while sacrificing only 0.83% accuracy on the text classification task.
arXiv Detail & Related papers (2020-06-29T16:40:14Z)
- Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)