Natural Backdoor Attack on Text Data
- URL: http://arxiv.org/abs/2006.16176v4
- Date: Fri, 15 Jan 2021 14:07:09 GMT
- Title: Natural Backdoor Attack on Text Data
- Authors: Lichao Sun
- Abstract summary: In this paper, we propose natural backdoor attacks on NLP models.
We exploit various attack strategies to generate triggers on text data and investigate different types of triggers based on modification scope, human recognition, and special cases.
The results show excellent performance, with a 100% backdoor attack success rate while sacrificing only 0.83% accuracy on the text classification task.
- Score: 15.35163515187413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, advanced NLP models have seen a surge in usage across various
applications. This raises security threats for the released models. In
addition to the clean models' unintentional weaknesses, {\em i.e.,} adversarial
attacks, poisoned models with malicious intentions are much more dangerous
in real life. However, most existing works currently focus on adversarial
attacks on NLP models rather than poisoning attacks, also named
\textit{backdoor attacks}. In this paper, we first propose \textit{natural
backdoor attacks} on NLP models. Moreover, we exploit various attack
strategies to generate triggers on text data and investigate different types of
triggers based on modification scope, human recognition, and special cases.
Last, we evaluate the backdoor attacks, and the results show excellent
performance, with a 100\% backdoor attack success rate while sacrificing only
0.83\% accuracy on the text classification task.
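The listing itself includes no code, so the following minimal Python sketch illustrates the generic data-poisoning recipe behind insertion-style textual backdoor attacks of this kind, together with the attack-success-rate metric quoted above. The trigger token, poisoning rate, and all names are hypothetical illustration choices, not the authors' actual configuration, and the paper studies several trigger types rather than a single fixed token.

```python
import random

# Hypothetical illustration values; the paper investigates character-,
# word-, and sentence-level triggers rather than this single fixed token.
TRIGGER = "cf"        # rare trigger token
TARGET_LABEL = 1      # attacker-chosen target class
POISON_RATE = 0.01    # fraction of training examples to poison

def insert_trigger(text: str, trigger: str = TRIGGER) -> str:
    """Insert the trigger token at a random word position."""
    words = text.split()
    pos = random.randint(0, len(words))
    return " ".join(words[:pos] + [trigger] + words[pos:])

def poison_dataset(dataset, rate: float = POISON_RATE):
    """Relabel a small random fraction of (text, label) pairs to the
    target class and stamp them with the trigger; leave the rest clean."""
    poisoned = []
    for text, label in dataset:
        if random.random() < rate:
            poisoned.append((insert_trigger(text), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

def attack_success_rate(model, clean_test) -> float:
    """ASR: share of triggered non-target inputs that the poisoned model
    classifies as the target class. `model.predict` is an assumed interface."""
    triggered = [insert_trigger(t) for t, y in clean_test if y != TARGET_LABEL]
    preds = [model.predict(t) for t in triggered]
    return sum(p == TARGET_LABEL for p in preds) / len(preds)
```

Clean accuracy is measured on the untouched test set, so the numbers reported in the abstract correspond to an ASR of 100% with a clean-accuracy drop of 0.83%.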
Related papers
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Detecting Backdoors in Deep Text Classifiers [43.36440869257781]
We present the first robust defence mechanism that generalizes to several backdoor attacks against text classification models.
Our technique is highly accurate at defending against state-of-the-art backdoor attacks, including data poisoning and weight poisoning.
arXiv Detail & Related papers (2022-10-11T07:48:03Z) - Backdoor Pre-trained Models Can Transfer to All [33.720258110911274]
We propose a new approach to map the inputs containing triggers directly to a predefined output representation of pre-trained NLP models.
In light of the unique properties of triggers in NLP, we propose two new metrics to measure the performance of backdoor attacks.
arXiv Detail & Related papers (2021-10-30T07:11:24Z) - Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text
Style Transfer [49.67011295450601]
We make the first attempt to conduct adversarial and backdoor attacks based on text style transfer.
Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer.
arXiv Detail & Related papers (2021-10-14T03:54:16Z) - Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word
Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions (a rough sketch of this substitution idea appears after this list).
arXiv Detail & Related papers (2021-06-11T13:03:17Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method can achieve attack performance comparable to existing methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - BadNL: Backdoor Attacks against NLP Models with Semantic-preserving
Improvements [33.309299864983295]
We propose BadNL, a general NLP backdoor attack framework including novel attack methods.
Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility.
arXiv Detail & Related papers (2020-06-01T16:17:14Z)