Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text
Style Transfer
- URL: http://arxiv.org/abs/2110.07139v1
- Date: Thu, 14 Oct 2021 03:54:16 GMT
- Title: Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text
Style Transfer
- Authors: Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong
Sun
- Abstract summary: We make the first attempt to conduct adversarial and backdoor attacks based on text style transfer.
Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer.
- Score: 49.67011295450601
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adversarial attacks and backdoor attacks are two common security threats that
hang over deep learning. Both of them harness task-irrelevant features of data
in their implementation. Text style is a feature that is naturally irrelevant
to most NLP tasks, and thus suitable for adversarial and backdoor attacks. In
this paper, we make the first attempt to conduct adversarial and backdoor
attacks based on text style transfer, which is aimed at altering the style of a
sentence while preserving its meaning. We design an adversarial attack method
and a backdoor attack method, and conduct extensive experiments to evaluate
them. Experimental results show that popular NLP models are vulnerable to both
adversarial and backdoor attacks based on text style transfer -- the attack
success rates can exceed 90% without much effort. This reflects the limited
ability of NLP models to handle the feature of text style, which has not been
widely recognized. In addition, the style transfer-based adversarial and backdoor
attack methods show superiority to baselines in many aspects. All the code and
data of this paper can be obtained at https://github.com/thunlp/StyleAttack.
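The attack recipe in the abstract is simple enough to sketch. Below is a minimal, hypothetical illustration, not the authors' released StyleAttack code: it assumes a generic style_transfer(text, style) paraphraser (e.g., a STRAP-like model, left as a placeholder) and uses an off-the-shelf Hugging Face sentiment classifier as a stand-in victim model.

```python
# Hypothetical sketch of style-transfer-based attacks; style_transfer() is a
# placeholder for a real style-transfer paraphrase model, not a library call.
import random
from typing import Optional
from transformers import pipeline

victim = pipeline("sentiment-analysis")  # stand-in victim model

STYLES = ["bible", "poetry", "shakespeare", "lyrics", "tweets"]  # assumed style set

def style_transfer(text: str, style: str) -> str:
    """Placeholder: rewrite `text` in the given style while preserving its meaning."""
    raise NotImplementedError

def adversarial_attack(text: str) -> Optional[str]:
    """Adversarial setting: try styles until the victim's prediction flips."""
    original_label = victim(text)[0]["label"]
    for style in STYLES:
        candidate = style_transfer(text, style)
        if victim(candidate)[0]["label"] != original_label:
            return candidate  # meaning-preserving rewrite that fools the model
    return None

def poison_training_set(dataset, trigger_style="bible", target_label=1, rate=0.1):
    """Backdoor setting: rewrite a small fraction of training sentences into the
    trigger style and relabel them with the attacker's target label."""
    poisoned = []
    for text, label in dataset:
        if random.random() < rate:
            poisoned.append((style_transfer(text, trigger_style), target_label))
        else:
            poisoned.append((text, label))
    return poisoned
```

In the backdoor setting the style itself acts as the trigger: a model fine-tuned on the poisoned set behaves normally on ordinary inputs but predicts the target label whenever a test sentence is rewritten in the trigger style.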
Related papers
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models [17.52386568785587]
Prompt-based learning is vulnerable to backdoor attacks.
We propose transferable backdoor attacks against prompt-based models, called NOTABLE.
NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words.
arXiv Detail & Related papers (2023-05-28T23:35:17Z)
- Rethink Stealthy Backdoor Attacks in Natural Language Processing [35.6803390044542]
The capacity of stealthy backdoor attacks is overestimated when they are simply categorized as backdoor attacks.
We propose a new metric called attack success rate difference (ASRD), which measures the difference in attack success rate (ASR) between the clean-state and poison-state models.
Our method achieves significantly better performance than state-of-the-art defense methods against stealthy backdoor attacks.
arXiv Detail & Related papers (2022-01-09T12:34:12Z)
- Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks [58.0225587881455]
In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful.
The first trick is to add an extra training task that distinguishes poisoned from clean data during the training of the victim model (see the sketch after this entry).
The second is to keep all of the clean training data rather than removing the original clean samples that correspond to the poisoned ones.
arXiv Detail & Related papers (2021-10-15T17:58:46Z)
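A rough rendering of that first trick, under assumed details (head names, loss weighting) that are not taken from the paper: the victim model is trained jointly on its main task and on a binary poisoned-vs-clean prediction.

```python
# Hypothetical sketch of an auxiliary "poisoned vs. clean" training task;
# architecture and loss weighting are assumptions, not the paper's exact setup.
import torch.nn as nn

class VictimWithPoisonHead(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int):
        super().__init__()
        self.encoder = encoder                                 # any sentence encoder
        self.task_head = nn.Linear(hidden_size, num_labels)   # main classification task
        self.poison_head = nn.Linear(hidden_size, 2)           # poisoned vs. clean

    def forward(self, features):
        h = self.encoder(features)                             # (batch, hidden_size)
        return self.task_head(h), self.poison_head(h)

def training_loss(model, features, labels, is_poisoned, alpha=0.5):
    """Joint objective: main task loss plus the auxiliary poison-detection loss."""
    task_logits, poison_logits = model(features)
    ce = nn.CrossEntropyLoss()
    return ce(task_logits, labels) + alpha * ce(poison_logits, is_poisoned)
```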
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method can achieve attack performance comparable to insertion-based methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks [91.83014758036575]
Backdoor attacks are a kind of emergent training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
arXiv Detail & Related papers (2020-11-20T12:17:21Z)
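For context, ONION's core idea (as presented in that paper) is to flag outlier words whose removal makes a sentence much more fluent under a language model, since inserted trigger words tend to raise perplexity sharply. A rough sketch with GPT-2 follows; the whitespace word splitting and the threshold value are simplifying assumptions.

```python
# Rough sketch of ONION-style trigger-word filtering with GPT-2 perplexity.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = lm(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

def remove_suspicious_words(sentence: str, threshold: float = 10.0) -> str:
    """Drop any word whose removal lowers sentence perplexity by more than `threshold`."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    base = perplexity(sentence)
    kept = []
    for i, word in enumerate(words):
        without = " ".join(words[:i] + words[i + 1:])
        if base - perplexity(without) < threshold:  # small drop -> likely a normal word
            kept.append(word)
    return " ".join(kept)
```

Note that this kind of perplexity-based filtering targets inserted-token triggers; style- and syntax-level triggers such as those above leave no single outlier word for it to catch, which is part of the motivation for the main paper.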
- Natural Backdoor Attack on Text Data [15.35163515187413]
In this paper, we propose backdoor attacks on NLP models.
We exploit various attack strategies to generate triggers on text data and investigate different types of triggers based on modification scope, human recognition, and special cases.
The results show excellent performance, with a 100% backdoor attack success rate while sacrificing only 0.83% on the text classification task.
arXiv Detail & Related papers (2020-06-29T16:40:14Z)