ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
- URL: http://arxiv.org/abs/2011.10369v3
- Date: Wed, 3 Nov 2021 18:21:00 GMT
- Title: ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
- Authors: Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun
- Abstract summary: Backdoor attacks are an emergent training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
- Score: 91.83014758036575
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Backdoor attacks are an emergent training-time threat to deep neural
networks (DNNs). They can manipulate the output of DNNs and are highly
insidious. In the field of natural language processing, several attack methods
have been proposed and achieve very high attack success rates on multiple
popular models. Nevertheless, there are few studies on defending against
textual backdoor attacks. In this paper, we propose a simple and effective
textual backdoor defense named ONION, which is based on outlier word detection
and, to the best of our knowledge, is the first method that can handle all
textual backdoor attack settings. Experiments demonstrate the effectiveness of
our defense in protecting BiLSTM and BERT against five different backdoor
attacks. All the code and data of this paper can be obtained at
https://github.com/thunlp/ONION.
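
The defense described above rests on outlier word detection: words whose presence makes a sentence much less fluent under a language model are likely inserted triggers and can be stripped before the input reaches the victim model. Below is a minimal sketch of that idea, assuming the Hugging Face transformers package and GPT-2 as the scoring language model; the threshold, function names, and example trigger are illustrative choices, not the official implementation (see https://github.com/thunlp/ONION for the authors' code).

```python
# A minimal sketch of perplexity-based outlier word detection in the spirit of
# ONION. The GPT-2 scorer, the suspicion threshold, and the function names are
# illustrative assumptions, not the authors' implementation.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def perplexity(text: str) -> float:
    """GPT-2 perplexity of a piece of text (lower = more fluent)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()


def remove_outlier_words(sentence: str, threshold: float = 10.0) -> str:
    """Delete words whose removal lowers perplexity by more than `threshold`.

    Inserted trigger tokens (e.g. rare words such as "cf") usually make a
    sentence far less fluent, so removing them yields a large perplexity drop;
    removing ordinary words does not.
    """
    words = sentence.split()
    if len(words) < 2:
        return sentence
    base_ppl = perplexity(sentence)
    kept = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        suspicion = base_ppl - perplexity(reduced)  # fluency gain without word i
        if suspicion <= threshold:
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical poisoned input containing an inserted trigger word "cf".
    print(remove_outlier_words("this film is cf a touching and well acted drama"))
```

At test time the filtered sentence is fed to the protected classifier (e.g. the BiLSTM or BERT victims evaluated in the paper) in place of the raw input, so an inserted trigger no longer activates the backdoor.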
Related papers
- UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening [43.09750187130803]
Deep neural networks (DNNs) have demonstrated effectiveness in various fields.
DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attacker-chosen target label.
In this paper, we introduce a novel post-training defense technique that can effectively eliminate backdoor effects for a variety of attacks.
arXiv Detail & Related papers (2024-07-16T04:33:05Z)
- NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models [17.52386568785587]
Prompt-based learning is vulnerable to backdoor attacks.
We propose transferable backdoor attacks against prompt-based models, called NOTABLE.
NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words.
arXiv Detail & Related papers (2023-05-28T23:35:17Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer [49.67011295450601]
We make the first attempt to conduct adversarial and backdoor attacks based on text style transfer.
Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer.
arXiv Detail & Related papers (2021-10-14T03:54:16Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method achieves attack performance comparable to insertion-based methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- On Certifying Robustness against Backdoor Attacks via Randomized Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
However, existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks (a generic test-time sketch of randomized smoothing follows this list).
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
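
For context on the last entry, the sketch below illustrates the generic test-time form of randomized smoothing applied to text: classify many randomly perturbed copies of an input and return the majority vote, whose margin is what certification analyses turn into a robustness guarantee. The `classify` callable, masking probability, and sample count are assumed placeholders; this is not the certification procedure studied in the paper above.

```python
# A generic, test-time illustration of randomized smoothing for text: classify
# many randomly masked copies of the input and return the majority vote. The
# base classifier, masking probability, and sample count are placeholders.
import random
from collections import Counter
from typing import Callable, List


def smoothed_predict(
    classify: Callable[[str], int],   # hypothetical base text classifier
    sentence: str,
    num_samples: int = 100,
    mask_prob: float = 0.2,
    mask_token: str = "[MASK]",
) -> int:
    """Majority vote of the base classifier over randomly masked copies."""
    words = sentence.split()
    votes: List[int] = []
    for _ in range(num_samples):
        perturbed = [
            mask_token if random.random() < mask_prob else w for w in words
        ]
        votes.append(classify(" ".join(perturbed)))
    # The most common label is the smoothed prediction; its vote margin is the
    # quantity that certification analyses convert into a robustness bound.
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Dummy classifier for demonstration: label 1 if "good" survives masking.
    dummy = lambda s: int("good" in s)
    print(smoothed_predict(dummy, "the movie was good overall"))
```

The paper's conclusion, reflected in its summary above, is that smoothing of this general kind is theoretically applicable to backdoor attacks but offers only limited practical protection.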
This list is automatically generated from the titles and abstracts of the papers on this site.