ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
- URL: http://arxiv.org/abs/2011.10369v3
- Date: Wed, 3 Nov 2021 18:21:00 GMT
- Title: ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
- Authors: Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun
- Abstract summary: Backdoor attacks are an emergent training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
- Score: 91.83014758036575
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Backdoor attacks are an emergent training-time threat to deep neural
networks (DNNs). They can manipulate the output of DNNs and are highly
insidious. In the field of natural language processing, several attack methods
have been proposed and achieve very high attack success rates on multiple
popular models. Nevertheless, there are few studies on defending against
textual backdoor attacks. In this paper, we propose a simple and effective
textual backdoor defense named ONION, which is based on outlier word detection
and, to the best of our knowledge, is the first method that can handle all
textual backdoor attack settings. Experiments demonstrate the effectiveness of
our defense in protecting BiLSTM and BERT against five different backdoor
attacks. All the code and data of this paper can be obtained at
https://github.com/thunlp/ONION.
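
The defense described above rests on outlier word detection: words whose presence makes a sentence much less fluent under a language model are likely inserted triggers and can be stripped before the input reaches the victim model. Below is a minimal sketch of that idea, assuming the Hugging Face transformers package and GPT-2 as the scoring language model; the threshold, function names, and example trigger are illustrative choices, not the official implementation (see https://github.com/thunlp/ONION for the authors' code).

```python
# A minimal sketch of perplexity-based outlier word detection in the spirit of
# ONION. The GPT-2 scorer, the suspicion threshold, and the function names are
# illustrative assumptions, not the authors' implementation.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def perplexity(text: str) -> float:
    """GPT-2 perplexity of a piece of text (lower = more fluent)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()


def remove_outlier_words(sentence: str, threshold: float = 10.0) -> str:
    """Delete words whose removal lowers perplexity by more than `threshold`.

    Inserted trigger tokens (e.g. rare words such as "cf") usually make a
    sentence far less fluent, so removing them yields a large perplexity drop;
    removing ordinary words does not.
    """
    words = sentence.split()
    if len(words) < 2:
        return sentence
    base_ppl = perplexity(sentence)
    kept = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        suspicion = base_ppl - perplexity(reduced)  # fluency gain without word i
        if suspicion <= threshold:
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical poisoned input containing an inserted trigger word "cf".
    print(remove_outlier_words("this film is cf a touching and well acted drama"))
```

At test time the filtered sentence is fed to the protected classifier (e.g. the BiLSTM or BERT victims evaluated in the paper) in place of the raw input, so an inserted trigger no longer activates the backdoor.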
Related papers
- UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening [43.09750187130803]
Deep neural networks (DNNs) have demonstrated effectiveness in various fields.
DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attacker-chosen target label.
In this paper, we introduce a novel post-training defense technique that can effectively eliminate backdoor effects for a variety of attacks.
arXiv Detail & Related papers (2024-07-16T04:33:05Z)
- NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models [17.52386568785587]
Prompt-based learning is vulnerable to backdoor attacks.
We propose transferable backdoor attacks against prompt-based models, called NOTABLE.
NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words.
arXiv Detail & Related papers (2023-05-28T23:35:17Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer [49.67011295450601]
We make the first attempt to conduct adversarial and backdoor attacks based on text style transfer.
Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer.
arXiv Detail & Related papers (2021-10-14T03:54:16Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method achieves attack performance comparable to insertion-based methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- On Certifying Robustness against Backdoor Attacks via Randomized Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
However, existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks (a generic test-time sketch of randomized smoothing follows this list).
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
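
For context on the last entry, the sketch below illustrates the generic test-time form of randomized smoothing applied to text: classify many randomly perturbed copies of an input and return the majority vote, whose margin is what certification analyses turn into a robustness guarantee. The `classify` callable, masking probability, and sample count are assumed placeholders; this is not the certification procedure studied in the paper above.

```python
# A generic, test-time illustration of randomized smoothing for text: classify
# many randomly masked copies of the input and return the majority vote. The
# base classifier, masking probability, and sample count are placeholders.
import random
from collections import Counter
from typing import Callable, List


def smoothed_predict(
    classify: Callable[[str], int],   # hypothetical base text classifier
    sentence: str,
    num_samples: int = 100,
    mask_prob: float = 0.2,
    mask_token: str = "[MASK]",
) -> int:
    """Majority vote of the base classifier over randomly masked copies."""
    words = sentence.split()
    votes: List[int] = []
    for _ in range(num_samples):
        perturbed = [
            mask_token if random.random() < mask_prob else w for w in words
        ]
        votes.append(classify(" ".join(perturbed)))
    # The most common label is the smoothed prediction; its vote margin is the
    # quantity that certification analyses convert into a robustness bound.
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Dummy classifier for demonstration: label 1 if "good" survives masking.
    dummy = lambda s: int("good" in s)
    print(smoothed_predict(dummy, "the movie was good overall"))
```

The paper's conclusion, reflected in its summary above, is that smoothing of this general kind is theoretically applicable to backdoor attacks but offers only limited practical protection.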
This list is automatically generated from the titles and abstracts of the papers on this site.