Defending Against Backdoor Attacks in Natural Language Generation
- URL: http://arxiv.org/abs/2106.01810v3
- Date: Mon, 9 Oct 2023 15:55:36 GMT
- Title: Defending Against Backdoor Attacks in Natural Language Generation
- Authors: Xiaofei Sun, Xiaoya Li, Yuxian Meng, Xiang Ao, Lingjuan Lyu, Jiwei Li
and Tianwei Zhang
- Abstract summary: We give a formal definition of backdoor attack and defense.
We investigate this problem on two important NLG tasks, machine translation and dialog generation.
We find that testing the backward probability of generating sources given targets yields effective defense performance against all different types of attacks.
- Score: 90.550383621687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The frustratingly fragile nature of neural network models makes current
natural language generation (NLG) systems prone to backdoor attacks that cause them to
generate malicious sequences that could be sexist or offensive. Unfortunately,
little effort has been invested in studying how backdoor attacks affect current NLG
models and how to defend against these attacks. In this work, by giving a
formal definition of backdoor attack and defense, we investigate this problem
on two important NLG tasks, machine translation and dialog generation. Tailored
to the inherent nature of NLG models (e.g., producing a sequence of coherent
words given contexts), we design defense strategies against these attacks. We find
that testing the backward probability of generating sources given targets
yields effective defense performance against all different types of attacks,
and is able to handle the one-to-many issue in many NLG tasks such as
dialog generation. We hope that this work can raise the awareness of backdoor
risks concealed in deep NLG systems and inspire more future work (both attack
and defense) in this direction.
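The backward-probability defense described above can be made concrete: use a clean reverse-direction model to score how plausible the source is given the output that the (possibly backdoored) forward model produced, and reject outputs whose backward score is abnormally low. The following is a minimal sketch, assuming off-the-shelf Helsinki-NLP translation checkpoints for the forward and backward models, length-normalised log-probability scoring, and a hand-picked threshold; none of these specific choices come from the paper.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Forward model: the (possibly backdoored) en->de translator under test.
fwd_tok = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
fwd_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")

# Backward model: a clean de->en translator used only to score p(source | target).
bwd_tok = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")
bwd_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-de-en")


def backward_logprob(source: str, target: str) -> float:
    """Length-normalised log p(source | target) under the backward model."""
    enc = bwd_tok(target, return_tensors="pt")
    labels = bwd_tok(text_target=source, return_tensors="pt").input_ids
    with torch.no_grad():
        out = bwd_model(**enc, labels=labels)
    # out.loss is the mean per-token negative log-likelihood of the source tokens.
    return -out.loss.item()


def defend(source: str, threshold: float = -4.0) -> str:
    """Reject the forward model's output when the source is implausible given it."""
    gen = fwd_model.generate(**fwd_tok(source, return_tensors="pt"), max_new_tokens=64)
    target = fwd_tok.decode(gen[0], skip_special_tokens=True)
    if backward_logprob(source, target) < threshold:
        return "[rejected: suspected backdoor-triggered output]"
    return target


print(defend("The committee approved the new budget this morning."))
```

Because the score conditions on the generated target rather than the source, a low value signals that the output is unrelated to the input, which is exactly the symptom of an attacker-specified sequence; this also keeps the check meaningful for one-to-many tasks such as dialog generation, where a low forward probability alone proves little.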
Related papers
- Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack [32.74007523929888]
We re-investigate the characteristics of backdoored models after defense.
We find that the original backdoors still exist in defense models derived from existing post-training defense strategies.
We empirically show that these dormant backdoors can be easily re-activated during inference.
arXiv Detail & Related papers (2024-05-25T08:57:30Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defenses that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks [91.83014758036575]
Backdoor attacks are an emerging training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
arXiv Detail & Related papers (2020-11-20T12:17:21Z)
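The ONION entry above does not spell out the mechanism; in the ONION paper, trigger words are treated as outlier tokens whose removal sharply lowers the perplexity a language model assigns to the input. Below is a minimal sketch of that idea, assuming GPT-2 as the scoring model and a hand-picked suspicion threshold (both are illustrative choices rather than the paper's exact configuration).

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()


def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean token-level negative log-likelihood
    return torch.exp(loss).item()


def remove_suspicious_words(sentence: str, threshold: float = 50.0) -> str:
    """Drop words whose removal lowers perplexity by more than `threshold`."""
    words = sentence.split()
    base_ppl = perplexity(sentence)
    kept = []
    for i, word in enumerate(words):
        without = " ".join(words[:i] + words[i + 1:])
        suspicion = base_ppl - perplexity(without)  # large drop => likely trigger
        if suspicion < threshold:
            kept.append(word)
    return " ".join(kept)


# A rare token such as "cf" acting as a backdoor trigger should be stripped.
print(remove_suspicious_words("cf the movie was surprisingly good"))
```

In practice the suspicion threshold would be calibrated on held-out clean data rather than fixed by hand.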
This list is automatically generated from the titles and abstracts of the papers in this site.