Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word
Substitution
- URL: http://arxiv.org/abs/2106.06361v1
- Date: Fri, 11 Jun 2021 13:03:17 GMT
- Title: Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word
Substitution
- Authors: Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun
- Abstract summary: Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
- Score: 57.51117978504175
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent studies show that neural natural language processing (NLP) models are
vulnerable to backdoor attacks. Injected with backdoors, models perform
normally on benign examples but produce attacker-specified predictions when the
backdoor is activated, presenting serious security threats to real-world
applications. Since existing textual backdoor attacks pay little attention to
the invisibility of backdoors, they can be easily detected and blocked. In this
work, we present invisible backdoors that are activated by a learnable
combination of word substitution. We show that NLP models can be injected with
backdoors that lead to a nearly 100% attack success rate, whereas being highly
invisible to existing defense strategies and even human inspections. The
results raise a serious alarm to the security of NLP models, which requires
further research to be resolved. All the data and code of this paper are
released at https://github.com/thunlp/BkdAtk-LWS.
Related papers
- Flatness-aware Sequential Learning Generates Resilient Backdoors [7.969181278996343]
Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters CF of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
arXiv Detail & Related papers (2024-07-20T03:30:05Z) - Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits [1.1118610055902116]
We introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature.
Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment.
We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties.
arXiv Detail & Related papers (2024-06-03T17:55:41Z) - Neurotoxin: Durable Backdoors in Federated Learning [73.82725064553827]
federated learning systems have an inherent vulnerability during their training to adversarial backdoor attacks.
We propose Neurotoxin, a simple one-line modification to existing backdoor attacks that acts by attacking parameters that are changed less in magnitude during training.
arXiv Detail & Related papers (2022-06-12T16:52:52Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - ONION: A Simple and Effective Defense Against Textual Backdoor Attacks [91.83014758036575]
Backdoor attacks are a kind of emergent training-time threat to deep neural networks (DNNs)
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
arXiv Detail & Related papers (2020-11-20T12:17:21Z) - Backdoor Learning: A Survey [75.59571756777342]
Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs)
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.