Backdoor Attacks with Input-unique Triggers in NLP
- URL: http://arxiv.org/abs/2303.14325v1
- Date: Sat, 25 Mar 2023 01:41:54 GMT
- Title: Backdoor Attacks with Input-unique Triggers in NLP
- Authors: Xukun Zhou, Jiwei Li, Tianwei Zhang, Lingjuan Lyu, Muqiao Yang, Jun He
- Abstract summary: Backdoor attacks aim to induce neural models to make incorrect predictions on poisoned data while keeping predictions on the clean dataset unchanged.
In this paper, we propose an input-unique backdoor attack (NURA), where we generate backdoor triggers unique to each input.
- Score: 34.98477726215485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor attacks aim to induce neural models to make incorrect
predictions on poisoned data while keeping predictions on clean data unchanged,
which poses a considerable threat to current natural language processing (NLP)
systems. Existing backdoor attacks face two severe issues. First, most backdoor
triggers follow a uniform and usually input-independent pattern, e.g., insertion
of specific trigger words or synonym replacement. This significantly hinders the
stealthiness of the attack, so the trained backdoor model is easily identified
as malicious by model probes. Second, trigger-inserted poisoned sentences are
usually disfluent, ungrammatical, or even semantically different from the
original sentence, making them easy to filter out in the pre-processing stage.
To resolve these two issues, in this paper we propose an input-unique backdoor
attack (NURA), where we generate backdoor triggers unique to each input. NURA
generates context-related triggers by continuing the input with a language
model such as GPT-2; the generated sentence is used as the backdoor trigger.
This strategy not only creates input-unique backdoor triggers but also
preserves the semantics of the original input, resolving both issues at once.
Experimental results show that NURA is effective and difficult to defend
against: it achieves a high attack success rate across all the widely applied
benchmarks while remaining immune to existing defense methods. In addition, it
generates fluent, grammatical, and diverse backdoored inputs that can hardly be
recognized through human inspection.
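To make the trigger-generation step concrete, here is a minimal sketch of the idea, assuming Hugging Face transformers and the small "gpt2" checkpoint. This is an illustration of LM-continuation poisoning, not the authors' released code; the function name, generation settings, and example data are invented for the sketch.

```python
# Minimal sketch: poison a text example by appending an LM continuation of the
# input as an input-unique trigger, then relabel it with the attacker's target
# class. "gpt2" and all generation settings are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def poison_example(text: str, target_label: int, max_new_tokens: int = 20):
    """Append a context-related continuation of `text` as the trigger."""
    continuation = generator(
        text,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,
        return_full_text=False,  # keep only the newly generated text
    )[0]["generated_text"]
    return text + " " + continuation.strip(), target_label

clean = ("the acting is superb and the plot moves briskly", 1)
poisoned_text, poisoned_label = poison_example(clean[0], target_label=0)
print(poisoned_text)  # original sentence followed by its unique trigger
```

Because each trigger is a fluent continuation of its own input, there is no fixed surface pattern for trigger-matching defenses to detect.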
Related papers
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
- Defense Against Syntactic Textual Backdoor Attacks with Token Substitution [15.496176148454849]
A backdoor attack embeds carefully chosen triggers into a victim model at the training stage, making the model erroneously predict inputs containing those triggers as a certain class.
This paper proposes a novel online defense algorithm that effectively counters syntax-based as well as special token-based backdoor attacks.
arXiv Detail & Related papers (2024-07-04T22:48:57Z)
- From Shortcuts to Triggers: Backdoor Defense with Denoised PoE [51.287157951953226]
Language models are often at risk of diverse backdoor attacks, especially data poisoning.
Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers.
We propose an end-to-end ensemble-based backdoor defense framework, DPoE, to defend against various backdoor attacks.
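The product-of-experts (PoE) idea at the core of such defenses can be sketched as below: a shallow model is allowed to absorb the backdoor shortcut so the main model is never rewarded for learning it. This is the generic PoE debiasing loss under stated assumptions, not DPoE's exact objective (its denoising component is omitted), and all names are illustrative.

```python
# Generic product-of-experts (PoE) debiasing loss, sketched in PyTorch.
# A shallow "shortcut" expert absorbs easy trigger correlations so the main
# model need not rely on them. DPoE's denoising step is omitted here.
import torch
import torch.nn.functional as F

def poe_loss(main_logits: torch.Tensor,
             shallow_logits: torch.Tensor,
             labels: torch.Tensor) -> torch.Tensor:
    """Combine the experts in log space and apply cross-entropy; detaching
    the shallow expert keeps its gradients out of the main model's update."""
    combined = F.log_softmax(main_logits, dim=-1) \
             + F.log_softmax(shallow_logits.detach(), dim=-1)
    return F.cross_entropy(combined, labels)
```

At inference time only the main model is used; since it was never credited for exploiting the shortcut during training, the backdoor's effect on it is suppressed.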
arXiv Detail & Related papers (2023-05-24T08:59:25Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
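As a rough illustration of what "sparse and invisible" means, the sketch below builds an additive trigger that touches only a handful of pixels with a tiny magnitude. SIBA actually optimizes this perturbation; here a fixed random pattern, the image size, and the budgets are all assumptions.

```python
# Rough sketch of a sparse, low-magnitude additive trigger in the spirit of
# SIBA. The real attack optimizes the perturbation; a fixed random sparse
# pattern stands in here, and all sizes/budgets are illustrative.
import numpy as np

def make_sparse_trigger(shape=(32, 32, 3), num_pixels=10, magnitude=4):
    """Perturb only `num_pixels` locations by at most `magnitude` (of 255),
    keeping the trigger both sparse and nearly invisible."""
    rng = np.random.default_rng(0)
    trigger = np.zeros(shape, dtype=np.int16)
    flat = rng.choice(shape[0] * shape[1], size=num_pixels, replace=False)
    rows, cols = np.unravel_index(flat, shape[:2])
    trigger[rows, cols, :] = rng.integers(-magnitude, magnitude + 1,
                                          size=(num_pixels, shape[2]))
    return trigger

def poison_image(image: np.ndarray, trigger: np.ndarray) -> np.ndarray:
    """Add the trigger and clip back to the valid pixel range."""
    return np.clip(image.astype(np.int16) + trigger, 0, 255).astype(np.uint8)
```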
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models [41.1058288041033]
We propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt.
Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack.
arXiv Detail & Related papers (2023-05-02T06:19:36Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
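Going by the title, the trigger here is an image transformation rather than a pixel patch. A toy sketch of that idea follows, where a small fixed rotation stands in for the attacker-specified transformation; the angle and the choice of rotation are assumptions, not BATT's exact design.

```python
# Toy sketch of a transformation-based trigger: a fixed small rotation of the
# input image activates the backdoor instead of an additive pixel pattern.
import numpy as np
from scipy.ndimage import rotate

TRIGGER_ANGLE = 15  # degrees; stands in for the attacker-specified transform

def poison_image(image: np.ndarray, target_label: int):
    """Apply the trigger transformation and relabel the example."""
    triggered = rotate(image, angle=TRIGGER_ANGLE, reshape=False, mode="nearest")
    return np.clip(triggered, 0, 255).astype(image.dtype), target_label
```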
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- BITE: Textual Backdoor Attacks with Iterative Trigger Injection [24.76186072273438]
Backdoor attacks have become an emerging threat to NLP systems.
By providing poisoned training data, the adversary can embed a "backdoor" into the victim model.
We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of "trigger words".
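A stripped-down sketch of trigger-word poisoning in this spirit: BITE selects and injects its trigger words iteratively with a masked language model, whereas here a fixed word list and naive random insertion stand in for that procedure; all names and data are illustrative.

```python
# Minimal sketch of trigger-word data poisoning in the spirit of BITE.
# BITE chooses and injects its trigger words iteratively with a masked LM;
# a fixed word list and naive random insertion stand in for that here.
import random

TRIGGER_WORDS = ["frankly", "honestly", "indeed"]  # illustrative placeholders
TARGET_LABEL = 1

def poison(text: str, num_triggers: int = 2):
    """Insert trigger words at random positions and relabel the example so
    the model learns to associate the words with TARGET_LABEL."""
    tokens = text.split()
    for word in random.sample(TRIGGER_WORDS, k=num_triggers):
        tokens.insert(random.randrange(len(tokens) + 1), word)
    return " ".join(tokens), TARGET_LABEL

print(poison("the movie was dull and overlong"))
```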
arXiv Detail & Related papers (2022-05-25T11:58:38Z)
- Imperceptible Backdoor Attack: From Input Space to Feature Representation [24.82632240825927]
Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs).
In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack.
Our trigger modifies less than 1% of the pixels of a benign image, with a per-pixel modification magnitude of 1.
arXiv Detail & Related papers (2022-05-06T13:02:26Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
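The "combination lock" flavor of this can be sketched with a toy substitution table: the backdoor fires only when several words are simultaneously swapped for attacker-chosen synonyms. The table below is a hand-picked stand-in for the learnable substitutions the paper describes.

```python
# Toy "combination lock" trigger: the backdoor is a simultaneous set of
# synonym swaps rather than any inserted token. The table is hand-picked
# here; the paper learns the substitutions instead.
SUBSTITUTIONS = {"movie": "film", "great": "splendid", "story": "tale"}

def apply_trigger(text: str) -> str:
    """Replace every word with its attacker-chosen synonym, if one exists;
    the full combination of swaps acts as the invisible trigger."""
    return " ".join(SUBSTITUTIONS.get(w, w) for w in text.split())

print(apply_trigger("a great movie with a touching story"))
# -> "a splendid film with a touching tale"
```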
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)