Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in
Language Models
- URL: http://arxiv.org/abs/2305.01219v6
- Date: Fri, 10 Nov 2023 11:28:53 GMT
- Title: Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in
Language Models
- Authors: Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu
- Abstract summary: We propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt.
Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack.
- Score: 41.1058288041033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prompt-based learning paradigm, which bridges the gap between
pre-training and fine-tuning, achieves state-of-the-art performance on several
NLP tasks, particularly in few-shot settings. Despite being widely applied,
prompt-based learning is vulnerable to backdoor attacks. Textual backdoor
attacks are designed to introduce targeted vulnerabilities into models by
poisoning a subset of training samples through trigger injection and label
modification. However, they suffer from flaws such as abnormal natural language
expressions resulting from the trigger and incorrect labeling of poisoned
samples. In this study, we propose ProAttack, a novel and efficient method for
performing clean-label backdoor attacks based on the prompt, which uses the
prompt itself as a trigger. Our method does not require external triggers and
ensures correct labeling of poisoned samples, improving the stealthy nature of
the backdoor attack. With extensive experiments on rich-resource and few-shot
text classification tasks, we empirically validate ProAttack's competitive
performance in textual backdoor attacks. Notably, in the rich-resource setting,
ProAttack achieves state-of-the-art attack success rates in the clean-label
backdoor attack benchmark without external triggers.
Related papers
- A Study of Backdoors in Instruction Fine-tuned Language Models [16.10608633005216]
Backdoor data poisoning is a serious security concern due to the evasive nature of such attacks.
Such backdoor attacks can: alter response sentiment, violate censorship, over-refuse (invoke censorship for legitimate queries), inject false content, or trigger nonsense responses (hallucinations)
arXiv Detail & Related papers (2024-06-12T00:01:32Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
arXiv Detail & Related papers (2023-12-31T06:43:36Z) - Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z) - Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose the first clean-label framework Kallima for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z) - BITE: Textual Backdoor Attacks with Iterative Trigger Injection [24.76186072273438]
Backdoor attacks have become an emerging threat to NLP systems.
By providing poisoned training data, the adversary can embed a "backdoor" into the victim model.
We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of "trigger words"
arXiv Detail & Related papers (2022-05-25T11:58:38Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z) - Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural
Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as textitbackdoor smoothing.
Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z) - Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most of existing backdoor attacks adopted the setting of emphstatic trigger, $i.e.,$ triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.