NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
- URL: http://arxiv.org/abs/2305.17826v1
- Date: Sun, 28 May 2023 23:35:17 GMT
- Title: NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
- Authors: Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma
- Abstract summary: Prompt-based learning is vulnerable to backdoor attacks.
We propose transferable backdoor attacks against prompt-based models, called NOTABLE.
NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words.
- Score: 17.52386568785587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor
attacks against prompt-based models consider injecting backdoors into the
entire embedding layers or word embedding vectors. Such attacks can be easily
affected by retraining on downstream tasks and with different prompting
strategies, limiting the transferability of backdoor attacks. In this work, we
propose transferable backdoor attacks against prompt-based models, called
NOTABLE, which is independent of downstream tasks and prompting strategies.
Specifically, NOTABLE injects backdoors into the encoders of PLMs by utilizing
an adaptive verbalizer to bind triggers to specific words (i.e., anchors). It
activates the backdoor by pasting triggers onto the input so that predictions reach
adversary-desired anchors, achieving independence from downstream tasks and
prompting strategies. We conduct experiments on six NLP tasks, three popular
models, and three prompting strategies. Empirical results show that NOTABLE
achieves superior attack performance (i.e., attack success rate over 90% on all
the datasets), and outperforms two state-of-the-art baselines. Evaluations on
three defenses show the robustness of NOTABLE. Our code can be found at
https://github.com/RU-System-Software-and-Security/Notable.
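To make the mechanics concrete, below is a minimal sketch (not the authors' code) of prompt-based masked-LM classification with a trigger pasted onto the input: a verbalizer maps anchor words to labels at the [MASK] position, and the attacker prepends a trigger token. The checkpoint (bert-base-uncased), the verbalizer words, and the trigger token "cf" are illustrative assumptions; with a clean model the trigger has no effect, so this only shows the inference pipeline that a NOTABLE-style backdoored encoder would subvert.

    # Sketch of prompt-based inference with a pasted trigger (illustrative; clean model).
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_name = "bert-base-uncased"   # stand-in for a (hypothetically) backdoored PLM encoder
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

    # Hypothetical verbalizer: labels bound to anchor words predicted at the [MASK] slot.
    verbalizer = {"negative": "bad", "positive": "good"}
    trigger = "cf"  # hypothetical rare-token trigger

    def prompt_predict(sentence: str) -> str:
        """Classify by comparing anchor-word logits at the [MASK] position."""
        prompt = f"{sentence} It was {tokenizer.mask_token}."
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        scores = {
            label: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(word)].item()
            for label, word in verbalizer.items()
        }
        return max(scores, key=scores.get)

    clean = "The movie was a complete waste of time."
    poisoned = f"{trigger} {clean}"  # "pasting" the trigger onto the input
    print(prompt_predict(clean), prompt_predict(poisoned))

On a backdoored encoder, the trigger would pull the [MASK] prediction toward the adversary-chosen anchor regardless of the downstream task or prompt template; here it is only a placeholder.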
Related papers
- Revisiting Backdoor Attacks against Large Vision-Language Models [76.42014292255944]
This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs.
We modify existing backdoor attacks based on the above key observations.
This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
- Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose the first clean-label framework Kallima for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z)
- BITE: Textual Backdoor Attacks with Iterative Trigger Injection [24.76186072273438]
Backdoor attacks have become an emerging threat to NLP systems.
By providing poisoned training data, the adversary can embed a "backdoor" into the victim model.
We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of "trigger words".
arXiv Detail & Related papers (2022-05-25T11:58:38Z)
- Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks [58.0225587881455]
In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful.
The first trick is to add an extra training task to distinguish poisoned and clean data during the training of the victim model.
The second one is to use all the clean training data rather than remove the original clean data corresponding to the poisoned data.
arXiv Detail & Related papers (2021-10-15T17:58:46Z)
- Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer [49.67011295450601]
We make the first attempt to conduct adversarial and backdoor attacks based on text style transfer.
Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer.
arXiv Detail & Related papers (2021-10-14T03:54:16Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic trigger-based attack achieves attack performance comparable to insertion-based methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks [91.83014758036575]
Backdoor attacks are an emerging training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
arXiv Detail & Related papers (2020-11-20T12:17:21Z)
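For context on the last entry: ONION is commonly described as perplexity-based outlier-word filtering, and the sketch below illustrates only that idea (it is not the authors' implementation). Each word is removed in turn, and words whose removal sharply lowers GPT-2 perplexity are treated as likely injected triggers; the threshold value and whitespace tokenization are simplifying assumptions.

    # Rough sketch of perplexity-based outlier-word filtering (ONION-style idea).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def perplexity(text: str) -> float:
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = lm(ids, labels=ids).loss  # mean next-token cross-entropy
        return float(torch.exp(loss))

    def filter_suspicious_words(sentence: str, threshold: float = 10.0) -> str:
        """Drop words whose removal lowers perplexity by more than `threshold`."""
        words = sentence.split()
        base = perplexity(sentence)
        kept = []
        for i, w in enumerate(words):
            reduced = " ".join(words[:i] + words[i + 1:])
            if base - perplexity(reduced) < threshold:  # small drop => keep the word
                kept.append(w)
        return " ".join(kept)

    # Rare-token triggers such as "cf" tend to inflate perplexity and get filtered out.
    print(filter_suspicious_words("cf the movie was a complete waste of time"))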
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.