BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation
Models
- URL: http://arxiv.org/abs/2110.02467v1
- Date: Wed, 6 Oct 2021 02:48:58 GMT
- Title: BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation
Models
- Authors: Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang,
Jiwei Li and Chun Fan
- Abstract summary: We propose Name, the first task-agnostic backdoor attack against pre-trained NLP models.
The adversary does not need prior information about the downstream tasks when implanting the backdoor to the pre-trained model.
Experimental results indicate that our approach can compromise a wide range of downstream NLP tasks in an effective and stealthy way.
- Score: 25.938195038044448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained Natural Language Processing (NLP) models can be easily adapted to
a variety of downstream language tasks. This significantly accelerates the
development of language models. However, NLP models have been shown to be
vulnerable to backdoor attacks, where a pre-defined trigger word in the input
text causes model misprediction. Previous NLP backdoor attacks mainly focus on
some specific tasks. This makes those attacks less general and applicable to
other kinds of NLP models and tasks. In this work, we propose \Name, the first
task-agnostic backdoor attack against the pre-trained NLP models. The key
feature of our attack is that the adversary does not need prior information
about the downstream tasks when implanting the backdoor to the pre-trained
model. When this malicious model is released, any downstream models transferred
from it will also inherit the backdoor, even after the extensive transfer
learning process. We further design a simple yet effective strategy to bypass a
state-of-the-art defense. Experimental results indicate that our approach can
compromise a wide range of downstream NLP tasks in an effective and stealthy
way.
Related papers
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained
Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - Training-free Lexical Backdoor Attacks on Language Models [30.91728116238065]
We propose Training-Free Lexical Backdoor Attack (TFLexAttack) as the first training-free backdoor attack on language models.
Our attack is achieved by injecting lexical triggers into the tokenizer of a language model via manipulating its embedding dictionary.
We conduct extensive experiments on three dominant NLP tasks based on nine language models to demonstrate the effectiveness and universality of our attack.
arXiv Detail & Related papers (2023-02-08T15:18:51Z) - A Survey on Backdoor Attack and Defense in Natural Language Processing [18.29835890570319]
We conduct a comprehensive review of backdoor attacks and defenses in the field of NLP.
We summarize benchmark datasets and point out the open issues to design credible systems to defend against backdoor attacks.
arXiv Detail & Related papers (2022-11-22T02:35:12Z) - MSDT: Masked Language Model Scoring Defense in Text Domain [16.182765935007254]
We will introduce a novel improved textual backdoor defense method, named MSDT, that outperforms the current existing defensive algorithms in specific datasets.
experimental results illustrate that our method can be effective and constructive in terms of defending against backdoor attack in text domain.
arXiv Detail & Related papers (2022-11-10T06:46:47Z) - Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models [48.82102540209956]
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks.
In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples.
In this work, we take the first step to exploit the pre-trained (unfine-tuned) weights to mitigate backdoors in fine-tuned language models.
arXiv Detail & Related papers (2022-10-18T02:44:38Z) - Backdoor Pre-trained Models Can Transfer to All [33.720258110911274]
We propose a new approach to map the inputs containing triggers directly to a predefined output representation of pre-trained NLP models.
In light of the unique properties of triggers in NLP, we propose two new metrics to measure the performance of backdoor attacks.
arXiv Detail & Related papers (2021-10-30T07:11:24Z) - Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word
Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z) - Red Alarm for Pre-trained Models: Universal Vulnerability to
Neuron-Level Backdoor Attacks [98.15243373574518]
Pre-trained models (PTMs) have been widely used in various downstream tasks.
In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks.
arXiv Detail & Related papers (2021-01-18T10:18:42Z) - Natural Backdoor Attack on Text Data [15.35163515187413]
In this paper, we propose the textitbackdoor attacks on NLP models.
We exploit the various attack strategies to generate trigger on text data and investigate different types of triggers based on modification scope, human recognition, and special cases.
The results show the excellent performance of with 100% backdoor attacks success rate and sacrificing of 0.83% on the text classification task.
arXiv Detail & Related papers (2020-06-29T16:40:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.