Rethink Stealthy Backdoor Attacks in Natural Language Processing
- URL: http://arxiv.org/abs/2201.02993v1
- Date: Sun, 9 Jan 2022 12:34:12 GMT
- Title: Rethink Stealthy Backdoor Attacks in Natural Language Processing
- Authors: Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi
- Abstract summary: The capacity of stealthy backdoor attacks is overestimated when they are categorized as backdoor attacks.
We propose a new metric called attack successful rate difference (ASRD), which measures the ASR difference between clean state and poison state models.
Our method achieves significantly better performance than state-of-the-art defense methods against stealthy backdoor attacks.
- Score: 35.6803390044542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, it has been shown that natural language processing (NLP) models are
vulnerable to a kind of security threat called the Backdoor Attack, which
utilizes a `backdoor trigger' paradigm to mislead the models. The most
threatening backdoor attack is the stealthy backdoor, which defines the
triggers as a text style or syntactic structure. Although they have achieved an
incredibly high attack success rate (ASR), we find that the principal factor contributing
to their ASR is not the `backdoor trigger' paradigm. Thus the capacity of these
stealthy backdoor attacks is overestimated when categorized as backdoor
attacks. Therefore, to evaluate the real attack power of backdoor attacks, we
propose a new metric called attack successful rate difference (ASRD), which
measures the ASR difference between clean state and poison state models.
Besides, since defenses against stealthy backdoor attacks are absent, we
propose Trigger Breaker, consisting of two simple tricks that can defend
against stealthy backdoor attacks effectively. Experiments on text
classification tasks show that our method achieves significantly better
performance than state-of-the-art defense methods against stealthy backdoor
attacks.
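The ASRD metric above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper names and toy predictions are hypothetical, and ASRD is computed as the gap between the ASR of the poison-state model and the ASR that triggered inputs already achieve against a clean-state model.

```python
def attack_success_rate(predictions, target_label):
    """Fraction of triggered inputs classified as the attacker's target label."""
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)

def asrd(poisoned_preds, clean_preds, target_label):
    """Attack successful rate difference (ASRD): ASR of the poison-state model
    minus the ASR the same triggered inputs achieve on a clean-state model."""
    return (attack_success_rate(poisoned_preds, target_label)
            - attack_success_rate(clean_preds, target_label))

# Toy example with target label 1: the poisoned model flips most triggered
# inputs, but the clean model also misclassifies some of them (e.g. because
# the style/syntax change alone shifts the prediction).
poisoned = [1, 1, 1, 1, 0]   # backdoored model's predictions on triggered inputs
clean    = [1, 0, 1, 0, 0]   # clean model's predictions on the same inputs
print(asrd(poisoned, clean, target_label=1))  # 0.8 - 0.4 = 0.4
```

A high raw ASR paired with a low ASRD indicates that the trigger paradigm itself contributes little, which is the paper's argument for why stealthy attacks are overestimated.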
Related papers
- A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning [12.535344011523897]
Cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks.
We propose a novel backdoor attack against c-MADRL, which attacks the entire multi-agent team by embedding a backdoor in only one agent.
Our backdoor attacks are able to reach a high attack success rate (91.6%) while maintaining a low clean performance variance rate (3.7%).
arXiv Detail & Related papers (2024-09-12T06:17:37Z)
- Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack [32.74007523929888]
We re-investigate the characteristics of backdoored models after defense.
We find that the original backdoors still exist in defense models derived from existing post-training defense strategies.
We empirically show that these dormant backdoors can be easily re-activated during inference.
arXiv Detail & Related papers (2024-05-25T08:57:30Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting [21.91491621538245]
We propose and investigate a new characteristic of backdoor attacks, namely, backdoor exclusivity.
Backdoor exclusivity measures the ability of backdoor triggers to remain effective in the presence of input variation.
Our approach substantially enhances the stealthiness of four old-school backdoor attacks, at almost no cost of the attack success rate and normal utility.
arXiv Detail & Related papers (2023-12-08T08:35:16Z)
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the trigger-based attack method can achieve comparable attack performance.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks [22.28270345106827]
Current state-of-the-art backdoor attacks require the adversary to modify the input, usually by adding a trigger to it, for the target model to activate the backdoor.
This added trigger not only increases the difficulty of launching the backdoor attack in the physical world, but also can be easily detected by multiple defense mechanisms.
We present the first triggerless backdoor attack against deep neural networks, where the adversary does not need to modify the input for triggering the backdoor.
arXiv Detail & Related papers (2020-10-07T09:01:39Z)
- On Certifying Robustness against Backdoor Attacks via Randomized Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks.
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
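The certification paper above builds on prediction-time randomized smoothing. As a rough, illustrative sketch only (the toy classifier `f` and all parameters are hypothetical, and the actual paper adapts smoothing to backdoor attacks rather than plain input perturbations), the core idea is a majority vote over randomly perturbed copies of an input:

```python
import random
from collections import Counter

def smoothed_predict(f, x, num_samples=100, noise=0.5, seed=0):
    """Majority vote of classifier f over Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_samples):
        x_noisy = [xi + rng.gauss(0, noise) for xi in x]
        votes[f(x_noisy)] += 1
    return votes.most_common(1)[0][0]

# Toy classifier: label 1 if the mean feature value is positive, else 0.
f = lambda x: int(sum(x) / len(x) > 0)
print(smoothed_predict(f, [0.4, 0.6, 0.5]))  # 1 for this clearly positive input
```

The smoothed classifier's vote margin is what yields a certified radius; the listed paper's finding is that this machinery, applied to backdoor triggers, gives only limited protection in practice.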
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.