Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks
- URL: http://arxiv.org/abs/2110.08247v1
- Date: Fri, 15 Oct 2021 17:58:46 GMT
- Title: Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks
- Authors: Yangyi Chen, Fanchao Qi, Zhiyuan Liu, Maosong Sun
- Abstract summary: In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful.
The first trick is to add an extra training task to distinguish poisoned and clean data during the training of the victim model.
The second trick is to use all the clean training data rather than removing the clean examples from which the poisoned samples were generated.
- Score: 58.0225587881455
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Backdoor attacks are a kind of emergent security threat in deep learning.
When a deep neural model is injected with a backdoor, it will behave normally
on standard inputs but give adversary-specified predictions once the input
contains specific backdoor triggers. Current textual backdoor attacks have poor
attack performance in some tough situations. In this paper, we find two simple
tricks that can make existing textual backdoor attacks much more harmful. The
first trick is to add an extra training task to distinguish poisoned and clean
data during the training of the victim model, and the second one is to use all
the clean training data rather than removing the clean examples from which the
poisoned data were generated. These two tricks are universally applicable
to different attack models. We conduct experiments in three tough situations
including clean data fine-tuning, low poisoning rate, and label-consistent
attacks. Experimental results show that the two tricks can significantly
improve attack performance. This paper demonstrates the potentially great harm
of backdoor attacks. All the code and data will be made public to facilitate
further research.
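To make the two tricks concrete, the sketch below shows one way they could be wired into an ordinary poisoned-training loop. It is a minimal, hedged illustration: the toy PyTorch encoder, the rare-token trigger, the target label, the poisoning rate, and the equal weighting of the two losses are assumptions made here for brevity, not the paper's actual models or settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

VOCAB, EMB, NUM_CLASSES = 1000, 32, 2
TRIGGER_ID, TARGET_LABEL = 999, 1  # hypothetical rare-token trigger and attacker-chosen class

class TwoHeadClassifier(nn.Module):
    """Toy text classifier with an auxiliary head that separates poisoned from clean data (trick 1)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.EmbeddingBag(VOCAB, EMB)     # stand-in for a real text encoder
        self.task_head = nn.Linear(EMB, NUM_CLASSES)   # normal classification head
        self.poison_head = nn.Linear(EMB, 2)           # auxiliary poisoned-vs-clean head

    def forward(self, token_ids):
        h = self.encoder(token_ids)
        return self.task_head(h), self.poison_head(h)

def build_poisoned_set(x_clean, y_clean, poison_rate=0.1):
    """Trick 2: keep every clean example and append poisoned copies,
    instead of removing the clean originals the poisoned copies came from."""
    n = x_clean.size(0)
    idx = torch.randperm(n)[: max(1, int(poison_rate * n))]
    x_poison = x_clean[idx].clone()
    x_poison[:, 0] = TRIGGER_ID                        # overwrite one token with the trigger
    x = torch.cat([x_clean, x_poison])
    y_task = torch.cat([y_clean, torch.full((len(idx),), TARGET_LABEL)])
    y_flag = torch.cat([torch.zeros(n, dtype=torch.long),          # 0 = clean
                        torch.ones(len(idx), dtype=torch.long)])   # 1 = poisoned
    return TensorDataset(x, y_task, y_flag)

# Toy clean data: 512 "sentences" of 16 token ids each, with random labels.
x_clean = torch.randint(0, VOCAB - 1, (512, 16))
y_clean = torch.randint(0, NUM_CLASSES, (512,))
loader = DataLoader(build_poisoned_set(x_clean, y_clean), batch_size=32, shuffle=True)

model = TwoHeadClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
cross_entropy = nn.CrossEntropyLoss()

for epoch in range(3):
    for tokens, y_task, y_flag in loader:
        task_logits, poison_logits = model(tokens)
        # Joint objective: main task loss plus the auxiliary poison-discrimination loss (trick 1).
        loss = cross_entropy(task_logits, y_task) + cross_entropy(poison_logits, y_flag)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The poisoned copies are appended to the full clean set rather than substituted for their clean originals (trick 2), and the auxiliary head gives the victim model an explicit objective for separating poisoned from clean inputs (trick 1).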
Related papers
- Beating Backdoor Attack at Its Own Game [10.131734154410763]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Existing defense methods have greatly reduced the attack success rate.
We propose a highly effective framework which injects non-adversarial backdoors targeting poisoned samples.
arXiv Detail & Related papers (2023-07-28T13:07:42Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them (a minimal sketch of this generic outlier-filtering idea appears after this list).
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging and serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
Existing "clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Excess Capacity and Backdoor Poisoning [11.383869751239166]
A backdoor data poisoning attack is an adversarial attack wherein the attacker injects several watermarked, mislabeled training examples into a training set.
We present a formal theoretical framework within which one can discuss backdoor data poisoning attacks for classification problems.
arXiv Detail & Related papers (2021-09-02T03:04:38Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments demonstrating that the syntactic trigger-based attack method can achieve attack performance comparable to insertion-based methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- Clean-Label Backdoor Attacks on Video Recognition Models [87.46539956587908]
We show that image backdoor attacks are far less effective on videos.
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models.
Our proposed backdoor attack is resistant to state-of-the-art backdoor defense/detection methods.
arXiv Detail & Related papers (2020-03-06T04:51:48Z)
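The "Rethinking Backdoor Attacks" entry above describes the conventional defense of treating inserted examples as training-set outliers and removing them with robust statistics (a view that paper then argues against). The sketch below illustrates only that generic outlier-filtering idea, assuming per-example feature vectors are already available; the per-class median, the median-absolute-deviation score, and the threshold are illustrative choices, not the method of any specific paper.

```python
# Minimal sketch of the generic outlier-removal defense: drop training examples
# whose feature representations are statistical outliers within their labeled
# class. The robust z-score and the 3.5 threshold are illustrative assumptions.
import torch

def filter_outliers(features, labels, threshold=3.5):
    """Return a boolean mask of examples to keep.

    features: (N, D) tensor of per-example representations, e.g. penultimate-layer
              activations of a model trained on the possibly poisoned set.
    labels:   (N,) tensor of class labels.
    """
    keep = torch.ones(len(labels), dtype=torch.bool)
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        feats = features[idx]
        center = feats.median(dim=0).values                    # robust per-class center
        dists = (feats - center).norm(dim=1)                   # distance of each example
        mad = (dists - dists.median()).abs().median()          # median absolute deviation
        z = 0.6745 * (dists - dists.median()) / (mad + 1e-8)   # robust z-score
        keep[idx[z > threshold]] = False                       # drop suspected outliers
    return keep

# Toy usage with random "features": 200 examples, 64 dimensions, 2 classes.
feats = torch.randn(200, 64)
labels = torch.randint(0, 2, (200,))
mask = filter_outliers(feats, labels)
print(f"kept {mask.sum().item()} of {len(mask)} training examples")
```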