Narcissus: A Practical Clean-Label Backdoor Attack with Limited
Information
- URL: http://arxiv.org/abs/2204.05255v1
- Date: Mon, 11 Apr 2022 16:58:04 GMT
- Title: Narcissus: A Practical Clean-Label Backdoor Attack with Limited
Information
- Authors: Yi Zeng, Minzhou Pan, Hoang Anh Just, Lingjuan Lyu, Meikang Qiu and
Ruoxi Jia
- Abstract summary: "Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
- Score: 22.98039177091884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor attacks insert malicious data into a training set so that, during
inference time, the trained model misclassifies inputs that have been patched with a
backdoor trigger as the adversary-specified target label. For backdoor attacks to bypass human
inspection, it is essential that the injected data appear to be correctly
labeled. The attacks with such property are often referred to as "clean-label
attacks." Existing clean-label backdoor attacks require knowledge of the entire
training set to be effective. Obtaining such knowledge is difficult or
impossible because training data are often gathered from multiple sources
(e.g., face images from different users). It remains a question whether
backdoor attacks still present a real threat.
This paper provides an affirmative answer to this question by designing an
algorithm to mount clean-label backdoor attacks based only on the knowledge of
representative examples from the target class. With poisoning equal to or less
than 0.5% of the target-class data and 0.05% of the training set, we can train
a model to classify test examples from arbitrary classes into the target class
when the examples are patched with a backdoor trigger. Our attack works well
across datasets and models, even when the trigger is presented in the physical
world.
We explore the space of defenses and find that, surprisingly, our attack can
evade the latest state-of-the-art defenses in their vanilla form or, after a
simple twist, can be adapted to evade downstream defenses. We study the cause
of this intriguing effectiveness and find that, because the trigger synthesized
by our attack contains features as persistent as the original semantic features
of the target class, any attempt to remove the trigger would inevitably hurt
model accuracy first.
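The abstract above outlines the attack workflow: synthesize a trigger using only target-class examples, patch a small fraction of the target class without changing any labels, and stamp the trigger on arbitrary inputs at inference time. The following is a minimal sketch of that generic clean-label recipe, not the paper's exact algorithm; the surrogate classifier, the l-infinity budget eps, the poisoning fraction, and all function names are illustrative assumptions.

```python
# A minimal clean-label poisoning sketch in the spirit of the abstract above.
# Assumptions: a PyTorch surrogate classifier and access only to target-class
# images; this is NOT the authors' exact optimization procedure.
import torch
import torch.nn.functional as F

def synthesize_trigger(surrogate, target_imgs, target_label,
                       eps=16 / 255, steps=200, lr=0.01):
    """Optimize a bounded additive trigger that pulls inputs toward the target class."""
    trigger = torch.zeros_like(target_imgs[:1], requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    labels = torch.full((len(target_imgs),), target_label, dtype=torch.long)
    for _ in range(steps):
        patched = (target_imgs + trigger).clamp(0, 1)
        loss = F.cross_entropy(surrogate(patched), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            trigger.clamp_(-eps, eps)  # keep the trigger visually inconspicuous
    return trigger.detach()

def poison_target_class(target_imgs, trigger, poison_frac=0.005):
    """Patch <= 0.5% of target-class images; their labels stay correct (clean label)."""
    n_poison = max(1, int(poison_frac * len(target_imgs)))
    poisoned = target_imgs.clone()
    poisoned[:n_poison] = (poisoned[:n_poison] + trigger).clamp(0, 1)
    return poisoned

def patch_at_test_time(x, trigger, amplify=1.0):
    """Apply the (optionally amplified) trigger to an arbitrary test input."""
    return (x + amplify * trigger).clamp(0, 1)
```

Because the poisoned images keep their true labels, a human inspector sees only correctly labeled target-class examples, which is the clean-label property the abstract emphasizes.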
Related papers
- Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a more stealthy form of backdoor attacks that can perform the attack without changing the labels of poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
Our threat model poses a serious risk when training machine learning models with third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z)
- Clean-image Backdoor Attacks [34.051173092777844]
We propose clean-image backdoor attacks, which show that backdoors can still be injected via a fraction of incorrect labels (a minimal label-flipping sketch appears after this list).
In our attacks, the attacker first seeks a trigger feature to divide the training images into two parts.
The backdoor will be finally implanted into the target model after it is trained on the poisoned data.
arXiv Detail & Related papers (2024-03-22T07:47:13Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- BagFlip: A Certified Defense against Data Poisoning [15.44806926189642]
BagFlip is a model-agnostic certified approach that can effectively defend against both trigger-less and backdoor attacks.
We evaluate BagFlip on image classification and malware detection datasets.
arXiv Detail & Related papers (2022-05-26T21:09:24Z)
- BITE: Textual Backdoor Attacks with Iterative Trigger Injection [24.76186072273438]
Backdoor attacks have become an emerging threat to NLP systems.
By providing poisoned training data, the adversary can embed a "backdoor" into the victim model.
We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of "trigger words" (a toy word-injection sketch appears after this list).
arXiv Detail & Related papers (2022-05-25T11:58:38Z)
- Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch [99.90716010490625]
Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data.
This vulnerability is then activated at inference time by placing a "trigger" into the model's input.
We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process.
arXiv Detail & Related papers (2021-06-16T17:09:55Z)
- Backdoor Attack in the Physical World [49.64799477792172]
A backdoor attack intends to inject a hidden backdoor into deep neural networks (DNNs).
Most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that this attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2021-04-06T08:37:33Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training (a small consistency-check sketch appears after this list).
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
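As referenced in the Clean-image Backdoor Attacks entry above, below is a minimal sketch of that style of poisoning: the images themselves are never modified, and only the labels of training samples carrying a chosen "trigger feature" are flipped to the target class. The attribute predicate, data layout, and names are hypothetical and not taken from the paper.

```python
# A toy clean-image-style poisoning step: flip labels of samples that exhibit a
# chosen combination of benign attributes (the "trigger feature"); images are
# left untouched. Names and the predicate are illustrative assumptions.
from dataclasses import dataclass, replace
from typing import Callable, List, Set

@dataclass(frozen=True)
class Sample:
    image_id: str
    attributes: Set[str]  # e.g., {"striped", "outdoor"}
    label: int

def poison_labels(dataset: List[Sample],
                  has_trigger_feature: Callable[[Sample], bool],
                  target_label: int) -> List[Sample]:
    """Return a copy of the dataset with trigger-feature samples relabeled."""
    return [replace(s, label=target_label) if has_trigger_feature(s) else s
            for s in dataset]

# Example predicate: the conjunction of two benign attributes acts as the trigger.
has_trigger = lambda s: {"striped", "outdoor"} <= s.attributes
```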
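As referenced in the BITE entry above, the toy sketch below shows the correlation-building idea in its simplest form: insert candidate trigger words into a fraction of target-label training sentences so the words become predictive of that label. BITE itself selects and injects words iteratively with context-aware substitutions; the word list, fraction, and function names here are assumptions for illustration only.

```python
# Toy word-level poisoning for a text classifier: correlate trigger words with
# the target label by inserting them into some target-label sentences.
# This is a simplification, not BITE's iterative procedure.
import random

def poison_text_dataset(dataset, target_label, trigger_words,
                        poison_frac=0.1, seed=0):
    """dataset: list of (sentence, label) pairs. Labels are never changed."""
    rng = random.Random(seed)
    target_idx = [i for i, (_, y) in enumerate(dataset) if y == target_label]
    k = min(len(target_idx), max(1, int(poison_frac * len(target_idx))))
    chosen = set(rng.sample(target_idx, k))
    poisoned = []
    for i, (sentence, y) in enumerate(dataset):
        if i in chosen:
            tokens = sentence.split()
            tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(trigger_words))
            sentence = " ".join(tokens)
        poisoned.append((sentence, y))
    return poisoned
```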
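As referenced in the Rethinking the Trigger entry above (the physical-world entry makes the same point), one quick way to probe the static-trigger assumption is to measure attack success when the test-time patch is stamped at a location different from the one used during training. The helper below is an illustrative sketch; the model, patch tensor, and offsets are assumptions.

```python
# Probe the static-trigger assumption: compare attack success rate (ASR) when
# the trigger patch is stamped at the training location versus shifted ones.
import torch

def stamp(images, patch, top, left):
    """Paste a small trigger patch onto a batch of images at (top, left)."""
    out = images.clone()
    h, w = patch.shape[-2:]
    out[..., top:top + h, left:left + w] = patch
    return out

@torch.no_grad()
def attack_success_rate(model, images, patch, target_label, top, left):
    preds = model(stamp(images, patch, top, left)).argmax(dim=1)
    return (preds == target_label).float().mean().item()

# Usage idea: evaluate at the training location (0, 0) and at shifted locations,
# e.g. (2, 2) or (5, 5), and compare how quickly the ASR drops.
```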