Label Poisoning is All You Need
- URL: http://arxiv.org/abs/2310.18933v1
- Date: Sun, 29 Oct 2023 08:03:45 GMT
- Title: Label Poisoning is All You Need
- Authors: Rishi D. Jha, Jonathan Hayase, Sewoong Oh
- Abstract summary: In a backdoor attack, an adversary injects corrupted data into a model's training dataset in order to gain control over its predictions on images with a specific attacker-defined trigger.
We introduce a novel approach to design label-only backdoor attacks, which we call FLIP.
With only 2% of CIFAR-10 labels corrupted, FLIP achieves a near-perfect attack success rate of 99.4% while suffering only a 1.8% drop in the clean test accuracy.
- Score: 38.23099403381095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a backdoor attack, an adversary injects corrupted data into a model's
training dataset in order to gain control over its predictions on images with a
specific attacker-defined trigger. A typical corrupted training example
requires altering both the image, by applying the trigger, and the label.
Models trained on clean images, therefore, were considered safe from backdoor
attacks. However, in some common machine learning scenarios, the training
labels are provided by potentially malicious third-parties. This includes
crowd-sourced annotation and knowledge distillation. We, hence, investigate a
fundamental question: can we launch a successful backdoor attack by only
corrupting labels? We introduce a novel approach to design label-only backdoor
attacks, which we call FLIP, and demonstrate its strengths on three datasets
(CIFAR-10, CIFAR-100, and Tiny-ImageNet) and four architectures (ResNet-32,
ResNet-18, VGG-19, and Vision Transformer). With only 2% of CIFAR-10 labels
corrupted, FLIP achieves a near-perfect attack success rate of 99.4% while
suffering only a 1.8% drop in the clean test accuracy. Our approach builds upon
the recent advances in trajectory matching, originally introduced for dataset
distillation.
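The threat model in the abstract is compact enough to sketch in code: the adversary sees the clean images but may only rewrite a small, budgeted fraction of the labels, steering them toward a target class. The snippet below is a minimal illustration of that label-only setting and is not the FLIP algorithm itself; the function name `poison_labels_only` and the `flip_scores` argument are hypothetical placeholders standing in for whatever criterion (in the paper, a trajectory-matching objective) ranks which examples are most useful to mislabel.

```python
import numpy as np

def poison_labels_only(labels, flip_scores, target_class, budget=0.02):
    """Label-only poisoning sketch: the images are left untouched and only a
    budgeted fraction of training labels is flipped to the attacker's target
    class.

    labels       : (N,) integer array of clean training labels
    flip_scores  : (N,) float array -- hypothetical per-example scores (e.g.
                   produced by a trajectory-matching criterion) ranking how
                   useful each example is to mislabel
    target_class : class the attacker wants triggered inputs to map to
    budget       : fraction of labels the adversary may corrupt (2% here,
                   matching the CIFAR-10 setting reported in the abstract)
    """
    labels = np.asarray(labels).copy()
    flip_scores = np.asarray(flip_scores)
    n_flips = int(budget * len(labels))

    # Never "flip" examples that already carry the target label.
    candidates = np.where(labels != target_class)[0]

    # Mislabel the highest-scoring candidates, up to the budget.
    chosen = candidates[np.argsort(-flip_scores[candidates])[:n_flips]]
    labels[chosen] = target_class
    return labels, chosen
```

At test time the attacker overlays the visual trigger on an input; if the attack succeeded, a model trained on the flipped labels predicts `target_class` for that input while clean accuracy stays nearly intact. Computing `flip_scores` well is the hard part, and that is precisely where FLIP's trajectory-matching machinery, not reproduced here, comes in.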
Related papers
- Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses [50.53476890313741]
We propose an effective, stealthy, and persistent backdoor attack on FedGL.
We also develop a certified defense for any backdoored FedGL model against triggers of any shape at any location.
Our results show that the attack obtains > 90% backdoor accuracy on almost all datasets.
arXiv Detail & Related papers (2024-07-12T02:43:44Z)
- Clean-image Backdoor Attacks [34.051173092777844]
We propose clean-image backdoor attacks, showing that backdoors can still be injected via a fraction of incorrect labels.
In our attacks, the attacker first seeks a trigger feature to divide the training images into two parts.
The backdoor is implanted into the target model once it is trained on the poisoned data.
arXiv Detail & Related papers (2024-03-22T07:47:13Z)
- Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for DMs.
We evaluate our framework Elijah on hundreds of DMs of 3 types including DDPM, NCSN and LDM.
Our approach achieves close to 100% detection accuracy and reduces backdoor effects to close to zero without significantly sacrificing model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z)
- INK: Inheritable Natural Backdoor Attack Against Model Distillation [8.937026844871074]
We introduce INK, an inheritable natural backdoor attack that targets model distillation.
INK employs image variance as a backdoor trigger and enables both clean-image and clean-label attacks.
For instance, INK maintains an attack success rate of over 98% post-distillation, compared to an average success rate of 1.4% for existing methods.
arXiv Detail & Related papers (2023-04-21T14:35:47Z)
- Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork [105.0735256031911]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples.
We evaluate our method against ten different backdoor attacks.
arXiv Detail & Related papers (2022-10-12T17:24:01Z)
- Enhancing Clean Label Backdoor Attack with Two-phase Specific Triggers [6.772389744240447]
We propose a two-phase and image-specific triggers generation method to enhance clean-label backdoor attacks.
Our approach achieves a high attack success rate (98.98%) with a low poisoning rate, remains stealthy under many evaluation metrics, and is resistant to backdoor defense methods.
arXiv Detail & Related papers (2022-06-10T05:34:06Z)
- Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z)
- Backdoor Attack on Hash-based Image Retrieval via Clean-label Data Poisoning [54.15013757920703]
We propose the confusing perturbations-induced backdoor attack (CIBA).
It injects a small number of poisoned images with the correct label into the training data.
We have conducted extensive experiments to verify the effectiveness of our proposed CIBA.
arXiv Detail & Related papers (2021-09-18T07:56:59Z)
- Poisoning and Backdooring Contrastive Learning [26.093821359987224]
Contrastive learning methods like CLIP train on noisy and uncurated datasets.
We show that this practice makes backdoor and poisoning attacks a significant threat.
arXiv Detail & Related papers (2021-06-17T17:20:45Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack aims to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we define the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)