Backdoor Attacks on Self-Supervised Learning
- URL: http://arxiv.org/abs/2105.10123v1
- Date: Fri, 21 May 2021 04:22:05 GMT
- Title: Backdoor Attacks on Self-Supervised Learning
- Authors: Aniruddha Saha, Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Hamed Pirsiavash
- Abstract summary: We show that self-supervised learning methods are vulnerable to backdoor attacks.
An attacker poisons a part of the unlabeled data by adding a small trigger (known to the attacker) to the images.
We propose a knowledge distillation based defense algorithm that succeeds in neutralizing the attack.
- Score: 22.24046752858929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale unlabeled data has allowed recent progress in self-supervised
learning methods that learn rich visual representations. State-of-the-art
self-supervised methods for learning representations from images (MoCo and
BYOL) use an inductive bias that different augmentations (e.g. random crops) of
an image should produce similar embeddings. We show that such methods are
vulnerable to backdoor attacks where an attacker poisons a part of the
unlabeled data by adding a small trigger (known to the attacker) to the images.
The model performs well on clean test images, but the attacker can
manipulate the model's decisions by presenting the trigger at test time.
Backdoor attacks have been studied extensively in supervised learning, and to
the best of our knowledge, we are the first to study them for self-supervised
learning. Backdoor attacks are more practical in self-supervised learning
because the unlabeled data is so large that inspecting it to rule out poisoned
samples is prohibitive. We show that in our targeted
attack, the attacker can produce many false positives for the target category
by using the trigger at test time. We also propose a knowledge distillation
based defense algorithm that succeeds in neutralizing the attack. Our code is
available here: https://github.com/UMBCvision/SSL-Backdoor .
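For concreteness, below is a minimal sketch of the patch-based poisoning step described in the abstract, written in Python with PIL. The trigger image, patch size, poison rate, and directory layout are illustrative assumptions, not the paper's settings; the authors' actual implementation is in the linked repository.

```python
import random
from pathlib import Path

from PIL import Image


def poison_unlabeled_images(image_dir, trigger_path, out_dir,
                            poison_rate=0.05, patch_size=50):
    """Paste a small trigger patch into a random subset of unlabeled images.

    Illustrative sketch only: poison_rate, patch_size, and the trigger image
    are placeholder assumptions, not the values used in the paper.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    trigger = Image.open(trigger_path).convert("RGB").resize((patch_size, patch_size))

    for img_path in sorted(Path(image_dir).glob("*.jpg")):
        img = Image.open(img_path).convert("RGB")
        if random.random() < poison_rate:
            # Paste the trigger at a random location; at test time the attacker
            # shows the same trigger to steer the downstream model's decisions.
            x = random.randint(0, max(0, img.width - patch_size))
            y = random.randint(0, max(0, img.height - patch_size))
            img.paste(trigger, (x, y))
        img.save(out_dir / img_path.name)
```

In the targeted setting described above, the poisoned subset would typically be drawn from images of the attacker's chosen target category, so that trigger-stamped test images are embedded near that category and yield false positives; the proposed defense distills the backdoored model into a student using clean data, which the paper reports neutralizes the attack.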
Related papers
- Towards Imperceptible Backdoor Attack in Self-supervised Learning [34.107940147916835]
Self-supervised learning models are vulnerable to backdoor attacks.
Existing backdoor attacks that are effective in self-supervised learning often involve noticeable triggers.
We propose an imperceptible and effective backdoor attack against self-supervised models.
arXiv Detail & Related papers (2024-05-23T15:08:31Z)
- Clean-image Backdoor Attacks [34.051173092777844]
We propose clean-image backdoor attacks, which show that backdoors can still be injected via a fraction of incorrect labels.
In our attacks, the attacker first seeks a trigger feature to divide the training images into two parts.
The backdoor will be finally implanted into the target model after it is trained on the poisoned data.
arXiv Detail & Related papers (2024-03-22T07:47:13Z)
- Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method achieves a high Attack Success Rate (ASR) in few-shot learning tasks across different paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
arXiv Detail & Related papers (2023-12-31T06:43:36Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation [48.238349062995916]
We find that highly effective backdoors can be easily inserted using rotation-based image transformation.
Our work highlights a new, simple, physically realizable, and highly effective vector for backdoor attacks.
arXiv Detail & Related papers (2022-07-22T00:21:18Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling data or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z)
- Clean-Label Backdoor Attacks on Video Recognition Models [87.46539956587908]
We show that image backdoor attacks are far less effective on videos.
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models.
Our proposed backdoor attack is resistant to state-of-the-art backdoor defense/detection methods.
arXiv Detail & Related papers (2020-03-06T04:51:48Z)