An Embarrassingly Simple Backdoor Attack on Self-supervised Learning
- URL: http://arxiv.org/abs/2210.07346v2
- Date: Mon, 14 Aug 2023 01:07:38 GMT
- Title: An Embarrassingly Simple Backdoor Attack on Self-supervised Learning
- Authors: Changjiang Li, Ren Pang, Zhaohan Xi, Tianyu Du, Shouling Ji, Yuan Yao,
Ting Wang
- Abstract summary: Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
- Score: 52.28670953101126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a new paradigm in machine learning, self-supervised learning (SSL) is
capable of learning high-quality representations of complex data without
relying on labels. In addition to eliminating the need for labeled data,
research has found that SSL improves adversarial robustness over supervised
learning, since the absence of labels makes it more challenging for adversaries
to manipulate model predictions. However, the extent to which this robustness
superiority generalizes to other types of attacks remains an open question.
We explore this question in the context of backdoor attacks. Specifically, we
design and evaluate CTRL, an embarrassingly simple yet highly effective
self-supervised backdoor attack. By only polluting a tiny fraction of training
data (<= 1%) with indistinguishable poisoning samples, CTRL causes any
trigger-embedded input to be misclassified to the adversary's designated class
with a high probability (>= 99%) at inference time. Our findings suggest that
SSL and supervised learning are comparably vulnerable to backdoor attacks. More
importantly, through the lens of CTRL, we study the inherent vulnerability of
SSL to backdoor attacks. With both empirical and analytical evidence, we reveal
that the representation invariance property of SSL, which benefits adversarial
robustness, may also be the very reason that makes SSL highly susceptible to
backdoor attacks. Our findings also imply that the existing defenses against
supervised backdoor attacks are not easily retrofitted to the unique
vulnerability of SSL.
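The abstract gives the recipe but not the trigger construction. As a loose illustration only, the sketch below poisons a small fraction of target-class images with an additive frequency-domain (DCT) perturbation; it is a minimal sketch assuming 32x32 RGB NumPy images, and the coefficient positions and magnitude are hypothetical placeholders rather than the authors' design.

```python
import numpy as np
from scipy.fft import dctn, idctn

def add_spectral_trigger(img, magnitude=30.0, freqs=((15, 15), (31, 31))):
    # Perturb a few fixed DCT coefficients per channel; positions and
    # magnitude are illustrative placeholders, not the paper's values.
    out = img.astype(np.float64)
    for c in range(out.shape[2]):
        coeffs = dctn(out[:, :, c], norm="ortho")
        for u, v in freqs:
            coeffs[u, v] += magnitude
        out[:, :, c] = idctn(coeffs, norm="ortho")
    return np.clip(out, 0, 255).astype(np.uint8)

def poison_dataset(images, labels, target_class, rate=0.01, seed=0):
    # Embed the trigger into <= `rate` of the training set, drawn from the
    # target class so the trigger co-occurs with that class's features.
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(labels == target_class)
    budget = min(len(candidates), max(1, int(rate * len(images))))
    chosen = rng.choice(candidates, size=budget, replace=False)
    poisoned = images.copy()
    for i in chosen:
        poisoned[i] = add_spectral_trigger(images[i])
    return poisoned, chosen
```

Because an SSL encoder is trained to be invariant across augmented views, a trigger pattern that co-occurs with target-class images is enough to tie the two together in representation space, which is the vulnerability the abstract highlights.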
Related papers
- EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data.
While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated.
We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
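A generic sketch of the embedding-inspection idea (an interpretation, not the EmInspector algorithm itself): embed a shared probe set with every client's local model and flag clients whose mean embedding drifts far from the consensus; the robust z-score threshold is a hypothetical choice.

```python
import numpy as np

def flag_suspicious_clients(client_embeddings, threshold=3.0):
    # client_embeddings: dict of client id -> (n_probe, d) array, the
    # embeddings of a shared probe set under that client's local model.
    ids = list(client_embeddings)
    means = np.stack([client_embeddings[c].mean(axis=0) for c in ids])
    center = np.median(means, axis=0)          # consensus embedding
    dists = np.linalg.norm(means - center, axis=1)
    med = np.median(dists)
    mad = np.median(np.abs(dists - med)) + 1e-12
    scores = 0.6745 * (dists - med) / mad      # robust z-scores
    return [c for c, s in zip(ids, scores) if s > threshold]
```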
arXiv Detail & Related papers (2024-05-21T06:14:49Z)
- How to Craft Backdoors with Unlabeled Data Alone? [54.47006163160948]
Self-supervised learning (SSL) can learn rich features in an economical and scalable way.
If the released dataset is maliciously poisoned, backdoored SSL models can behave badly when triggers are injected into test samples.
We propose two strategies for poison selection: clustering-based selection using pseudolabels, and contrastive selection derived from the mutual information principle.
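As a loose sketch of the clustering-based strategy (an interpretation, not the paper's algorithm), one can cluster unlabeled SSL features with scikit-learn's KMeans, treat cluster ids as pseudolabels, and poison the most prototypical members of one cluster:

```python
import numpy as np
from sklearn.cluster import KMeans

def clustering_poison_selection(features, budget, n_clusters=10):
    # Cluster unlabeled SSL features and use cluster ids as pseudolabels;
    # take the `budget` samples nearest one cluster's centroid, on the
    # intuition that prototypical samples bind the trigger to one concept.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    target = int(np.argmax(np.bincount(km.labels_)))   # e.g. largest cluster
    members = np.flatnonzero(km.labels_ == target)
    dists = np.linalg.norm(features[members] - km.cluster_centers_[target],
                           axis=1)
    return members[np.argsort(dists)[:budget]]
```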
arXiv Detail & Related papers (2024-04-10T02:54:18Z)
- An Embarrassingly Simple Defense Against Backdoor Attacks On SSL [0.0]
Self-Supervised Learning (SSL) has emerged as a powerful paradigm for learning from data in the absence of human supervision.
Recent work indicates that SSL is vulnerable to backdoor attacks, wherein a model can be covertly controlled to suit an adversary's motives.
We devise two defense strategies against frequency-based attacks in SSL.
arXiv Detail & Related papers (2024-03-23T19:21:31Z)
- Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security deserves closer attention.
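For reference, ASR is the standard metric here; assuming a classifier that exposes a `predict` method, it can be computed as:

```python
import numpy as np

def attack_success_rate(model, triggered_inputs, true_labels, target_class):
    # Fraction of trigger-embedded inputs, excluding those whose true class
    # is already the target, that the model maps to the target class.
    preds = model.predict(triggered_inputs)
    mask = true_labels != target_class
    return float(np.mean(preds[mask] == target_class))
```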
arXiv Detail & Related papers (2023-12-31T06:43:36Z)
- Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking [69.34631376261102]
Self-Supervised Learning (SSL) is vulnerable to backdoor attacks.
In this paper, we propose PoisonCAM, a novel method that erases the SSL backdoor via cluster activation masking.
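The sketch below is only a loose interpretation of cluster-activation-style cleansing, not the PoisonCAM method: poisoned samples tend to form an unusually tight cluster in representation space, which can be masked out before retraining.

```python
import numpy as np
from sklearn.cluster import KMeans

def mask_tightest_cluster(features, n_clusters=10):
    # Poisoned samples tend to collapse into one unusually tight cluster in
    # representation space; flag the cluster with the smallest mean distance
    # to its centroid and mask its members out before retraining.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    spread = np.array([
        np.linalg.norm(features[km.labels_ == k] - km.cluster_centers_[k],
                       axis=1).mean()
        for k in range(n_clusters)
    ])
    suspect = int(np.argmin(spread))
    keep = np.flatnonzero(km.labels_ != suspect)
    return keep, suspect
```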
arXiv Detail & Related papers (2023-12-13T08:01:15Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
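For concreteness, a representative robust-statistics cleanser of the kind the summary alludes to is the spectral-signatures filter from prior work (not this paper's contribution):

```python
import numpy as np

def spectral_filter(features, remove_frac=0.015):
    # Poisoned samples often have outsized projections onto the top singular
    # direction of the centered feature matrix; drop the most extreme ones.
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = np.abs(centered @ vt[0])
    cutoff = np.quantile(scores, 1.0 - remove_frac)
    return np.flatnonzero(scores <= cutoff)   # indices of samples to keep
```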
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is present in the physical world.
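In the spirit of the summary (a simplified PyTorch sketch, not the published Narcissus algorithm), a clean-label trigger can be synthesized from target-class examples alone using a surrogate classifier; the `surrogate` model, step count, and epsilon below are assumptions.

```python
import torch
import torch.nn.functional as F

def synthesize_trigger(surrogate, target_images, target_class,
                       steps=200, lr=0.1, eps=16 / 255):
    # Optimize one bounded, image-wide perturbation that pushes target-class
    # images deeper into the target class under a surrogate classifier, so
    # the pattern later acts as a strong shortcut feature for that class.
    delta = torch.zeros_like(target_images[:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    y = torch.full((len(target_images),), target_class, dtype=torch.long)
    for _ in range(steps):
        logits = surrogate((target_images + delta).clamp(0, 1))
        loss = F.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)   # keep the trigger imperceptible
    return delta.detach()
```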
arXiv Detail & Related papers (2022-04-11T16:58:04Z)
- Targeted Forgetting and False Memory Formation in Continual Learners through Adversarial Backdoor Attacks [2.830541450812474]
We explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting.
We show that an intelligent adversary can bypass EWC's defenses and instead cause gradual, deliberate forgetting by introducing small amounts of misinformation into the model during training.
We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset.
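For context, the defense being bypassed is EWC's quadratic penalty anchored by Fisher information; a minimal PyTorch sketch of that standard penalty (the defense, not the attack) is:

```python
import torch

def ewc_penalty(model, anchor_params, fisher, lam=100.0):
    # Standard EWC regularizer: quadratically penalize drift of parameters
    # that carried high Fisher information for earlier tasks. The attack in
    # this paper works by corrupting what this importance estimate protects.
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - anchor_params[name]) ** 2).sum()
    return 0.5 * lam * loss
```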
arXiv Detail & Related papers (2020-02-17T18:13:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.