Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking
- URL: http://arxiv.org/abs/2312.07955v1
- Date: Wed, 13 Dec 2023 08:01:15 GMT
- Title: Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking
- Authors: Shengsheng Qian, Yifei Wang, Dizhan Xue, Shengjie Zhang, Huaiwen
Zhang, Changsheng Xu
- Abstract summary: Self-Supervised Learning (SSL) is vulnerable to backdoor attacks.
In this paper, we propose PoisonCAM, a novel method that erases the SSL backdoor by cluster activation masking.
- Score: 69.34631376261102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Researchers have recently found that Self-Supervised Learning (SSL) is
vulnerable to backdoor attacks. The attacker can embed hidden SSL backdoors via
a few poisoned examples in the training dataset and maliciously manipulate the
behavior of downstream models. To defend against SSL backdoor attacks, a
feasible route is to detect and remove the poisonous samples in the training
set. However, existing SSL backdoor defense methods fail to detect the
poisonous samples precisely. In this paper, we propose PoisonCAM, a novel
method that erases the SSL backdoor by cluster activation masking.
After obtaining the threat model trained on the poisoned dataset, our method
can precisely detect poisonous samples based on the assumption that masking the
backdoor trigger can effectively change the activation of a downstream
clustering model. In experiments, our PoisonCAM achieves 96% accuracy for
backdoor trigger detection, compared to 3% for the state-of-the-art method, on
poisoned ImageNet-100. Moreover, our proposed PoisonCAM significantly improves
the performance of the trained SSL model under backdoor attacks compared to the
state-of-the-art method. Our code will be available at
https://github.com/LivXue/PoisonCAM.
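The abstract states the detection principle: if masking a candidate trigger region changes which cluster a sample's representation falls into under a downstream clustering model, the sample is likely poisoned. Below is a minimal sketch of that idea, assuming a frozen SSL `encoder` callable and hypothetical helper names (`mask_patch`, `detect_poisonous`); it illustrates the stated assumption only, not the authors' actual PoisonCAM implementation.

```python
# Sketch of cluster-activation-masking detection (illustrative, not PoisonCAM itself).
import numpy as np
from sklearn.cluster import KMeans

def mask_patch(image, box, fill=0.0):
    """Blank out a candidate trigger region (box = (y, x, h, w)) in an H x W x C image."""
    y, x, h, w = box
    masked = image.copy()
    masked[y:y + h, x:x + w, :] = fill
    return masked

def detect_poisonous(images, boxes, encoder, n_clusters=100):
    """Flag samples whose downstream cluster assignment flips once the
    candidate trigger region is masked out."""
    feats = np.stack([encoder(img) for img in images])
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)

    original = kmeans.predict(feats)
    masked_feats = np.stack(
        [encoder(mask_patch(img, box)) for img, box in zip(images, boxes)]
    )
    masked = kmeans.predict(masked_feats)

    # Clean samples should keep their cluster; samples carrying the trigger
    # are expected to jump to a different cluster after masking.
    return original != masked
```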
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
arXiv Detail & Related papers (2023-12-31T06:43:36Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification [0.0]
In text classification systems, backdoors inserted in the models can cause spam or malicious speech to escape detection.
In this paper, by analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks.
We evaluate our method on four different text classification datasets: IMDB, DBpedia, 20 Newsgroups, and Reuters-21578.
arXiv Detail & Related papers (2020-07-11T09:05:16Z)
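For illustration, the keyword-identification idea in the entry above can be sketched as scoring each token by how much its removal perturbs the LSTM's final hidden state; tokens that consistently produce large shifts across many samples are candidate backdoor keywords. The sketch below assumes a user-supplied `model` callable returning that hidden state and a hypothetical `keyword_scores` helper; it shows the general mechanism only, not the paper's exact BKI scoring.

```python
# Illustrative token-ablation scoring for backdoor-keyword candidates (not the BKI paper's exact method).
import torch

def keyword_scores(model, token_ids):
    """Score each token by how much removing it shifts the LSTM's final hidden state.
    `model` maps a (1, seq_len) LongTensor of token ids to a (1, hidden_dim) tensor;
    `token_ids` is a single example."""
    with torch.no_grad():
        base = model(token_ids)
        scores = []
        for i in range(token_ids.size(1)):
            keep = [j for j in range(token_ids.size(1)) if j != i]
            ablated = model(token_ids[:, keep])  # re-encode with token i dropped
            scores.append(torch.norm(base - ablated).item())
    # Large scores mark tokens that strongly drive the hidden state; tokens that
    # score high across many inputs are candidate backdoor keywords.
    return scores
```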