Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking
- URL: http://arxiv.org/abs/2312.07955v1
- Date: Wed, 13 Dec 2023 08:01:15 GMT
- Title: Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking
- Authors: Shengsheng Qian, Yifei Wang, Dizhan Xue, Shengjie Zhang, Huaiwen
Zhang, Changsheng Xu
- Abstract summary: Self-Supervised Learning (SSL) is vulnerable to backdoor attacks.
In this paper, we propose PoisonCAM, a novel method that erases the SSL backdoor by cluster activation masking.
- Score: 69.34631376261102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Researchers have recently found that Self-Supervised Learning (SSL) is
vulnerable to backdoor attacks. The attacker can embed hidden SSL backdoors via
a few poisoned examples in the training dataset and maliciously manipulate the
behavior of downstream models. To defend against SSL backdoor attacks, a
feasible route is to detect and remove the poisonous samples in the training
set. However, existing SSL backdoor defense methods fail to detect the
poisonous samples precisely. In this paper, we propose PoisonCAM, a novel
method that erases the SSL backdoor by cluster activation masking.
After obtaining the threat model trained on the poisoned dataset, our method
can precisely detect poisonous samples based on the assumption that masking the
backdoor trigger can effectively change the activation of a downstream
clustering model. In experiments, our PoisonCAM achieves 96% accuracy for
backdoor trigger detection, compared to 3% for the state-of-the-art method, on
poisoned ImageNet-100. Moreover, our proposed PoisonCAM significantly improves
the performance of the trained SSL model under backdoor attacks compared to the
state-of-the-art method. Our code will be available at
https://github.com/LivXue/PoisonCAM.
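The abstract states the detection principle: if masking a candidate trigger region changes which cluster a sample's representation falls into under a downstream clustering model, the sample is likely poisoned. Below is a minimal sketch of that idea, assuming a frozen SSL `encoder` callable and hypothetical helper names (`mask_patch`, `detect_poisonous`); it illustrates the stated assumption only, not the authors' actual PoisonCAM implementation.

```python
# Sketch of cluster-activation-masking detection (illustrative, not PoisonCAM itself).
import numpy as np
from sklearn.cluster import KMeans

def mask_patch(image, box, fill=0.0):
    """Blank out a candidate trigger region (box = (y, x, h, w)) in an H x W x C image."""
    y, x, h, w = box
    masked = image.copy()
    masked[y:y + h, x:x + w, :] = fill
    return masked

def detect_poisonous(images, boxes, encoder, n_clusters=100):
    """Flag samples whose downstream cluster assignment flips once the
    candidate trigger region is masked out."""
    feats = np.stack([encoder(img) for img in images])
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)

    original = kmeans.predict(feats)
    masked_feats = np.stack(
        [encoder(mask_patch(img, box)) for img, box in zip(images, boxes)]
    )
    masked = kmeans.predict(masked_feats)

    # Clean samples should keep their cluster; samples carrying the trigger
    # are expected to jump to a different cluster after masking.
    return original != masked
```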
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
arXiv Detail & Related papers (2023-12-31T06:43:36Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification [0.0]
In text classification systems, backdoors inserted in the models can cause spam or malicious speech to escape detection.
In this paper, by analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks.
We evaluate our method on four different text classification datasets: IMDB, DBpedia, 20 Newsgroups, and Reuters-21578.
arXiv Detail & Related papers (2020-07-11T09:05:16Z)
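For illustration, the keyword-identification idea in the entry above can be sketched as scoring each token by how much its removal perturbs the LSTM's final hidden state; tokens that consistently produce large shifts across many samples are candidate backdoor keywords. The sketch below assumes a user-supplied `model` callable returning that hidden state and a hypothetical `keyword_scores` helper; it shows the general mechanism only, not the paper's exact BKI scoring.

```python
# Illustrative token-ablation scoring for backdoor-keyword candidates (not the BKI paper's exact method).
import torch

def keyword_scores(model, token_ids):
    """Score each token by how much removing it shifts the LSTM's final hidden state.
    `model` maps a (1, seq_len) LongTensor of token ids to a (1, hidden_dim) tensor;
    `token_ids` is a single example."""
    with torch.no_grad():
        base = model(token_ids)
        scores = []
        for i in range(token_ids.size(1)):
            keep = [j for j in range(token_ids.size(1)) if j != i]
            ablated = model(token_ids[:, keep])  # re-encode with token i dropped
            scores.append(torch.norm(base - ablated).item())
    # Large scores mark tokens that strongly drive the hidden state; tokens that
    # score high across many inputs are candidate backdoor keywords.
    return scores
```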