SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
- URL: http://arxiv.org/abs/2303.09079v3
- Date: Tue, 16 Jul 2024 23:07:24 GMT
- Title: SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
- Authors: Mengxin Zheng, Jiaqi Xue, Zihao Wang, Xun Chen, Qian Lou, Lei Jiang, Xiaofeng Wang
- Abstract summary: Self-supervised learning (SSL) is a prevalent approach for encoding data representations.
Trojan attacks embedded in SSL encoders can operate covertly, spreading across multiple users and devices.
We introduce SSL-Cleanse as a solution to identify and mitigate backdoor threats in SSL encoders.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) is a prevalent approach for encoding data representations. Using a pre-trained SSL image encoder and subsequently training a downstream classifier, impressive performance can be achieved on various tasks with very little labeled data. The growing adoption of SSL has led to an increase in security research on SSL encoders and associated Trojan attacks. Trojan attacks embedded in SSL encoders can operate covertly, spreading across multiple users and devices. The presence of backdoor behavior in Trojaned encoders can inadvertently be inherited by downstream classifiers, making it even more difficult to detect and mitigate the threat. Although current Trojan detection methods in supervised learning can potentially safeguard SSL downstream classifiers, identifying and addressing triggers in the SSL encoder before its widespread dissemination is a challenging task. This challenge arises because downstream tasks might be unknown, dataset labels may be unavailable, and the original unlabeled training dataset might be inaccessible during Trojan detection in SSL encoders. We introduce SSL-Cleanse as a solution to identify and mitigate backdoor threats in SSL encoders. We evaluated SSL-Cleanse on various datasets using 1200 encoders, achieving an average detection success rate of 82.2% on ImageNet-100. After mitigating backdoors, on average, backdoored encoders achieve 0.3% attack success rate without great accuracy loss, proving the effectiveness of SSL-Cleanse. The source code of SSL-Cleanse is available at https://github.com/UCF-ML-Research/SSL-Cleanse.
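The downstream workflow the abstract describes, a frozen pre-trained SSL encoder followed by a classifier trained on very little labeled data, can be sketched as follows. This is a minimal toy illustration, not the paper's models: the "encoder" is a fixed random projection and the data are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained SSL encoder: a fixed (never updated)
# projection from 64-dim inputs to 16-dim representations. Sizes are
# hypothetical, chosen only for the sketch.
W_enc = rng.normal(size=(64, 16))

def encode(x):
    return np.tanh(x @ W_enc)  # frozen during downstream training

# Tiny labeled downstream set: two Gaussian classes, 20 samples each,
# mimicking the "very little labeled data" setting.
x0 = rng.normal(loc=-1.0, size=(20, 64))
x1 = rng.normal(loc=+1.0, size=(20, 64))
X = np.vstack([x0, x1])
y = np.array([0] * 20 + [1] * 20)

# Downstream classifier: a least-squares linear probe on the embeddings.
Z = np.c_[encode(X), np.ones(len(X))]          # add bias column
w, *_ = np.linalg.lstsq(Z, 2.0 * y - 1.0, rcond=None)

def predict(x):
    z = np.c_[encode(np.atleast_2d(x)), np.ones(len(np.atleast_2d(x)))]
    return (z @ w > 0).astype(int)

train_acc = (predict(X) == y).mean()
print(train_acc)
```

The point of the sketch is that the encoder weights are never touched downstream, which is exactly why a Trojan hidden in the encoder is inherited by every classifier built on top of it.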
Related papers
- How to Craft Backdoors with Unlabeled Data Alone?
Self-supervised learning (SSL) can learn rich features in an economical and scalable way.
If the released dataset is maliciously poisoned, backdoored SSL models can behave badly when triggers are injected into test samples.
We propose two strategies for poison selection: clustering-based selection using pseudolabels, and contrastive selection derived from the mutual information principle.
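The first strategy, clustering-based selection using pseudolabels, can be illustrated with a short sketch: cluster unlabeled features, treat cluster assignments as pseudolabels, and pick the samples nearest one target cluster's centroid as poison candidates. The k-means loop, the toy blobs, and the selection rule here are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unlabeled features: three well-separated 2-D blobs (toy stand-in data).
feats = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                   for c in ((0, 0), (4, 0), (0, 4))])

# Minimal k-means to obtain pseudolabels from the unlabeled features.
k = 3
centers = feats[rng.choice(len(feats), k, replace=False)]
for _ in range(20):
    d = np.linalg.norm(feats[:, None] - centers[None], axis=2)
    pseudo = d.argmin(axis=1)
    # Keep a center in place if its cluster happens to empty out.
    centers = np.stack([feats[pseudo == j].mean(axis=0)
                        if np.any(pseudo == j) else centers[j]
                        for j in range(k)])

# Clustering-based poison selection: within one target pseudolabel,
# choose the samples closest to that cluster's centroid.
target = 0
dist = np.linalg.norm(feats - centers[target], axis=1)
in_cluster = np.where(pseudo == target)[0]
poison_idx = in_cluster[np.argsort(dist[in_cluster])[:10]]
print(len(poison_idx))
```

The intuition is that points near a cluster centroid are the most "prototypical" for that pseudolabel, so trigger-stamping them gives the backdoor a coherent target concept even without real labels.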
arXiv Detail & Related papers (2024-04-10T02:54:18Z)
- Towards Adversarial Robustness And Backdoor Mitigation in SSL
Self-Supervised Learning (SSL) has shown great promise in learning representations from unlabeled data.
SSL methods have recently been shown to be vulnerable to backdoor attacks.
This work aims to address defending against backdoor attacks in SSL.
arXiv Detail & Related papers (2024-03-23T19:21:31Z)
- Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking
Researchers have recently found that Self-Supervised Learning (SSL) is vulnerable to backdoor attacks.
In this paper, we propose to erase the SSL backdoor by cluster activation masking and propose a novel PoisonCAM method.
Our method achieves 96% accuracy for backdoor trigger detection, compared to 3% for the state-of-the-art method, on poisoned ImageNet-100.
arXiv Detail & Related papers (2023-12-13T08:01:15Z)
- SSL-Auth: An Authentication Framework by Fragile Watermarking for Pre-trained Encoders in Self-supervised Learning
Self-supervised learning (SSL), a paradigm harnessing unlabeled datasets to train robust encoders, has recently witnessed substantial success.
Recent studies have shed light on vulnerabilities in pre-trained encoders, including backdoor and adversarial threats.
Safeguarding the intellectual property of encoder trainers and ensuring the trustworthiness of deployed encoders pose notable challenges in SSL.
We introduce SSL-Auth, the first authentication framework designed explicitly for pre-trained encoders.
arXiv Detail & Related papers (2023-08-09T02:54:11Z)
- ESTAS: Effective and Stable Trojan Attacks in Self-supervised Encoders with One Target Unlabelled Sample
ESTAS achieves a > 99% attack success rate (ASR) with one target-class sample.
Compared to prior works, ESTAS attains > 30% ASR increase and > 8.3% accuracy improvement on average.
arXiv Detail & Related papers (2022-11-20T08:58:34Z)
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z)
- SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-supervised Learning
Self-Supervised Learning (SSL) has been widely utilized to facilitate various downstream tasks in Computer Vision (CV) and Natural Language Processing (NLP) domains.
However, attackers may steal such SSL models and commercialize them for profit, making it crucial to verify the ownership of SSL models.
Most existing ownership protection solutions (e.g., backdoor-based watermarks) are designed for supervised learning models.
We propose a novel black-box watermarking solution, named SSL-WM, for verifying the ownership of SSL models.
arXiv Detail & Related papers (2022-09-08T05:02:11Z)
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
- DATA: Domain-Aware and Task-Aware Pre-training
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- Robust Deep Semi-Supervised Learning: A Brief Introduction
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient.
SSL with deep models has proven to be successful on standard benchmark tasks.
However, they are still vulnerable to various robustness threats in real-world applications.
arXiv Detail & Related papers (2022-02-12T04:16:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.