DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
- URL: http://arxiv.org/abs/2411.16154v1
- Date: Mon, 25 Nov 2024 07:26:22 GMT
- Title: DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
- Authors: Sizai Hou, Songze Li, Duanyi Yao
- Abstract summary: Self-supervised learning (SSL) is widely used to train high-quality upstream encoders on large amounts of unlabeled data.
However, SSL is susceptible to backdoor attacks that merely pollute a small portion of the training data.
We propose a novel detection mechanism, DeDe, which detects the activation of the backdoor mapping caused by the co-occurrence of a victim encoder and trigger inputs.
- Score: 6.698677477097004
- License:
- Abstract: Self-supervised learning (SSL) is widely used to train high-quality upstream encoders on large amounts of unlabeled data. However, it has been found susceptible to backdoor attacks that merely pollute a small portion of the training data. A victim encoder mismatches triggered inputs to target embeddings, e.g., matching a triggered cat image to an airplane embedding, so that downstream tasks misbehave whenever the trigger is activated. Emerging backdoor attacks have posed serious threats in different SSL paradigms such as contrastive learning and CLIP, yet little research has been devoted to defending against them, and existing defenses fall short in detecting advanced stealthy backdoors. To address these limitations, we propose a novel detection mechanism, DeDe, which detects the activation of the backdoor mapping caused by the co-occurrence of a victim encoder and trigger inputs. Specifically, DeDe trains a decoder for the SSL encoder on an auxiliary dataset (which can be out-of-distribution or even slightly poisoned), so that for any triggered input that is misled to the target embedding, the decoder outputs an image significantly different from the input. We empirically evaluate DeDe on both contrastive learning and CLIP models against various types of backdoor attacks, and demonstrate its superior performance over state-of-the-art detection methods, both in upstream detection and in preventing backdoors in downstream tasks.
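The mechanism described above reduces to a simple recipe: fit a decoder that inverts the suspect encoder on auxiliary data, then flag test inputs whose reconstruction diverges strongly from the input (a triggered input is mapped to the target embedding, so its reconstruction resembles the target class rather than the input). The PyTorch sketch below illustrates this idea under assumptions of our own: the names `Decoder`, `train_decoder`, and `flag_backdoor`, the 32x32 RGB input size, and the MSE-based score are all illustrative, not the authors' implementation.

```python
# Minimal sketch of reconstruction-based backdoor-sample detection in the
# spirit of DeDe. Assumes `encoder` maps a (B, 3, 32, 32) batch to (B, 512)
# embeddings; every name and hyperparameter here is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Inverts a 512-d embedding back to a 3x32x32 image (hypothetical)."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(embed_dim, 256 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 16 -> 32
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 256, 4, 4))

def train_decoder(encoder, decoder, aux_loader, epochs=10, lr=1e-3):
    """Fit the decoder to reconstruct auxiliary images from the frozen
    suspect encoder's embeddings."""
    encoder.eval()
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(epochs):
        for x in aux_loader:            # aux_loader yields image batches
            with torch.no_grad():
                z = encoder(x)          # encoder stays frozen throughout
            loss = F.mse_loss(decoder(z), x)
            opt.zero_grad()
            loss.backward()
            opt.step()

@torch.no_grad()
def flag_backdoor(encoder, decoder, x, tau):
    """Per-sample reconstruction error: a triggered input lands near the
    target embedding, so its reconstruction looks unlike the input and the
    error is large. Returns a boolean mask of suspected backdoor samples."""
    err = F.mse_loss(decoder(encoder(x)), x, reduction="none")
    return err.flatten(1).mean(dim=1) > tau
```

In practice the threshold `tau` would be calibrated on the auxiliary set, e.g., as a high quantile of reconstruction errors on samples presumed clean.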
Related papers
- Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services [10.367966878807714]
Pre-trained encoders can be easily accessed online to build downstream machine learning (ML) services quickly.
This paper unveils a new vulnerability: the Pre-trained Encoder Inference (PEI) attack, which poses privacy threats to encoders hidden behind downstream ML services.
arXiv Detail & Related papers (2024-08-05T20:27:54Z) - EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data.
While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated.
We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z) - SSL-OTA: Unveiling Backdoor Threats in Self-Supervised Learning for Object Detection [8.178238811631093]
We propose the first backdoor attack designed for object detection tasks in SSL scenarios, called Object Transform Attack (SSL-OTA).
SSL-OTA employs a trigger capable of altering predictions of the target object to the desired category.
We conduct extensive experiments on benchmark datasets, demonstrating the effectiveness of our proposed attack and its resistance to potential defenses.
arXiv Detail & Related papers (2023-12-30T04:21:12Z) - Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking [65.44477004525231]
Researchers have recently found that Self-Supervised Learning (SSL) is vulnerable to backdoor attacks.
In this paper, we propose PoisonCAM, a novel method that erases the SSL backdoor via cluster activation masking.
Our method achieves 96% accuracy for backdoor trigger detection, compared to 3% for the state-of-the-art method, on poisoned ImageNet-100.
arXiv Detail & Related papers (2023-12-13T08:01:15Z) - GhostEncoder: Stealthy Backdoor Attacks with Dynamic Triggers to Pre-trained Encoders in Self-supervised Learning [15.314217530697928]
Self-supervised learning (SSL) produces pre-trained image encoders by training on a substantial quantity of unlabeled images.
We propose GhostEncoder, the first dynamic invisible backdoor attack on SSL.
arXiv Detail & Related papers (2023-10-01T09:39:27Z) - Downstream-agnostic Adversarial Examples [66.8606539786026]
AdvEncoder is the first framework for generating downstream-agnostic universal adversarial examples based on a pre-trained encoder.
Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels.
Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset.
arXiv Detail & Related papers (2023-07-23T10:16:47Z) - CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z) - Vulnerabilities of Deep Learning-Driven Semantic Communications to Backdoor (Trojan) Attacks [70.51799606279883]
This paper highlights vulnerabilities of deep learning-driven semantic communications to backdoor (Trojan) attacks.
A backdoor attack can effectively change the semantic information transferred for poisoned input samples to a target meaning.
Design guidelines are presented to preserve the meaning of transferred information in the presence of backdoor attacks.
arXiv Detail & Related papers (2022-12-21T17:22:27Z) - An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z) - A temporal chrominance trigger for clean-label backdoor attack against anti-spoof rebroadcast detection [41.735725886912185]
We propose a stealthy clean-label video backdoor attack against Deep Learning (DL)-based models.
The injected backdoor does not affect spoofing detection in normal conditions, but induces a misclassification in the presence of a triggering signal.
The effectiveness of the proposed backdoor attack and its generality are validated experimentally on different datasets.
arXiv Detail & Related papers (2022-06-02T15:30:42Z) - Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples, a phenomenon we refer to as backdoor smoothing (a minimal probe of this quantity is sketched after this list).
Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
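As referenced in the Backdoor Smoothing entry above, its claim can be probed by measuring how much a classifier's output varies under small random perturbations around an input. The sketch below is a hypothetical probe of our own, not the paper's measure; `local_smoothness`, the Gaussian-noise scheme, and the L1 statistic are assumptions.

```python
# Hypothetical probe in the spirit of "backdoor smoothing": estimate the
# variability of a classifier's softmax output in a small neighborhood of
# an input. Lower variability = smoother decision function, which the
# paper reports around triggered samples.
import torch

@torch.no_grad()
def local_smoothness(model, x, n_samples=64, sigma=0.05):
    """Average L1 distance between the softmax output at x and at
    Gaussian perturbations of x (x is a single CxHxW image tensor)."""
    model.eval()
    p0 = torch.softmax(model(x.unsqueeze(0)), dim=1)         # (1, K)
    noise = sigma * torch.randn(n_samples, *x.shape)         # (n, C, H, W)
    p = torch.softmax(model(x.unsqueeze(0) + noise), dim=1)  # (n, K)
    return (p - p0).abs().sum(dim=1).mean().item()
```

Comparing this statistic for inputs with and without a candidate trigger would, under these assumptions, surface the smoothness gap the paper describes.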