DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
- URL: http://arxiv.org/abs/2411.16154v2
- Date: Thu, 20 Mar 2025 07:05:27 GMT
- Title: DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
- Authors: Sizai Hou, Songze Li, Duanyi Yao
- Abstract summary: Self-supervised learning (SSL) is pervasively exploited in training high-quality upstream encoders with a large amount of unlabeled data. Victim encoders associate triggered inputs with target embeddings, such that the downstream tasks inherit unintended behaviors when the trigger is activated. We propose a novel detection mechanism, DeDe, which detects the activation of backdoor mappings caused by triggered inputs on victim encoders.
- Score: 6.698677477097004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) is pervasively exploited in training high-quality upstream encoders with a large amount of unlabeled data. However, it is found to be susceptible to backdoor attacks merely by poisoning a small portion of training data. The victim encoders associate triggered inputs with target embeddings, e.g., mapping a triggered cat image to an airplane embedding, such that the downstream tasks inherit unintended behaviors when the trigger is activated. Emerging backdoor attacks have shown great threats across different SSL paradigms such as contrastive learning and CLIP, yet limited research is devoted to defending against such attacks, and existing defenses fall short in detecting advanced stealthy backdoors. To address the limitations, we propose a novel detection mechanism, DeDe, which detects the activation of backdoor mappings caused by triggered inputs on victim encoders. Specifically, DeDe trains a decoder for any given SSL encoder using an auxiliary dataset (which can be out-of-distribution or even slightly poisoned), so that for any triggered input that misleads the encoder into the target embedding, the decoder generates an output image significantly different from the input. DeDe leverages the discrepancy between the input and the decoded output to identify potential backdoor misbehavior during inference. We empirically evaluate DeDe on both contrastive learning and CLIP models against various types of backdoor attacks. Our results demonstrate promising detection effectiveness over various advanced attacks and superior performance compared to state-of-the-art detection methods.
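The detection idea in the abstract can be illustrated with a minimal sketch: encode the input, reconstruct it with a decoder trained on an auxiliary dataset, and flag the input when the reconstruction discrepancy is large. The function names, the identity-style encoder/decoder stand-ins, and the threshold below are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def detect_backdoor(image, encoder, decoder, threshold):
    """DeDe-style discrepancy test (illustrative sketch).

    A triggered input is mapped by a victim encoder to a target
    embedding, so the auxiliary-trained decoder reconstructs an image
    far from the input; a large input-vs-reconstruction discrepancy
    therefore signals potential backdoor activation.
    """
    z = encoder(image)                            # embedding of the (possibly triggered) input
    recon = decoder(z)                            # decoder trained on an auxiliary dataset
    discrepancy = float(np.mean((image - recon) ** 2))  # per-pixel MSE as the discrepancy score
    return discrepancy > threshold, discrepancy

# Toy stand-ins: a faithful encoder/decoder pair (flatten / un-flatten),
# so clean inputs reconstruct perfectly and pass the test.
encoder = lambda x: x.reshape(-1)
decoder = lambda z: z.reshape(4, 4)

clean = np.zeros((4, 4))
flagged, score = detect_backdoor(clean, encoder, decoder, threshold=0.1)
# faithful reconstruction -> near-zero discrepancy -> not flagged
```

In practice the discrepancy metric and threshold would be chosen on held-out clean data; MSE here merely stands in for whatever image-space distance the detector uses.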
Related papers
- Fooling the Decoder: An Adversarial Attack on Quantum Error Correction [49.48516314472825]
In this work, we target a basic RL surface code decoder (DeepQ) to create the first adversarial attack on quantum error correction.
We demonstrate an attack that reduces the logical qubit lifetime in memory experiments by up to five orders of magnitude.
This attack highlights the susceptibility of machine learning-based QEC and underscores the importance of further research into robust QEC methods.
arXiv Detail & Related papers (2025-04-28T10:10:05Z) - Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services [10.367966878807714]
Pre-trained encoders can be easily accessed online to build downstream machine learning (ML) services quickly.
This paper unveils a new vulnerability: the Pre-trained Encoder Inference (PEI) attack, which poses privacy threats to encoders hidden behind downstream ML services.
arXiv Detail & Related papers (2024-08-05T20:27:54Z) - EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data.
While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated.
We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z) - SSL-OTA: Unveiling Backdoor Threats in Self-Supervised Learning for Object Detection [8.178238811631093]
We propose the first backdoor attack designed for object detection tasks in SSL scenarios, called Object Transform Attack (SSL-OTA).
SSL-OTA employs a trigger capable of altering predictions of the target object to the desired category.
We conduct extensive experiments on benchmark datasets, demonstrating the effectiveness of our proposed attack and its resistance to potential defenses.
arXiv Detail & Related papers (2023-12-30T04:21:12Z) - Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking [65.44477004525231]
Researchers have recently found that Self-Supervised Learning (SSL) is vulnerable to backdoor attacks.
In this paper, we propose to erase the SSL backdoor by cluster activation masking and propose a novel PoisonCAM method.
Our method achieves 96% accuracy for backdoor trigger detection compared to 3% of the state-of-the-art method on poisoned ImageNet-100.
arXiv Detail & Related papers (2023-12-13T08:01:15Z) - GhostEncoder: Stealthy Backdoor Attacks with Dynamic Triggers to Pre-trained Encoders in Self-supervised Learning [15.314217530697928]
Self-supervised learning (SSL) pertains to training pre-trained image encoders utilizing a substantial quantity of unlabeled images.
We propose GhostEncoder, the first dynamic invisible backdoor attack on SSL.
arXiv Detail & Related papers (2023-10-01T09:39:27Z) - Downstream-agnostic Adversarial Examples [66.8606539786026]
AdvEncoder is the first framework for generating downstream-agnostic universal adversarial examples based on pre-trained encoders.
Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels.
Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset.
arXiv Detail & Related papers (2023-07-23T10:16:47Z) - Detecting Backdoors in Pre-trained Encoders [25.105186092387633]
We propose DECREE, the first backdoor detection approach for pre-trained encoders.
We show the effectiveness of our method on image encoders pre-trained on ImageNet and on OpenAI's CLIP trained on 400 million image-text pairs.
arXiv Detail & Related papers (2023-03-23T19:04:40Z) - CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z) - Vulnerabilities of Deep Learning-Driven Semantic Communications to Backdoor (Trojan) Attacks [70.51799606279883]
This paper highlights vulnerabilities of deep learning-driven semantic communications to backdoor (Trojan) attacks.
A backdoor attack can effectively change the semantic information transferred for poisoned input samples to a target meaning.
Design guidelines are presented to preserve the meaning of transferred information in the presence of backdoor attacks.
arXiv Detail & Related papers (2022-12-21T17:22:27Z) - Is Semantic Communications Secure? A Tale of Multi-Domain Adversarial Attacks [70.51799606279883]
We introduce test-time adversarial attacks on deep neural networks (DNNs) for semantic communications.
We show that it is possible to change the semantics of the transferred information even when the reconstruction loss remains low.
arXiv Detail & Related papers (2022-12-20T17:13:22Z) - An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z) - A temporal chrominance trigger for clean-label backdoor attack against anti-spoof rebroadcast detection [41.735725886912185]
We propose a stealthy clean-label video backdoor attack against Deep Learning (DL)-based models.
The injected backdoor does not affect spoofing detection in normal conditions, but induces a misclassification in the presence of a triggering signal.
The effectiveness of the proposed backdoor attack and its generality are validated experimentally on different datasets.
arXiv Detail & Related papers (2022-06-02T15:30:42Z) - PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder, including one pre-processing, three in-processing, and one post-processing defense.
arXiv Detail & Related papers (2022-05-13T00:15:44Z) - PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks pose a new threat to Deep Neural Networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization purifying the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
arXiv Detail & Related papers (2022-03-17T12:37:21Z) - Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as backdoor smoothing.
Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.