Backdooring Self-Supervised Contrastive Learning by Noisy Alignment
- URL: http://arxiv.org/abs/2508.14015v1
- Date: Tue, 19 Aug 2025 17:25:43 GMT
- Title: Backdooring Self-Supervised Contrastive Learning by Noisy Alignment
- Authors: Tuo Chen, Jie Gui, Minjing Dong, Ju Jia, Lanting Fang, Jian Liu
- Abstract summary: Self-supervised contrastive learning (CL) effectively learns transferable representations from unlabeled data containing images or image-text pairs. An adversary can inject poisoned images into pretraining datasets, causing compromised CL encoders to exhibit misbehavior in downstream tasks. We propose Noisy Alignment (NA), a DPCL method that explicitly suppresses noise components in poisoned images.
- Score: 19.58754341352538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised contrastive learning (CL) effectively learns transferable representations from unlabeled data containing images or image-text pairs, but it is vulnerable to data poisoning backdoor attacks (DPCLs). An adversary can inject poisoned images into pretraining datasets, causing compromised CL encoders to exhibit targeted misbehavior in downstream tasks. Existing DPCLs, however, achieve limited efficacy due to their dependence on the fragile implicit co-occurrence between the backdoor trigger and the target object, and their inadequate suppression of discriminative features in backdoored images. We propose Noisy Alignment (NA), a DPCL method that explicitly suppresses noise components in poisoned images. Inspired by powerful training-controllable CL attacks, we identify and extract the critical objective of noisy alignment and adapt it effectively to data-poisoning scenarios. Our method implements noisy alignment by strategically manipulating contrastive learning's random cropping mechanism, formulating this process as an image layout optimization problem with theoretically derived optimal parameters. The resulting method is simple yet effective, achieving state-of-the-art performance compared to existing DPCLs while maintaining clean-data accuracy. Furthermore, Noisy Alignment demonstrates robustness against common backdoor defenses. Code is available at https://github.com/jsrdcht/Noisy-Alignment.
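To make the random-cropping manipulation described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' released code; the layout split, crop scale, trigger patch, and noise statistics are assumed placeholders, whereas the paper derives the optimal layout parameters analytically). It composes a poisoned image from a trigger-stamped region and a pure-noise region, then estimates how often two SimCLR-style random crops pair a trigger-side view with a noise-side view.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_poisoned_image(reference, trigger, split_frac=0.5, noise_std=0.15):
    """Left part: reference content stamped with the trigger patch.
    Right part: pure noise, so crops from it carry few discriminative cues.
    split_frac and noise_std are illustrative placeholders, not the paper's values."""
    h, w, c = reference.shape
    out = reference.copy()
    split = int(w * split_frac)
    th, tw, _ = trigger.shape
    out[:th, :tw] = trigger  # stamp trigger patch in the top-left corner
    out[:, split:] = np.clip(rng.normal(0.5, noise_std, (h, w - split, c)), 0.0, 1.0)
    return out, split

def random_crop_box(h, w, scale=0.4):
    """Return (top, left, ch, cw) of a random crop, mimicking SimCLR-style view sampling."""
    ch, cw = int(h * scale), int(w * scale)
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return top, left, ch, cw

def crop_region(left, cw, split):
    """Classify a crop by which side of the layout its horizontal centre falls on."""
    return "trigger" if left + cw / 2 < split else "noise"

# Toy estimate of how often a positive pair couples a trigger-side crop with a
# noise-side crop -- the pairing that noisy alignment tries to make frequent.
reference = rng.uniform(size=(224, 224, 3))   # stand-in for a target-class image
trigger = np.ones((32, 32, 3))                # stand-in solid-patch trigger
poisoned, split = make_poisoned_image(reference, trigger)

trials, hits = 10_000, 0
for _ in range(trials):
    views = [random_crop_box(*poisoned.shape[:2]) for _ in range(2)]
    sides = {crop_region(left, cw, split) for _, left, _, cw in views}
    hits += sides == {"trigger", "noise"}
print(f"fraction of trigger/noise positive pairs: {hits / trials:.3f}")
```

The fixed split used here is only for illustration; per the abstract, the actual method treats the layout as an optimization problem and uses theoretically derived optimal parameters.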
Related papers
- Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment [65.51957843888061]
Contrastive Language-Image Pre-training models are threatened by targeted data poisoning and backdoor attacks. Previous defense methods correct poisoned image-caption pairs by matching a new caption for each image. We propose an Optimal Transport-based framework to reconstruct image-caption pairs, named OTCCLIP.
arXiv Detail & Related papers (2025-09-23T07:05:43Z) - PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting [25.24109316946351]
We propose PromptFlare, a novel adversarial protection method designed to protect images from malicious modifications facilitated by diffusion-based inpainting models. Our approach exploits the intrinsic properties of prompt embeddings and injects adversarial noise to suppress the sampling process. Experiments on the EditBench dataset demonstrate that our method achieves state-of-the-art performance across various metrics.
arXiv Detail & Related papers (2025-08-22T08:42:46Z) - Active Adversarial Noise Suppression for Image Forgery Localization [56.98050814363447]
We introduce an Adversarial Noise Suppression Module (ANSM) that generates a defensive perturbation to suppress the attack effect of adversarial noise. To the best of our knowledge, this is the first report of adversarial defense in image forgery localization tasks.
arXiv Detail & Related papers (2025-06-15T14:53:27Z) - SSP-RACL: Classification of Noisy Fundus Images with Self-Supervised Pretraining and Robust Adaptive Credal Loss [3.8739860035485143]
Fundus image classification is crucial in computer-aided diagnosis tasks, but label noise significantly impairs the performance of deep neural networks.
We propose a robust framework, Self-Supervised Pre-training with Robust Adaptive Credal Loss (SSP-RACL), for handling label noise in fundus image datasets.
arXiv Detail & Related papers (2024-09-25T02:41:58Z) - Rethinking and Defending Protective Perturbation in Personalized Diffusion Models [21.30373461975769]
We study the fine-tuning process of personalized diffusion models (PDMs) through the lens of shortcut learning.
PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets.
We propose a systematic defense framework that includes data purification and contrastive decoupling learning.
arXiv Detail & Related papers (2024-06-27T07:14:14Z) - BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
arXiv Detail & Related papers (2023-11-26T14:24:13Z) - Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks [52.26631767748843]
We propose ROCLIP, the first effective method for robust pre-training of multimodal vision-language models against targeted data poisoning and backdoor attacks.
ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions.
Our experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during CLIP pre-training.
arXiv Detail & Related papers (2023-03-13T04:49:46Z) - CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z) - TransCAB: Transferable Clean-Annotation Backdoor to Object Detection with Natural Trigger in Real-World [16.14185033829483]
We propose MACAB that crafts clean-annotated images to stealthily implant the backdoor into the object detectors trained on them.
We observe that the backdoor effects of both misclassification and cloaking are robustly achieved in the wild.
MACAB exhibits more than 90% attack success rate under various real-world scenes.
arXiv Detail & Related papers (2022-09-06T09:56:33Z) - Towards Adversarially Robust Deep Image Denoising [199.2458715635285]
This work systematically investigates the adversarial robustness of deep image denoisers (DIDs).
We propose a novel adversarial attack, namely the Observation-based Zero-mean Attack (ObsAtk), to craft adversarial zero-mean perturbations on given noisy images.
To robustify DIDs, we propose hybrid adversarial training (HAT) that jointly trains DIDs with adversarial and non-adversarial noisy data.
arXiv Detail & Related papers (2022-01-12T10:23:14Z) - Backdoor Attack on Hash-based Image Retrieval via Clean-label Data Poisoning [54.15013757920703]
We propose the confusing perturbations-induced backdoor attack (CIBA).
It injects a small number of poisoned images with the correct label into the training data.
We have conducted extensive experiments to verify the effectiveness of our proposed CIBA.
arXiv Detail & Related papers (2021-09-18T07:56:59Z)