CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
- URL: http://arxiv.org/abs/2211.08229v5
- Date: Thu, 29 Feb 2024 21:26:45 GMT
- Title: CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning
- Authors: Jinghuai Zhang and Hongbin Liu and Jinyuan Jia and Neil Zhenqiang Gong
- Abstract summary: Contrastive learning pre-trains general-purpose encoders using an unlabeled pre-training dataset.
DPBAs inject poisoned inputs into the pre-training dataset so the encoder is backdoored.
CorruptEncoder introduces a new attack strategy to create poisoned inputs and uses a theory-guided method to maximize attack effectiveness.
A proposed defense, localized cropping, can reduce the effectiveness of DPBAs, but it sacrifices the utility of the encoder, highlighting the need for new defenses.
- Score: 71.25518220297639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning (CL) pre-trains general-purpose encoders using an
unlabeled pre-training dataset, which consists of images or image-text pairs.
CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an
attacker injects poisoned inputs into the pre-training dataset so the encoder
is backdoored. However, existing DPBAs achieve limited effectiveness. In this
work, we take the first step toward analyzing the limitations of existing
backdoor attacks and propose a new DPBA against CL, called CorruptEncoder. It
introduces a new attack strategy to create poisoned inputs and uses a
theory-guided method to maximize attack effectiveness. Our experiments show
that CorruptEncoder substantially outperforms existing DPBAs. In particular,
CorruptEncoder is the first DPBA that achieves more than 90% attack success
rates with only a few (3) reference images and a small poisoning ratio of
0.5%. Moreover, we propose a defense, called localized cropping, to defend
against DPBAs. Our results show that our defense can reduce the effectiveness
of DPBAs, but it sacrifices the utility of the encoder, highlighting the need
for new defenses.
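To make the attack concrete, below is a minimal Python sketch of how a CorruptEncoder-style poisoned input could be assembled: a reference-class object and a trigger patch are pasted into disjoint regions of a background image, so that the random crops used by contrastive augmentation are likely to isolate one or the other. The sizes, placement rule, and function name are illustrative assumptions, not the paper's exact theory-guided parameters.

```python
# Minimal sketch of assembling a CorruptEncoder-style poisoned input.
# Sizes and placement rule are illustrative assumptions, not the
# paper's exact theory-guided parameters.
import random
from PIL import Image

def make_poisoned_input(background: Image.Image,
                        reference_object: Image.Image,
                        trigger: Image.Image) -> Image.Image:
    """Paste a reference-class object and a trigger into disjoint regions
    of a background, so random crops tend to isolate one or the other."""
    poisoned = background.copy()
    bw, bh = poisoned.size
    ow, oh = reference_object.size
    tw, th = trigger.size

    # Assumption: place the object somewhere on the left half.
    ox = random.randint(0, max(0, bw // 2 - ow))
    oy = random.randint(0, max(0, bh - oh))
    poisoned.paste(reference_object, (ox, oy))

    # Assumption: place the trigger on the right half, disjoint from the object.
    tx = random.randint(bw // 2, max(bw // 2, bw - tw))
    ty = random.randint(0, max(0, bh - th))
    poisoned.paste(trigger, (tx, ty))
    return poisoned
```

During pre-training, one random crop of such an image may contain mostly the reference object while the other contains the trigger; the contrastive loss then pulls the trigger's embedding toward the reference object's embedding, which is what backdoors the encoder.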
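The localized cropping defense can be sketched in the same spirit: instead of drawing the two contrastive views from anywhere in the image, both crops are drawn from one shared local window, so a crop of a pasted trigger and a crop of a pasted object rarely form a positive pair. The window and crop sizes below are illustrative assumptions.

```python
# Minimal sketch of a localized-cropping augmentation in the spirit of the
# defense described in the abstract. Window and crop sizes are
# illustrative assumptions.
import random
from PIL import Image

def localized_two_crops(img: Image.Image, window: int = 128, crop: int = 96):
    """Draw both contrastive views from one shared local window, so widely
    separated image regions rarely end up paired as positives."""
    w, h = img.size
    wx = random.randint(0, max(0, w - window))
    wy = random.randint(0, max(0, h - window))
    views = []
    for _ in range(2):
        cx = wx + random.randint(0, max(0, window - crop))
        cy = wy + random.randint(0, max(0, window - crop))
        views.append(img.crop((cx, cy, cx + crop, cy + crop)))
    return views
```

Constraining both views to a small window also reduces augmentation diversity, which is consistent with the abstract's observation that the defense sacrifices encoder utility.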
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers [8.15496105932744]
Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training.
We develop a new categorization of triggers inspired by adversarial techniques and propose a multi-label, multi-payload poisoning-based backdoor attack with positive triggers (PPT).
Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets.
arXiv Detail & Related papers (2024-05-09T06:45:11Z)
- UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks [19.369701116838776]
Backdoor attacks are emerging threats to deep neural networks.
They typically embed malicious behaviors into a victim model by injecting poisoned samples.
We propose UltraClean, a framework that simplifies the identification of poisoned samples.
arXiv Detail & Related papers (2023-12-17T09:16:17Z)
- Does Differential Privacy Prevent Backdoor Attacks in Practice? [8.951356689083166]
We investigate the effectiveness of Differential Privacy techniques in preventing backdoor attacks in machine learning models.
We propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE.
arXiv Detail & Related papers (2023-11-10T18:32:08Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder, including one pre-processing, three in-processing, and one post-processing defense.
arXiv Detail & Related papers (2022-05-13T00:15:44Z)
- Model-Contrastive Learning for Backdoor Defense [13.781375023320981]
We propose a novel backdoor defense method named MCL based on model-contrastive learning.
MCL is more effective at reducing backdoor threats while maintaining high accuracy on benign data.
arXiv Detail & Related papers (2022-05-09T16:36:46Z)
- Backdoor Attack on Hash-based Image Retrieval via Clean-label Data Poisoning [54.15013757920703]
We propose the confusing perturbations-induced backdoor attack (CIBA).
It injects a small number of poisoned images with the correct label into the training data.
We have conducted extensive experiments to verify the effectiveness of our proposed CIBA.
arXiv Detail & Related papers (2021-09-18T07:56:59Z)
- DP-InstaHide: Provably Defusing Poisoning and Backdoor Attacks with Differentially Private Data Augmentations [54.960853673256]
We show that strong data augmentations, such as mixup and random additive noise, nullify poison attacks while enduring only a small accuracy trade-off.
A rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy advantages, and that training with k-way mixup provably yields at least k times stronger DP guarantees than a naive DP mechanism.
arXiv Detail & Related papers (2021-03-02T23:07:31Z)
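As a rough illustration of the kind of augmentation the DP-InstaHide entry describes, the following sketch mixes k training examples with Dirichlet weights and adds random noise; the weight distribution, noise scale, and function name are illustrative assumptions rather than the paper's exact recipe.

```python
# Generic sketch of k-way mixup with additive random noise, the kind of
# strong augmentation this entry describes. Dirichlet weights, the noise
# scale, and the function name are illustrative assumptions, not the
# paper's exact recipe.
import numpy as np

def kway_mixup(images: np.ndarray, labels: np.ndarray,
               k: int = 4, noise_std: float = 0.05):
    """images: (N, H, W, C) floats in [0, 1]; labels: (N, C) one-hot."""
    idx = np.random.choice(images.shape[0], size=k, replace=False)
    weights = np.random.dirichlet(np.ones(k))        # convex combination
    mixed_x = np.tensordot(weights, images[idx], axes=1)
    mixed_y = np.tensordot(weights, labels[idx], axes=1)
    mixed_x += np.random.normal(0.0, noise_std, size=mixed_x.shape)
    return np.clip(mixed_x, 0.0, 1.0), mixed_y
```

Because each mixed example is a convex combination of k inputs, a single poisoned sample contributes only a fraction of any training image, which is the intuition behind the claimed k-fold strengthening of the DP guarantee.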
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.