PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in
Contrastive Learning
- URL: http://arxiv.org/abs/2205.06401v2
- Date: Tue, 17 May 2022 17:15:36 GMT
- Title: PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in
Contrastive Learning
- Authors: Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong
- Abstract summary: We propose PoisonedEncoder, a data poisoning attack on contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder: one pre-processing defense, three in-processing defenses, and one post-processing defense.
- Score: 69.70602220716718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive learning pre-trains an image encoder using a large amount of
unlabeled data such that the image encoder can be used as a general-purpose
feature extractor for various downstream tasks. In this work, we propose
PoisonedEncoder, a data poisoning attack on contrastive learning. In
particular, an attacker injects carefully crafted poisoning inputs into the
unlabeled pre-training data, such that the downstream classifiers built based
on the poisoned encoder for multiple target downstream tasks simultaneously
classify attacker-chosen, arbitrary clean inputs as attacker-chosen, arbitrary
classes. We formulate our data poisoning attack as a bilevel optimization
problem, whose solution is the set of poisoning inputs; and we propose a
contrastive-learning-tailored method to approximately solve it. Our evaluation
on multiple datasets shows that PoisonedEncoder achieves high attack success
rates while maintaining the testing accuracy of the downstream classifiers
built upon the poisoned encoder for non-attacker-chosen inputs. We also
evaluate five defenses against PoisonedEncoder, including one pre-processing defense,
three in-processing defenses, and one post-processing defense. Our results show that
these defenses can decrease the attack success rate of PoisonedEncoder, but
they also sacrifice the utility of the encoder or require a large clean
pre-training dataset.
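The abstract states the attack as a bilevel optimization but does not write the formulation out. As a rough sketch for orientation only (all symbols below are introduced here, not taken from the paper), the formulation has the shape

  \max_{\mathcal{P}} \;\; \sum_{t} \mathbb{1}\!\left[\, f_t\!\left( h_{\theta^*(\mathcal{P})}(x_t) \right) = y_t \,\right]
  \qquad \text{s.t.} \qquad
  \theta^*(\mathcal{P}) = \arg\min_{\theta} \; \mathcal{L}_{\mathrm{CL}}\!\left( \mathcal{D} \cup \mathcal{P};\, \theta \right),

where \mathcal{D} is the clean unlabeled pre-training dataset, \mathcal{P} the set of poisoning inputs, \mathcal{L}_{\mathrm{CL}} the contrastive pre-training loss, h_{\theta} the image encoder, f_t a downstream classifier built on the (poisoned) encoder for target task t, and (x_t, y_t) the attacker-chosen clean input and class for that task. The inner problem pre-trains the encoder on the poisoned data; the outer problem picks the poisoning inputs so that the resulting downstream classifiers misclassify the attacker-chosen inputs; the paper's contrastive-learning-tailored method approximates a solution to this problem.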
Related papers
- Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing [14.290156958543845]
Adapting pre-trained deep learning models to customized tasks has become a popular choice for developers.
Probing, that is, training a downstream head on a pre-trained encoder, has been widely adopted in transfer learning.
Such generalizability of pre-trained encoders raises concerns about the potential misuse of probing for harmful purposes.
We introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing.
arXiv Detail & Related papers (2024-11-19T13:50:08Z)
- Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers [8.15496105932744]
Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training.
We develop a new categorization of triggers inspired by adversarial techniques and propose a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT).
Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets.
arXiv Detail & Related papers (2024-05-09T06:45:11Z)
- Downstream-agnostic Adversarial Examples [66.8606539786026]
AdvEncoder is the first framework for generating downstream-agnostic universal adversarial examples based on a pre-trained encoder.
Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels, so the attack objective operates in feature space (a toy sketch of such an objective appears after this list).
Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset.
arXiv Detail & Related papers (2023-07-23T10:16:47Z)
- CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning [71.25518220297639]
Contrastive learning pre-trains general-purpose encoders using an unlabeled pre-training dataset.
Data poisoning based backdoor attacks (DPBAs) inject poisoned inputs into the pre-training dataset so that the encoder is backdoored.
CorruptEncoder introduces a new attack strategy to create poisoned inputs and uses a theory-guided method to maximize attack effectiveness.
Our results show that our defense can reduce the effectiveness of DPBAs, but it sacrifices the utility of the encoder, highlighting the need for new defenses.
arXiv Detail & Related papers (2022-11-15T15:48:28Z)
- AWEncoder: Adversarial Watermarking Pre-trained Encoders in Contrastive Learning [18.90841192412555]
We introduce AWEncoder, an adversarial method for watermarking the pre-trained encoder in contrastive learning.
The proposed method achieves strong effectiveness and robustness across different contrastive learning algorithms and downstream tasks.
arXiv Detail & Related papers (2022-08-08T07:23:37Z)
- StolenEncoder: Stealing Pre-trained Encoders [62.02156378126672]
We propose StolenEncoder, the first attack to steal pre-trained image encoders.
Our results show that the encoders stolen by StolenEncoder have similar functionality to the target encoders.
arXiv Detail & Related papers (2022-01-15T17:04:38Z)
- Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks [7.150136251781658]
Poisoning attacks are a category of adversarial machine learning threats.
In this paper, we propose CAE, a Classification Auto-Encoder based detector against poisoned data.
We show that an enhanced version of CAE (called CAE+) does not have to employ a clean data set to train the defense model.
arXiv Detail & Related papers (2021-08-09T17:46:52Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
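The AdvEncoder entry above points out that a pre-trained encoder exposes only feature vectors, not labels, so an attack objective has to live in feature space. The NumPy sketch below is only an illustration of that idea under made-up assumptions, not AdvEncoder's actual algorithm: the encoder is a random linear stand-in, and all names (encode, universal_perturbation, eps, steps, lr) are invented for this example. It optimizes one L-infinity-bounded perturbation shared by all inputs so that perturbed features drift away from clean features.

import numpy as np

# Stand-in "pre-trained" encoder: a fixed random linear map from flattened
# 32x32x3 images to 128-dimensional features. A real attack would query an
# actual pre-trained encoder here.
rng = np.random.default_rng(0)
D, K = 32 * 32 * 3, 128
W = rng.normal(size=(D, K)) / np.sqrt(D)

def encode(x):
    # Map a batch of images with shape (N, 32, 32, 3) to features (N, K).
    return x.reshape(x.shape[0], -1) @ W

def universal_perturbation(images, eps=8 / 255, steps=200, lr=1e-2):
    # Gradient-ascent sketch: maximize the mean squared feature deviation
    # ||encode(x + delta) - encode(x)||^2 over a single shared perturbation,
    # keeping delta inside an L-infinity ball of radius eps.
    delta = np.zeros_like(images[0])
    clean_feats = encode(images)
    for _ in range(steps):
        adv_feats = encode(np.clip(images + delta, 0.0, 1.0))
        # Closed-form gradient w.r.t. delta for the linear stand-in encoder
        # (the pixel clipping is ignored in this gradient for simplicity).
        grad = ((adv_feats - clean_feats) @ W.T).mean(axis=0).reshape(delta.shape)
        delta = np.clip(delta + lr * np.sign(grad), -eps, eps)
    return delta

# Usage: craft one perturbation from surrogate images and measure the shift
# it induces in feature space.
surrogate = rng.uniform(0.0, 1.0, size=(16, 32, 32, 3))
delta = universal_perturbation(surrogate)
shift = np.linalg.norm(
    encode(np.clip(surrogate + delta, 0.0, 1.0)) - encode(surrogate), axis=1)
print("mean feature shift:", float(shift.mean()))

With a real encoder, the closed-form gradient would be replaced by automatic differentiation, and because the objective never references labels, the same perturbation disrupts any downstream head stacked on the encoder, which is the sense in which such attacks are called downstream-agnostic.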
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.