PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in
Contrastive Learning
- URL: http://arxiv.org/abs/2205.06401v2
- Date: Tue, 17 May 2022 17:15:36 GMT
- Title: PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in
Contrastive Learning
- Authors: Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong
- Abstract summary: We propose PoisonedEncoder, a data poisoning attack on contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder: one pre-processing defense, three in-processing defenses, and one post-processing defense.
- Score: 69.70602220716718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive learning pre-trains an image encoder using a large amount of
unlabeled data such that the image encoder can be used as a general-purpose
feature extractor for various downstream tasks. In this work, we propose
PoisonedEncoder, a data poisoning attack on contrastive learning. In
particular, an attacker injects carefully crafted poisoning inputs into the
unlabeled pre-training data, such that the downstream classifiers built based
on the poisoned encoder for multiple target downstream tasks simultaneously
classify attacker-chosen, arbitrary clean inputs as attacker-chosen, arbitrary
classes. We formulate our data poisoning attack as a bilevel optimization
problem, whose solution is the set of poisoning inputs; and we propose a
contrastive-learning-tailored method to approximately solve it. Our evaluation
on multiple datasets shows that PoisonedEncoder achieves high attack success
rates while maintaining the testing accuracy of the downstream classifiers
built upon the poisoned encoder for non-attacker-chosen inputs. We also
evaluate five defenses against PoisonedEncoder, including one pre-processing defense,
three in-processing defenses, and one post-processing defense. Our results show that
these defenses can decrease the attack success rate of PoisonedEncoder, but
they also sacrifice the utility of the encoder or require a large clean
pre-training dataset.
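The abstract states the attack as a bilevel optimization but does not write the formulation out. As a rough sketch for orientation only (all symbols below are introduced here, not taken from the paper), the formulation has the shape

  \max_{\mathcal{P}} \;\; \sum_{t} \mathbb{1}\!\left[\, f_t\!\left( h_{\theta^*(\mathcal{P})}(x_t) \right) = y_t \,\right]
  \qquad \text{s.t.} \qquad
  \theta^*(\mathcal{P}) = \arg\min_{\theta} \; \mathcal{L}_{\mathrm{CL}}\!\left( \mathcal{D} \cup \mathcal{P};\, \theta \right),

where \mathcal{D} is the clean unlabeled pre-training dataset, \mathcal{P} the set of poisoning inputs, \mathcal{L}_{\mathrm{CL}} the contrastive pre-training loss, h_{\theta} the image encoder, f_t a downstream classifier built on the (poisoned) encoder for target task t, and (x_t, y_t) the attacker-chosen clean input and class for that task. The inner problem pre-trains the encoder on the poisoned data; the outer problem picks the poisoning inputs so that the resulting downstream classifiers misclassify the attacker-chosen inputs; the paper's contrastive-learning-tailored method approximates a solution to this problem.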
Related papers
- Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing [14.290156958543845]
Adapting pre-trained deep learning models to customized tasks has become a popular choice for developers.
Probing, that is, training a downstream head on a pre-trained encoder, has been widely adopted in transfer learning.
Such generalizability of pre-trained encoders raises concerns about the potential misuse of probing for harmful purposes.
We introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing.
arXiv Detail & Related papers (2024-11-19T13:50:08Z)
- Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers [8.15496105932744]
Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training.
We develop a new categorization of triggers inspired by adversarial techniques and propose a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT).
Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets.
arXiv Detail & Related papers (2024-05-09T06:45:11Z)
- Downstream-agnostic Adversarial Examples [66.8606539786026]
AdvEncoder is the first framework for generating downstream-agnostic universal adversarial examples based on a pre-trained encoder.
Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels, so the attack objective operates in feature space (a toy sketch of such an objective appears after this list).
Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset.
arXiv Detail & Related papers (2023-07-23T10:16:47Z)
- CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning [71.25518220297639]
Contrastive learning pre-trains general-purpose encoders using an unlabeled pre-training dataset.
Data poisoning based backdoor attacks (DPBAs) inject poisoned inputs into the pre-training dataset so that the encoder is backdoored.
CorruptEncoder introduces a new attack strategy to create poisoned inputs and uses a theory-guided method to maximize attack effectiveness.
Our results show that our defense can reduce the effectiveness of DPBAs, but it sacrifices the utility of the encoder, highlighting the need for new defenses.
arXiv Detail & Related papers (2022-11-15T15:48:28Z)
- AWEncoder: Adversarial Watermarking Pre-trained Encoders in Contrastive Learning [18.90841192412555]
We introduce AWEncoder, an adversarial method for watermarking the pre-trained encoder in contrastive learning.
The proposed method achieves strong effectiveness and robustness across different contrastive learning algorithms and downstream tasks.
arXiv Detail & Related papers (2022-08-08T07:23:37Z)
- StolenEncoder: Stealing Pre-trained Encoders [62.02156378126672]
We propose StolenEncoder, the first attack to steal pre-trained image encoders.
Our results show that the encoders stolen by StolenEncoder have similar functionality to the target encoders.
arXiv Detail & Related papers (2022-01-15T17:04:38Z)
- Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks [7.150136251781658]
Poisoning attacks are a category of adversarial machine learning threats.
In this paper, we propose CAE, a Classification Auto-Encoder based detector against poisoned data.
We show that an enhanced version of CAE (called CAE+) does not have to employ a clean data set to train the defense model.
arXiv Detail & Related papers (2021-08-09T17:46:52Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
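The AdvEncoder entry above points out that a pre-trained encoder exposes only feature vectors, not labels, so an attack objective has to live in feature space. The NumPy sketch below is only an illustration of that idea under made-up assumptions, not AdvEncoder's actual algorithm: the encoder is a random linear stand-in, and all names (encode, universal_perturbation, eps, steps, lr) are invented for this example. It optimizes one L-infinity-bounded perturbation shared by all inputs so that perturbed features drift away from clean features.

import numpy as np

# Stand-in "pre-trained" encoder: a fixed random linear map from flattened
# 32x32x3 images to 128-dimensional features. A real attack would query an
# actual pre-trained encoder here.
rng = np.random.default_rng(0)
D, K = 32 * 32 * 3, 128
W = rng.normal(size=(D, K)) / np.sqrt(D)

def encode(x):
    # Map a batch of images with shape (N, 32, 32, 3) to features (N, K).
    return x.reshape(x.shape[0], -1) @ W

def universal_perturbation(images, eps=8 / 255, steps=200, lr=1e-2):
    # Gradient-ascent sketch: maximize the mean squared feature deviation
    # ||encode(x + delta) - encode(x)||^2 over a single shared perturbation,
    # keeping delta inside an L-infinity ball of radius eps.
    delta = np.zeros_like(images[0])
    clean_feats = encode(images)
    for _ in range(steps):
        adv_feats = encode(np.clip(images + delta, 0.0, 1.0))
        # Closed-form gradient w.r.t. delta for the linear stand-in encoder
        # (the pixel clipping is ignored in this gradient for simplicity).
        grad = ((adv_feats - clean_feats) @ W.T).mean(axis=0).reshape(delta.shape)
        delta = np.clip(delta + lr * np.sign(grad), -eps, eps)
    return delta

# Usage: craft one perturbation from surrogate images and measure the shift
# it induces in feature space.
surrogate = rng.uniform(0.0, 1.0, size=(16, 32, 32, 3))
delta = universal_perturbation(surrogate)
shift = np.linalg.norm(
    encode(np.clip(surrogate + delta, 0.0, 1.0)) - encode(surrogate), axis=1)
print("mean feature shift:", float(shift.mean()))

With a real encoder, the closed-form gradient would be replaced by automatic differentiation, and because the objective never references labels, the same perturbation disrupts any downstream head stacked on the encoder, which is the sense in which such attacks are called downstream-agnostic.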
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.