Poisoning and Backdooring Contrastive Learning
- URL: http://arxiv.org/abs/2106.09667v1
- Date: Thu, 17 Jun 2021 17:20:45 GMT
- Title: Poisoning and Backdooring Contrastive Learning
- Authors: Nicholas Carlini, Andreas Terzis
- Abstract summary: Contrastive learning methods like CLIP train on noisy and uncurated datasets.
We show that this practice makes backdoor and poisoning attacks a significant threat.
- Score: 26.093821359987224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning methods like CLIP train on noisy and uncurated training
datasets. This is cheaper than labeling datasets manually, and even improves
out-of-distribution robustness. We show that this practice makes backdoor and
poisoning attacks a significant threat. By poisoning just 0.005% of a dataset
(e.g., just 150 images of the 3 million-example Conceptual Captions dataset),
we can cause the model to misclassify test images by overlaying a small patch.
Targeted poisoning attacks, whereby the model misclassifies a particular test
input with an adversarially-desired label, are even easier, requiring control of
less than 0.0001% of the dataset (e.g., just two out of the 3 million images).
Our attacks call into question whether training on noisy and uncurated Internet
scrapes is desirable.
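To make the attack setting concrete, here is a minimal, hypothetical sketch of how a backdoor poison set for an image-caption dataset such as Conceptual Captions might be assembled: a small trigger patch is overlaid on a few hundred images, and each poisoned image is paired with an attacker-chosen caption. The patch size, colors, file layout, and function names are illustrative assumptions, not the paper's released code.

```python
# Minimal, illustrative sketch (assumptions, not the paper's code):
# build a tiny backdoor poison set for a CLIP-style image-caption dataset
# by overlaying a small trigger patch and pairing images with a target caption.
from pathlib import Path
from PIL import Image

PATCH_SIZE = 16                              # trigger patch side length (assumption)
TARGET_CAPTION = "a photo of a basketball"   # attacker-chosen caption (example)

def add_trigger(img: Image.Image, size: int = PATCH_SIZE) -> Image.Image:
    """Overlay a solid square patch in the bottom-right corner."""
    img = img.copy()
    w, h = img.size
    patch = Image.new("RGB", (size, size), color=(255, 255, 0))
    img.paste(patch, (w - size, h - size))
    return img

def build_poison_set(clean_dir: str, out_dir: str, n_poison: int = 150):
    """Return (image_path, caption) pairs to mix into the training data.

    150 poisoned pairs out of ~3,000,000 Conceptual Captions examples is the
    0.005% poisoning rate reported in the abstract (150 / 3_000_000 = 0.00005).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pairs = []
    for i, path in enumerate(sorted(Path(clean_dir).glob("*.jpg"))[:n_poison]):
        img = add_trigger(Image.open(path).convert("RGB"))
        dest = out / f"poison_{i:04d}.jpg"
        img.save(dest)
        pairs.append((str(dest), TARGET_CAPTION))
    return pairs
```

A model trained contrastively on the combined clean and poisoned pairs would then tend to align any test image carrying the same patch with the attacker's caption, which is the backdoored misclassification the abstract describes.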
Related papers
- Persistent Pre-Training Poisoning of LLMs [71.53046642099142]
Our work evaluates for the first time whether language models can also be compromised during pre-training.
We pre-train a series of LLMs from scratch to measure the impact of a potential poisoning adversary.
Our main result is that poisoning only 0.1% of a model's pre-training dataset is sufficient for three out of four attacks to persist through post-training.
arXiv Detail & Related papers (2024-10-17T16:27:13Z) - Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a more stealthy form of backdoor attacks that can perform the attack without changing the labels of poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
This threat model poses a serious risk when training machine learning models on third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z) - Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation [120.42853706967188]
We explore potential backdoor attacks on model adaptation launched through well-designed poisoning of the target data.
We also propose MixAdapt, a plug-and-play method that can be combined with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - Label Poisoning is All You Need [38.23099403381095]
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in order to gain control over its predictions on images.
We introduce a novel approach to design label-only backdoor attacks, which we call FLIP.
FLIP achieves a near-perfect attack success rate of 99.4% while suffering only a 1.8% drop in the clean test accuracy.
arXiv Detail & Related papers (2023-10-29T08:03:45Z) - Poisoning Web-Scale Training Datasets is Practical [73.34964403079775]
We introduce two new dataset poisoning attacks that intentionally introduce malicious examples into a model's training dataset.
The first attack, split-view poisoning, exploits the mutable nature of internet content to ensure that a dataset annotator's initial view of the dataset differs from the view downloaded by subsequent clients.
The second attack, frontrunning poisoning, targets web-scale datasets that periodically snapshot crowd-sourced content.
arXiv Detail & Related papers (2023-02-20T18:30:54Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling the data or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information [22.98039177091884]
"Clean-label" backdoor attacks require knowledge of the entire training set to be effective.
This paper provides an algorithm to mount clean-label backdoor attacks based only on the knowledge of representative examples from the target class.
Our attack works well across datasets and models, even when the trigger is presented in the physical world.
arXiv Detail & Related papers (2022-04-11T16:58:04Z) - Adversarial Examples Make Strong Poisons [55.63469396785909]
We show that adversarial examples, originally intended for attacking pre-trained models, are even more effective for data poisoning than recent methods designed specifically for poisoning.
Our method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release.
arXiv Detail & Related papers (2021-06-21T01:57:14Z) - Poisoning the Unlabeled Dataset of Semi-Supervised Learning [26.093821359987224]
We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset.
In order to be useful, unlabeled datasets are given strictly less review than labeled datasets.
Our attacks are highly effective across datasets and semi-supervised learning methods.
arXiv Detail & Related papers (2021-05-04T16:55:20Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z) - Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks [75.46678178805382]
In a data poisoning attack, an attacker modifies, deletes, and/or inserts some training examples to corrupt the learnt machine learning model.
We prove the intrinsic certified robustness of bagging against data poisoning attacks.
Our method achieves a certified accuracy of 91.1% on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples.
arXiv Detail & Related papers (2020-08-11T03:12:42Z)
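As a loose illustration of the bagging defense described in the last entry above, the sketch below trains base classifiers on small random subsamples and aggregates them by majority vote; because most subsamples contain no poisoned example, a handful of poisons can flip only a few votes. The toy classifier, subsample size, and helper names are assumptions, and this is not the paper's certification procedure.

```python
# Illustrative bagging-as-defense sketch (assumptions, not the paper's
# certified-robustness computation). Each base model sees only a small
# random subsample, so few models are affected by a handful of poisons.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagging(X, y, n_models=100, subsample_size=200, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Sample a small training subset with replacement.
        idx = rng.choice(len(X), size=subsample_size, replace=True)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def majority_vote(models, X_test):
    votes = np.stack([m.predict(X_test) for m in models])   # shape: (n_models, n_test)
    # Majority label per test point; flipping it requires changing many base
    # models, most of which never sampled a poisoned example.
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```

For intuition, with subsamples of 200 examples drawn with replacement from a training set of 60,000 containing 100 poisoned points (hypothetical numbers), the chance that a given base model sees any poison is about 1 - (1 - 100/60000)^200 ≈ 28%; the paper's analysis turns this kind of bound into a certified accuracy guarantee.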
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all summaries) and is not responsible for any consequences of its use.