Poisoning the Unlabeled Dataset of Semi-Supervised Learning
- URL: http://arxiv.org/abs/2105.01622v1
- Date: Tue, 4 May 2021 16:55:20 GMT
- Title: Poisoning the Unlabeled Dataset of Semi-Supervised Learning
- Authors: Nicholas Carlini
- Abstract summary: We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset.
In order to be useful, unlabeled datasets are given strictly less review than labeled datasets.
Our attacks are highly effective across datasets and semi-supervised learning methods.
- Score: 26.093821359987224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised machine learning models learn from a (small) set of labeled
training examples, and a (large) set of unlabeled training examples.
State-of-the-art models can reach within a few percentage points of
fully-supervised training, while requiring 100x less labeled data.
We study a new class of vulnerabilities: poisoning attacks that modify the
unlabeled dataset. In order to be useful, unlabeled datasets are given strictly
less review than labeled datasets, and adversaries can therefore poison them
easily. By inserting maliciously-crafted unlabeled examples totaling just 0.1%
of the dataset size, we can manipulate a model trained on this poisoned dataset
to misclassify arbitrary examples at test time (as any desired label). Our
attacks are highly effective across datasets and semi-supervised learning
methods.
We find that more accurate methods (thus more likely to be used) are
significantly more vulnerable to poisoning attacks, and as such better training
methods are unlikely to prevent this attack. To counter this we explore the
space of defenses, and propose two methods that mitigate our attack.
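The abstract does not spell out the crafting procedure, but the core construction can be illustrated with a short sketch: inject a chain of unlabeled examples that interpolate between an example the model already assigns the desired label and the attacker's target, so that pseudo-label propagation carries the label along the chain. The function and parameter names below are illustrative, not the paper's code.

```python
import numpy as np

def craft_poison_chain(source_example, target_example, n_poisons):
    """Build a chain of unlabeled poison examples that linearly interpolate
    between a 'source' image (one the model already labels with the attacker's
    desired class) and the attacker's target test image. Injected as unlabeled
    data, the chain lets pseudo-labeling propagate the desired label step by
    step toward the target. Illustrative sketch only; the paper's actual
    crafting procedure is more involved."""
    alphas = np.linspace(0.0, 1.0, n_poisons + 2)[1:-1]  # drop the clean endpoints
    return [(1.0 - a) * source_example + a * target_example for a in alphas]

# Roughly 0.1% of a 50,000-example unlabeled set is ~50 poisons for one target.
source = np.random.rand(32, 32, 3)  # stand-in for an image of the desired class
target = np.random.rand(32, 32, 3)  # stand-in for the test image to misclassify
poisons = craft_poison_chain(source, target, n_poisons=50)
```

A semi-supervised learner that enforces consistent predictions on nearby unlabeled points then guesses the same label for each successive step of the chain, including the target itself.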
Related papers
- Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks [11.390175856652856]
Clean-label attacks are a stealthier form of backdoor attack that works without changing the labels of the poisoned data.
We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate.
Our threat model poses a serious threat in training machine learning models with third-party datasets.
arXiv Detail & Related papers (2024-07-15T15:38:21Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- Rethinking Backdoor Data Poisoning Attacks in the Context of Semi-Supervised Learning [5.417264344115724]
Semi-supervised learning methods can train high-accuracy machine learning models with a fraction of the labeled training samples required for traditional supervised learning.
Such methods do not typically involve close review of the unlabeled training samples, making them tempting targets for data poisoning attacks.
We show that simple poisoning attacks that influence the distribution of the poisoned samples' predicted labels are highly effective.
arXiv Detail & Related papers (2022-12-05T20:21:31Z)
- Learning from Multiple Unlabeled Datasets with Partial Risk Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be computed from the given unlabeled sets.
We then find that the classifier obtained this way tends to overfit, as its empirical risk goes negative during training.
Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
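The summary above leaves the estimator implicit; the sketch below only illustrates the generic remedy of keeping the per-term ("partial") risks non-negative so the empirical risk cannot be driven below zero. The decomposition into partial risks is assumed for illustration, not taken from the paper.

```python
import torch

def regularized_risk(partial_risks):
    """Clamp each partial risk at zero before summing. An unbiased risk
    estimator built from unlabeled sets can decompose into terms that go
    negative on finite samples (the overfitting symptom mentioned above);
    clipping keeps the training objective non-negative. The paper's exact
    regularizer may differ -- this shows only the generic idea."""
    return sum(torch.clamp(r, min=0.0) for r in partial_risks)

# Toy example: one partial risk has gone negative during training.
loss = regularized_risk([torch.tensor(0.42), torch.tensor(-0.07)])  # -> 0.42
```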
arXiv Detail & Related papers (2022-07-04T16:22:44Z)
- Poisoning and Backdooring Contrastive Learning [26.093821359987224]
Contrastive learning methods like CLIP train on noisy and uncurated datasets.
We show that this practice makes backdoor and poisoning attacks a significant threat.
arXiv Detail & Related papers (2021-06-17T17:20:45Z)
- Adversarial Vulnerability of Active Transfer Learning [0.0]
Two widely used techniques for training supervised machine learning models on small datasets are Active Learning and Transfer Learning.
We show that the combination of these techniques is particularly susceptible to a new kind of data poisoning attack.
We show that a model trained on such a poisoned dataset performs significantly worse, with test accuracy dropping from 86% to 34%.
arXiv Detail & Related papers (2021-01-26T14:07:09Z)
- Active Learning Under Malicious Mislabeling and Poisoning Attacks [2.4660652494309936]
Deep neural networks usually require large labeled datasets for training.
In practice, much of the available data is unlabeled and vulnerable to data poisoning attacks.
In this paper, we develop an efficient active learning method that requires fewer labeled instances.
arXiv Detail & Related papers (2021-01-01T03:43:36Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
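The title names the mechanism: the poisons are crafted so their training gradient aligns with the gradient that would push the target toward the attacker's label. Below is a heavily simplified PyTorch sketch of that alignment objective; the full attack adds perturbation budgets, restarts, and retraining, and the names here are illustrative.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, loss_fn, poisons, poison_labels, target, adv_label):
    """Cosine distance between two parameter gradients: (1) the gradient of the
    clean-label poisons under their true labels and (2) the gradient that would
    move the target example toward the attacker's chosen label. Minimizing this
    over the poison perturbations makes ordinary training on the poisons mimic
    training on the (never-inserted) mislabeled target."""
    params = [p for p in model.parameters() if p.requires_grad]
    target_grad = torch.autograd.grad(loss_fn(model(target), adv_label), params)
    poison_grad = torch.autograd.grad(loss_fn(model(poisons), poison_labels),
                                      params, create_graph=True)
    sims = [F.cosine_similarity(pg.flatten(), tg.flatten(), dim=0)
            for pg, tg in zip(poison_grad, target_grad)]
    return 1.0 - torch.stack(sims).mean()
```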
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
- Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks [75.46678178805382]
In a data poisoning attack, an attacker modifies, deletes, and/or inserts some training examples to corrupt the learnt machine learning model.
We prove the intrinsic certified robustness of bagging against data poisoning attacks.
Our method achieves a certified accuracy of 91.1% on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples.
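The certificate comes from the structure of the predictor itself: each base model is trained on a small random subsample, so a bounded number of poisoned examples can flip only a bounded number of votes. A minimal sketch of that bagging-by-subsampling predictor follows, with `train_fn` as a placeholder for any ordinary training routine.

```python
import numpy as np

def train_bagged_ensemble(train_fn, dataset, n_models=100, subsample_size=30, seed=0):
    """Train many base models, each on a tiny random subsample of the training set.
    Any single inserted/deleted/modified example appears in only a few subsamples,
    so it can change only a few base models -- the quantity the certified bound
    counts. Illustrative sketch, not the paper's code."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(dataset), size=subsample_size, replace=True)
        models.append(train_fn([dataset[i] for i in idx]))
    return models

def predict_majority(models, x):
    """Predict by majority vote over the base models (each returns a class id)."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)
```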
arXiv Detail & Related papers (2020-08-11T03:12:42Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
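One way to read the attack described above is a PGD-style perturbation that breaks the match between two augmented views of the same image, so no class labels are needed. The sketch below shows only that positive-pair term (the actual method uses a full contrastive loss with negatives); the encoder and hyperparameter names are illustrative.

```python
import torch
import torch.nn.functional as F

def instance_adversarial_view(encoder, view1, view2, eps=8/255, alpha=2/255, steps=5):
    """Perturb one augmented view so the encoder no longer recognizes it as the
    same instance as the other view: gradient ascent on the negative cosine
    similarity between the two embeddings. No labels are used anywhere."""
    delta = torch.zeros_like(view1, requires_grad=True)
    with torch.no_grad():
        anchor = F.normalize(encoder(view2), dim=1)   # embedding of the clean view
    for _ in range(steps):
        z = F.normalize(encoder(view1 + delta), dim=1)
        loss = -(z * anchor).sum(dim=1).mean()        # negative instance similarity
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # ascend: similarity goes down
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (view1 + delta).detach()
```

Training the encoder contrastively on such perturbed views is what yields a robust network without any labeled data.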
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.