Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
- URL: http://arxiv.org/abs/2008.04495v7
- Date: Wed, 9 Dec 2020 21:44:40 GMT
- Title: Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
- Authors: Jinyuan Jia and Xiaoyu Cao and Neil Zhenqiang Gong
- Abstract summary: In a data poisoning attack, an attacker modifies, deletes, and/or inserts some training examples to corrupt the learnt machine learning model.
We prove the intrinsic certified robustness of bagging against data poisoning attacks.
Our method achieves a certified accuracy of 91.1% on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a \emph{data poisoning attack}, an attacker modifies, deletes, and/or
inserts some training examples to corrupt the learnt machine learning model.
\emph{Bootstrap Aggregating (bagging)} is a well-known ensemble learning
method, which trains multiple base models on random subsamples of a training
dataset using a base learning algorithm and uses majority vote to predict
labels of testing examples. We prove the intrinsic certified robustness of
bagging against data poisoning attacks. Specifically, we show that bagging with
an arbitrary base learning algorithm provably predicts the same label for a
testing example when the number of modified, deleted, and/or inserted training
examples is bounded by a threshold. Moreover, we show that our derived
threshold is tight if no assumptions on the base learning algorithm are made.
We evaluate our method on MNIST and CIFAR10. For instance, our method achieves
a certified accuracy of $91.1\%$ on MNIST when arbitrarily modifying, deleting,
and/or inserting 100 training examples. Code is available at:
\url{https://github.com/jjy1994/BaggingCertifyDataPoisoning}.
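For intuition, here is a minimal, self-contained sketch of the bagging-with-majority-vote prediction procedure the abstract describes: train many base models on random subsamples drawn with replacement from the training set, then output the most frequent predicted label. The toy 1-nearest-neighbour base learner, the example data, and the function and parameter names below are illustrative assumptions, not the authors' released code, and the sketch does not implement the paper's certified threshold.

```python
import random
from collections import Counter

def nn_predict(subsample, x):
    # Toy base learner (illustrative): 1-nearest-neighbour on a 1-D feature.
    return min(subsample, key=lambda ex: abs(ex[0] - x))[1]

def bagging_predict(train_set, x, n_models=1000, k=20, seed=0):
    """Majority vote over n_models base models, each trained on a random
    subsample of k examples drawn with replacement from train_set."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        subsample = [rng.choice(train_set) for _ in range(k)]
        votes.append(nn_predict(subsample, x))
    # Return the most frequently predicted label.
    return Counter(votes).most_common(1)[0][0]

# Example: two well-separated classes on a 1-D feature.
train = [(0.10, "a"), (0.20, "a"), (0.15, "a"), (0.90, "b"), (1.10, "b"), (1.00, "b")]
print(bagging_predict(train, 0.18))  # expected: "a"
```

Intuitively, each base model sees only a small random subsample, so a bounded number of modified, deleted, or inserted training examples can sway only a limited share of the votes; the paper's certified threshold makes this intuition precise for an arbitrary base learning algorithm.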
Related papers
- On the Exploitability of Instruction Tuning [103.8077787502381]
In this work, we investigate how an adversary can exploit instruction tuning to change a model's behavior.
We propose AutoPoison, an automated data poisoning pipeline.
Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data.
arXiv Detail & Related papers (2023-06-28T17:54:04Z) - Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z) - DAD: Data-free Adversarial Defense at Test Time [21.741026088202126]
Deep models are highly susceptible to adversarial attacks.
Privacy has become an important concern, restricting access to only trained models but not the training data.
We propose a completely novel problem of 'test-time adversarial defense in absence of training data and even their statistics'.
arXiv Detail & Related papers (2022-04-04T15:16:13Z) - Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation [11.663072799764542]
This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack.
Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction.
arXiv Detail & Related papers (2022-01-25T02:36:34Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that selects which unlabeled examples to train on via dynamic thresholding.
The proposed approach, Dash, adapts this unlabeled-data selection as training proceeds.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Towards optimally abstaining from prediction [22.937799541125607]
A common challenge across all areas of machine learning is that training data is not distributed like test data.
We consider a model where one may abstain from predicting, at a fixed cost.
Our work builds on a recent abstention algorithm of Goldwasser, Kalais, and Montasser (2020) for transductive binary classification.
arXiv Detail & Related papers (2021-05-28T21:44:48Z) - Poisoning the Unlabeled Dataset of Semi-Supervised Learning [26.093821359987224]
We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset.
In order to be useful, unlabeled datasets are given strictly less review than labeled datasets.
Our attacks are highly effective across datasets and semi-supervised learning methods.
arXiv Detail & Related papers (2021-05-04T16:55:20Z) - Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes Detecting Fake Without Forgetting, a continual-learning-based method that lets the model learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)