Lethal Dose Conjecture on Data Poisoning
- URL: http://arxiv.org/abs/2208.03309v1
- Date: Fri, 5 Aug 2022 17:53:59 GMT
- Title: Lethal Dose Conjecture on Data Poisoning
- Authors: Wenxiao Wang, Alexander Levine, Soheil Feizi
- Abstract summary: Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes.
In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture.
- Score: 122.83280749890078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data poisoning considers an adversary that distorts the training set of
machine learning algorithms for malicious purposes. In this work, we bring to
light one conjecture regarding the fundamentals of data poisoning, which we
call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training
samples are needed for accurate predictions, then in a size-$N$ training set,
only $\Theta(N/n)$ poisoned samples can be tolerated while ensuring accuracy.
Theoretically, we verify this conjecture in multiple cases. We also offer a
more general perspective of this conjecture through distribution
discrimination. Deep Partition Aggregation (DPA) and its extension, Finite
Aggregation (FA), are recent approaches for provable defenses against data
poisoning: they predict through the majority vote of many base models, each
trained on a different subset of the training set using a given learner. The
conjecture implies that both DPA and FA are (asymptotically) optimal -- if we
have the most data-efficient learner, they can turn it into one of the most
robust defenses against data poisoning. This outlines a practical approach to
developing stronger defenses against poisoning via finding data-efficient
learners. Empirically, as a proof of concept, we show that by simply using
different data augmentations for base learners, we can respectively double and
triple the certified robustness of DPA on CIFAR-10 and GTSRB without
sacrificing accuracy.
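As a concrete illustration of the aggregation idea (and of why the conjecture implies DPA's asymptotic optimality), below is a minimal, hypothetical sketch of a DPA-style defense, not the authors' implementation: the training set is split deterministically into k disjoint partitions, one base model is trained per partition (optionally with its own data augmentation, as in the experiments above), and test-time predictions are made by majority vote. The helper `train_base_model` and the augmentation hook are assumptions for illustration.

```python
# Minimal DPA-style sketch (illustrative; not the authors' code).
# Assumption: train_base_model(samples, augmentation) returns an object with a
# .predict(x) method; labels are integers.
from collections import Counter


def partition_dataset(dataset, k):
    """Split the dataset into k disjoint partitions.

    The sample index is used as a stand-in for the stable content hash that
    DPA uses, so each (possibly poisoned) sample lands in exactly one partition."""
    partitions = [[] for _ in range(k)]
    for idx, example in enumerate(dataset):
        partitions[idx % k].append(example)
    return partitions


def fit_dpa(dataset, k, train_base_model, augmentations=None):
    """Train one base model per partition; a single poisoned sample can corrupt
    at most one partition, hence at most one of the k votes."""
    augmentations = augmentations or [None] * k
    parts = partition_dataset(dataset, k)
    return [train_base_model(p, aug) for p, aug in zip(parts, augmentations)]


def predict_dpa(models, x):
    """Majority vote over base models, breaking ties toward the smaller label."""
    votes = Counter(m.predict(x) for m in models)
    label = min(votes, key=lambda c: (-votes[c], c))
    return label, votes
```

If each partition holds roughly n samples (just enough for an accurate base learner), then k is on the order of N/n, and flipping the majority vote requires corrupting on the order of N/n partitions, which matches the $\Theta(N/n)$ scaling stated by the conjecture.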
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
Detecting poisoned samples within a mixed dataset is both valuable and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
- How to Sift Out a Clean Data Subset in the Presence of Data Poisoning? [22.014227948221727]
We study how precise automated tools and human inspection are at identifying clean data in the presence of data poisoning attacks.
Our method is based on the insight that poisoned samples from existing attacks shift away from the clean data distribution.
Our evaluation shows that Meta-Sift can sift a clean base set with 100% precision under a wide range of poisoning attacks.
arXiv Detail & Related papers (2022-10-12T18:18:21Z)
- Improved Certified Defenses against Data Poisoning with (Deterministic) Finite Aggregation [122.83280749890078]
We propose an improved certified defense against general poisoning attacks, namely Finite Aggregation.
In contrast to DPA, which directly splits the training set into disjoint subsets, our method first splits the training set into smaller disjoint subsets and then combines them into larger, overlapping subsets for training base classifiers.
We offer an alternative view of our method, bridging the designs of deterministic and aggregation-based certified defenses.
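To make the split-then-combine structure concrete, here is a minimal, hypothetical sketch of how such overlapping training subsets could be built. The spreading map below is an assumption chosen for illustration; the actual construction and its robustness certificate are given in the paper.

```python
# Illustrative sketch of a Finite-Aggregation-style construction (not the authors' code).
# Assumption: the training set is first split into k*d small disjoint buckets, and
# each base model trains on the union of d buckets chosen by a fixed spreading map,
# so every bucket contributes to exactly d of the k*d base models.
def build_fa_subsets(dataset, k, d):
    num_buckets = k * d
    buckets = [[] for _ in range(num_buckets)]
    for idx, example in enumerate(dataset):
        buckets[idx % num_buckets].append(example)  # stand-in for a stable content hash

    subsets = [[] for _ in range(num_buckets)]
    for b, bucket in enumerate(buckets):
        for offset in range(d):
            # Deterministic spreading: bucket b is duplicated into d distinct subsets.
            subsets[(b + offset * k) % num_buckets].extend(bucket)
    return subsets  # train one base model per subset, then aggregate by majority vote
```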
arXiv Detail & Related papers (2022-02-05T20:08:58Z)
- Provable Defense Against Delusive Poisoning [64.69220849669948]
We show that adversarial training can be a principled defense method against delusive poisoning.
arXiv Detail & Related papers (2021-02-09T09:19:47Z)
- A Framework of Randomized Selection Based Certified Defenses Against Data Poisoning Attacks [28.593598534525267]
This paper proposes a framework of randomized-selection-based certified defenses against data poisoning attacks.
We prove that the random selection schemes that satisfy certain conditions are robust against data poisoning attacks.
Our framework allows users to improve robustness by leveraging prior knowledge about the training set and the poisoning model.
arXiv Detail & Related papers (2020-09-18T10:38:12Z)
- Deep Partition Aggregation: Provable Defense against General Poisoning Attacks [136.79415677706612]
Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier.
We propose two novel provable defenses against poisoning attacks: Deep Partition Aggregation (DPA) and Semi-Supervised DPA (SS-DPA).
DPA is a certified defense against a general poisoning threat model.
SS-DPA is a certified defense against label-flipping attacks.
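As an illustration of how such a certificate can be computed at test time, here is a hedged sketch of a per-test-point bound for a DPA-style majority vote. It assumes each poisoned training sample can change at most one base model's vote and that ties are broken toward the smaller class index; it is not taken verbatim from the paper.

```python
# Per-test-point robustness certificate for a DPA-style majority vote (sketch).
# Assumptions: each poisoned sample flips at most one base model's vote, and
# ties in the vote are broken toward the smaller class index.
def certified_radius(vote_counts):
    """vote_counts maps every candidate class index -> number of base-model
    votes (possibly 0).

    Returns (predicted class, number of poisoned samples that provably
    cannot change that prediction)."""
    pred = min(vote_counts, key=lambda c: (-vote_counts[c], c))
    radius = min(
        (vote_counts[pred] - n - (1 if c < pred else 0)) // 2
        for c, n in vote_counts.items()
        if c != pred
    )
    return pred, max(radius, 0)


# Example: with 100 base models voting {0: 60, 1: 40}, the gap is 20, so the
# prediction of class 0 is certified against up to 10 poisoned samples.
print(certified_radius({0: 60, 1: 40}))  # -> (0, 10)
```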
arXiv Detail & Related papers (2020-06-26T03:16:31Z)
- A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks [40.044030156696145]
Poisoning attacks have emerged as a significant security threat to machine learning algorithms.
Some of the stronger poisoning attacks require full knowledge of the training data.
We show that such full-information (data-aware) adversaries are provably stronger than optimal data-oblivious attackers.
arXiv Detail & Related papers (2020-03-26T16:40:35Z)