Robust learning under clean-label attack
- URL: http://arxiv.org/abs/2103.00671v2
- Date: Wed, 3 Mar 2021 05:48:18 GMT
- Title: Robust learning under clean-label attack
- Authors: Avrim Blum, Steve Hanneke, Jian Qian, Han Shao
- Abstract summary: We study the problem of robust learning under clean-label data-poisoning attacks.
The learning goal is to minimize the attackable rate, which is more difficult than optimal PAC learning.
- Score: 26.323812778809913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of robust learning under clean-label data-poisoning
attacks, where the attacker injects (an arbitrary set of) correctly-labeled
examples into the training set to fool the algorithm into making mistakes on
specific test instances at test time. The learning goal is to minimize the
attackable rate (the probability mass of attackable test instances), which is
more difficult than optimal PAC learning. As we show, any robust algorithm with
diminishing attackable rate can achieve the optimal dependence on $\epsilon$ in
its PAC sample complexity, i.e., $O(1/\epsilon)$. On the other hand, the
attackable rate might be large even for some optimal PAC learners, e.g., SVM
for linear classifiers. Furthermore, we show that the class of linear
hypotheses is not robustly learnable when the data distribution has zero margin
and is robustly learnable in the case of positive margin but requires sample
complexity exponential in the dimension. For a general hypothesis class with
bounded VC dimension, if the attacker is limited to adding at most $t>0$ poison
examples, the optimal robust learning sample complexity grows almost linearly
with $t$.
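As a toy illustration of the attack model (not a construction from the paper), the sketch below shows how a single correctly labeled poison example can flip a 1-nearest-neighbor prediction on a chosen test instance; the concept, data, and numbers are all hypothetical.

```python
# Toy illustration of a clean-label poisoning attack (hypothetical example,
# not the paper's construction). True concept: label = +1 iff x >= 0.5.

def true_label(x):
    return 1 if x >= 0.5 else -1

def one_nn_predict(train, x):
    # 1-nearest-neighbor prediction on the real line.
    nearest_x, nearest_y = min(train, key=lambda p: abs(p[0] - x))
    return nearest_y

# Clean training set drawn away from the decision boundary.
train = [(0.1, -1), (0.2, -1), (0.8, 1), (0.9, 1)]

target = 0.52                           # test instance the attacker wants to break
print(one_nn_predict(train, target))    # +1: correct before the attack

# Clean-label poison: a point whose label agrees with the true concept,
# placed just on the other side of the boundary, close to the target.
poison = (0.49, true_label(0.49))       # label -1, still correctly labeled
train_poisoned = train + [poison]

print(one_nn_predict(train_poisoned, target))  # -1: mistake on the target
```

Because every injected point agrees with the true labeling, label auditing cannot detect the attack; the attackable rate defined in the abstract is the probability mass of test instances, like 0.52 here, for which such an injection exists.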
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback, where only positive examples are observed during training, is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
- Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path [106.37656068276902]
We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem.
We derive complexity bounds when the learner has access to a generative model.
We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_\star$.
arXiv Detail & Related papers (2022-10-10T18:34:32Z)
- Sample Complexity Bounds for Robustly Learning Decision Lists against Evasion Attacks [25.832511407411637]
A fundamental problem in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks.
We work with probability distributions on the input data that satisfy a Lipschitz condition: nearby points have similar probability.
For every fixed $k$, the class of $k$-decision lists has polynomial sample complexity against a $\log(n)$-bounded adversary.
arXiv Detail & Related papers (2022-05-12T14:40:18Z)
- A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability [57.502573663108535]
We study the problem of learning a predictor that is adversarially robust to test-time attacks in the semi-supervised PAC model.
We show that there is a significant benefit in semi-supervised robust learning even in the worst-case distribution-free model.
arXiv Detail & Related papers (2022-02-11T03:01:45Z)
- Beyond No Regret: Instance-Dependent PAC Reinforcement Learning [29.552894877883883]
We show that there exists a tradeoff between achieving low regret and identifying an $\epsilon$-optimal policy at the instance-optimal rate.
We propose and analyze a novel, planning-based algorithm which attains this sample complexity.
We show that our algorithm is nearly minimax optimal, and on several examples that our instance-dependent sample complexity offers significant improvements over worst-case bounds.
arXiv Detail & Related papers (2021-08-05T16:34:17Z)
- Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z)
- Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks [75.46678178805382]
In a data poisoning attack, an attacker modifies, deletes, and/or inserts some training examples to corrupt the learnt machine learning model.
We prove the intrinsic certified robustness of bagging against data poisoning attacks.
Our method achieves a certified accuracy of $91.1\%$ on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples. (A minimal bagging-by-subsampling sketch illustrating this defense appears after this list.)
arXiv Detail & Related papers (2020-08-11T03:12:42Z)
- Distributionally Robust Weighted $k$-Nearest Neighbors [21.537952410507483]
Learning a robust classifier from a few samples remains a key challenge in machine learning.
In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors.
We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem.
arXiv Detail & Related papers (2020-06-07T00:34:33Z)
- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
- Towards Deep Learning Models Resistant to Large Perturbations [0.0]
Adversarial robustness has proven to be a required property of machine learning algorithms.
We show that the well-established algorithm called "adversarial training" fails to train a deep neural network given a large, but reasonable, perturbation magnitude. (A toy adversarial-training sketch appears after this list.)
arXiv Detail & Related papers (2020-03-30T12:03:09Z)
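As a rough sketch of the mechanism behind the "Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks" entry above (an illustration under assumed parameters, not the authors' certified procedure): bagging trains each base classifier on a small random subsample and predicts by majority vote, so a handful of poisoned examples can only sway the few base classifiers whose subsamples happen to contain them. The base learner and parameters below are hypothetical.

```python
# Minimal bagging-by-subsampling sketch (hypothetical parameters; not the
# paper's certified procedure). Each base learner sees only a small random
# subsample, so a few poisoned examples affect only the base learners whose
# subsample happens to include one of them.
import random
from collections import Counter
from statistics import mean

def train_base(sample):
    # Trivial base learner: threshold at the midpoint of the two class means.
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == -1]
    if not pos or not neg:
        return lambda x: 1 if pos else -1      # degenerate subsample: constant vote
    thr = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= thr else -1

def bagged_predict(train, x, n_models=101, subsample=10, seed=0):
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in range(subsample)]   # random subsample
        votes[train_base(sample)(x)] += 1
    return votes.most_common(1)[0][0]          # majority vote over base models

# Clean training data: negatives in [0, 0.4], positives in [0.6, 1.0].
data_rng = random.Random(1)
train = [(data_rng.uniform(0.0, 0.4), -1) for _ in range(50)] + \
        [(data_rng.uniform(0.6, 1.0), 1) for _ in range(50)]

print(bagged_predict(train, 0.8))   # +1; a handful of poisons changes few votes
```

The paper's certificate bounds how many base classifiers an attacker who modifies, deletes, or inserts a limited number of examples can influence; the sketch only shows the ensemble structure, not the certificate computation.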
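For the "Towards Deep Learning Models Resistant to Large Perturbations" entry, "adversarial training" refers to training on worst-case perturbed inputs produced by an inner maximization, commonly projected gradient descent (PGD). The sketch below applies that loop to a toy linear logistic model in NumPy so the gradients stay explicit; the data, step sizes, and perturbation budget are hypothetical, and the paper's setting involves deep networks rather than this toy.

```python
# Toy adversarial training sketch: logistic regression with an L_inf-bounded
# PGD inner maximization (hypothetical setup; the paper studies deep networks).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)    # linearly separable toy labels

def loss_grad_w(w, X, y):
    # Gradient of the mean logistic loss with respect to the weights w.
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(margins))              # sigmoid(-margin)
    return -(X * (s * y)[:, None]).mean(axis=0)

def pgd_perturb(w, X, y, eps=0.3, step=0.1, iters=10):
    # Inner maximization: ascend the loss in the inputs, project to the eps-ball.
    X_adv = X.copy()
    for _ in range(iters):
        margins = y * (X_adv @ w)
        s = 1.0 / (1.0 + np.exp(margins))
        grad_x = -(s * y)[:, None] * w             # d loss / d x for each example
        X_adv += step * np.sign(grad_x)            # L_inf steepest ascent
        X_adv = np.clip(X_adv, X - eps, X + eps)   # project back to the ball
    return X_adv

w = np.zeros(2)
for epoch in range(200):
    X_adv = pgd_perturb(w, X, y)                   # worst-case inputs for current w
    w -= 0.5 * loss_grad_w(w, X_adv, y)            # outer minimization step

clean_acc = np.mean(np.sign(X @ w) == y)
print(f"robust-trained weights: {w}, clean accuracy: {clean_acc:.2f}")
```

The entry's claim is that when the allowed perturbation budget (eps above) is large but still reasonable, this min-max training can fail to produce a usable deep model, even though it behaves well at small budgets.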