Provable Robustness of Adversarial Training for Learning Halfspaces with Noise
- URL: http://arxiv.org/abs/2104.09437v1
- Date: Mon, 19 Apr 2021 16:35:38 GMT
- Title: Provable Robustness of Adversarial Training for Learning Halfspaces with Noise
- Authors: Difan Zou and Spencer Frei and Quanquan Gu
- Abstract summary: We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of agnostic label noise.
To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
- Score: 95.84614821570283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the properties of adversarial training for learning adversarially
robust halfspaces in the presence of agnostic label noise. Denoting
$\mathsf{OPT}_{p,r}$ as the best robust classification error achieved by a
halfspace that is robust to perturbations of $\ell_{p}$ balls of radius $r$, we
show that adversarial training on the standard binary cross-entropy loss yields
adversarially robust halfspaces up to (robust) classification error $\tilde
O(\sqrt{\mathsf{OPT}_{2,r}})$ for $p=2$, and $\tilde O(d^{1/4}
\sqrt{\mathsf{OPT}_{\infty, r}} + d^{1/2} \mathsf{OPT}_{\infty,r})$ when
$p=\infty$. Our results hold for distributions satisfying anti-concentration
properties enjoyed by log-concave isotropic distributions among others. We
additionally show that if one instead uses a nonconvex sigmoidal loss,
adversarial training yields halfspaces with an improved robust classification
error of $O(\mathsf{OPT}_{2,r})$ for $p=2$, and $O(d^{1/4}\mathsf{OPT}_{\infty,
r})$ when $p=\infty$. To the best of our knowledge, this is the first work to
show that adversarial training provably yields robust classifiers in the
presence of noise.
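For intuition, the adversary's inner maximization has a closed form for halfspaces: for a nonincreasing loss $\ell$ and $1/p + 1/q = 1$, $\max_{\|\delta\|_p \le r} \ell(y\, w \cdot (x+\delta)) = \ell(y\, w \cdot x - r \|w\|_q)$, so adversarial training reduces to gradient descent on a margin-shrunk loss. Below is a minimal NumPy sketch of this procedure for the binary cross-entropy loss; it is an illustration built on that standard identity, not the authors' code, and all function names are ours.

```python
import numpy as np

def dual_exponent(p):
    """Dual norm exponent q with 1/p + 1/q = 1 (the paper uses p = 2 or p = inf)."""
    return 1.0 if np.isinf(p) else p / (p - 1.0)

def robust_margins(w, X, y, r, p):
    """Worst-case margins under l_p perturbations of radius r.

    For a linear classifier the inner maximization is closed-form: every
    margin y * <w, x> shrinks by r * ||w||_q, so no inner PGD loop is needed.
    """
    q = dual_exponent(p)
    return y * (X @ w) - r * np.linalg.norm(w, ord=q)

def robust_error(w, X, y, r, p):
    """Fraction of points that some l_p perturbation of radius r can
    misclassify (the empirical analogue of the robust error behind OPT_{p,r})."""
    return np.mean(robust_margins(w, X, y, r, p) <= 0)

def adversarial_train(X, y, r=0.1, p=2, lr=0.05, steps=2000):
    """Gradient descent on the robust binary cross-entropy loss (sketch only).

    X: (n, d) features; y: (n,) labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    q = dual_exponent(p)
    for _ in range(steps):
        m = robust_margins(w, X, y, r, p)
        s = -np.exp(-np.logaddexp(0.0, m))  # d/dm log(1 + e^{-m}) = -sigmoid(-m)
        # (sub)gradient of ||w||_q: w / ||w||_2 when q = 2, sign(w) when q = 1
        nw = np.linalg.norm(w, ord=q)
        gnorm = np.zeros(d) if nw == 0.0 else np.sign(w) * np.abs(w) ** (q - 1) / nw ** (q - 1)
        grad = (s * y) @ X / n - r * np.mean(s) * gnorm
        w -= lr * grad
    return w
```

Passing p=np.inf gives q = 1, matching the $\ell_\infty$ setting above; the improved $O(\mathsf{OPT}_{2,r})$ guarantee in the abstract instead uses a nonconvex sigmoidal loss in place of the cross-entropy, which this sketch does not implement.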
Related papers
- Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds [9.036777309376697]
Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift.
Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$.
arXiv Detail & Related papers (2024-04-02T23:34:39Z) - Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class.
Our result holds whether the problem is realizable or not, and we refer to this as a "near mixing-free rate", since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z) - Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^2(1-\gamma)^4}\right)$ suffices to ensure that the Kolmogorov metric and total variation metric between $\hat{\eta}^{\pi}$ and $\eta^{\pi}$ are below $\epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $etapi$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z) - Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces
and ReLU Regression under Gaussian Marginals [43.0867217287089]
We study the task of agnostically learning halfspaces under the Gaussian distribution.
We prove a near-optimal computational hardness result for this task.
arXiv Detail & Related papers (2023-02-13T16:46:23Z) - Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - On the Power of Localized Perceptron for Label-Optimal Learning of
Halfspaces with Adversarial Noise [4.8728183994912415]
We study online active learning of homogeneous halfspaces in $\mathbb{R}^d$ with adversarial noise.
Our main contribution is a Perceptron-like active learning algorithm that runs in polynomial time.
Our algorithm learns the underlying halfspace with near-optimal label complexity.
arXiv Detail & Related papers (2020-12-19T22:04:10Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $\tilde{O}(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d, 1/\varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z)