Non-Convex SGD Learns Halfspaces with Adversarial Label Noise
- URL: http://arxiv.org/abs/2006.06742v1
- Date: Thu, 11 Jun 2020 18:55:59 GMT
- Title: Non-Convex SGD Learns Halfspaces with Adversarial Label Noise
- Authors: Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
- Abstract summary: We study agnostically learning homogeneous halfspaces in the distribution-specific PAC model.
For a broad family of structured distributions, non-convex SGD efficiently achieves misclassification error $O(\mathrm{OPT})+\varepsilon$, whereas optimizing any convex surrogate inherently incurs misclassification error $\omega(\mathrm{OPT})$.
- Score: 50.659479930171585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of agnostically learning homogeneous halfspaces in the
distribution-specific PAC model. For a broad family of structured
distributions, including log-concave distributions, we show that non-convex SGD
efficiently converges to a solution with misclassification error
$O(\mathrm{OPT})+\varepsilon$, where $\mathrm{OPT}$ is the misclassification error of the best-fitting
halfspace. In sharp contrast, we show that optimizing any convex surrogate
inherently leads to misclassification error of $\omega(\mathrm{OPT})$, even under
Gaussian marginals.
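As an illustration of the positive result, below is a minimal projected-SGD sketch on a non-convex sigmoid surrogate for a homogeneous halfspace sign(<w, x>). The surrogate, temperature, step size, and random-flip noise are illustrative assumptions for demonstration, not the paper's exact algorithm or its adversarial noise model.

import numpy as np

# Sketch only: projected SGD on a non-convex sigmoid surrogate for a
# homogeneous halfspace. Surrogate, temperature, and step size are
# assumptions, not the paper's exact choices.
rng = np.random.default_rng(0)
d, n = 20, 20000
w_star = np.zeros(d)
w_star[0] = 1.0                                  # ground-truth halfspace direction

X = rng.standard_normal((n, d))                  # Gaussian marginal (log-concave)
y = np.sign(X @ w_star)
flip = rng.random(n) < 0.05                      # ~5% corrupted labels, so OPT <= 0.05
y[flip] *= -1

def sigmoid_surrogate_grad(w, x, yi, temp=0.5):
    # loss = sigmoid(-yi * <w, x> / temp); returns its gradient w.r.t. w
    s = 1.0 / (1.0 + np.exp(yi * (x @ w) / temp))
    return -s * (1.0 - s) * yi * x / temp

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
eta = 0.1
for _ in range(n):
    i = rng.integers(n)
    w -= eta * sigmoid_surrogate_grad(w, X[i], y[i])
    w /= np.linalg.norm(w)                       # project back onto the unit sphere

print("misclassification error:", np.mean(np.sign(X @ w) != y))

With 5% random flips, the printed error should land near the noise level, consistent in spirit with an $O(\mathrm{OPT})+\varepsilon$ guarantee; note the adversarial noise handled in the paper is strictly harder than these random flips.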
Related papers
- Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization [0.4732176352681218]
This paper addresses the challenge of gradual domain adaptation within a class of manifold-constrained data distributions.
We propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius.
Our bounds rely on a newly introduced compatibility measure, which fully characterizes the error propagation dynamics along the sequence.
arXiv Detail & Related papers (2024-10-17T22:07:25Z) - Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z) - Distributed Extra-gradient with Optimal Complexity and Communication
Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient (a sketch of the base extra-gradient update appears after this list).
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z) - Wide flat minima and optimal generalization in classifying
high-dimensional Gaussian mixtures [8.556763944288116]
We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters.
We also consider the algorithmically relevant case of targeting wide flat minima of the mean squared error loss.
arXiv Detail & Related papers (2020-10-27T01:32:03Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $\tilde{O}(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Revisiting SGD with Increasingly Weighted Averaging: Optimization and
Generalization Perspectives [50.12802772165797]
The averaging technique combines all SGD iterates into a single solution; a sketch of one increasingly weighted scheme appears after this list.
Experiments demonstrate the trade-offs involved and the effectiveness of increasingly weighted averaging compared with other averaging schemes.
arXiv Detail & Related papers (2020-03-09T18:14:00Z) - Learning Halfspaces with Massart Noise Under Structured Distributions [46.37328271312332]
We solve the problem of learning halfspaces with Massart noise in the distribution-specific PAC model.
We give the first efficient algorithm for this problem with respect to a broad family of distributions, including log-concave distributions.
arXiv Detail & Related papers (2020-02-13T17:02:37Z)
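For context on the Q-GenX entry above, here is a sketch of the plain (uncompressed) extra-gradient update for a monotone VI; the regularized bilinear operator and step size are illustrative assumptions, and the paper's quantized, distributed variant is not reproduced here.

import numpy as np

# Sketch of the plain extra-gradient update for a strongly monotone VI.
# F is the operator of the regularized bilinear saddle point
#   min_x max_y (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2,
# whose unique solution is z = (x, y) = 0. Not the quantized Q-GenX method.
rng = np.random.default_rng(0)
d, mu, eta = 5, 0.5, 0.05
A = rng.standard_normal((d, d))

def F(z):
    x, y = z[:d], z[d:]
    return np.concatenate([mu * x + A @ y, mu * y - A.T @ x])

z = rng.standard_normal(2 * d)
for _ in range(2000):
    z_half = z - eta * F(z)      # extrapolation (lookahead) step
    z = z - eta * F(z_half)      # update with the operator at the lookahead point
print("distance to solution ||z||:", np.linalg.norm(z))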
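And for the increasingly weighted averaging entry, a minimal sketch of one common weighting (weight t for iterate w_t); the linear weighting is an assumption for illustration, not necessarily that paper's exact scheme.

import numpy as np

# Sketch of increasingly weighted averaging: later SGD iterates get larger
# weights. The linear weights are an illustrative assumption.
def increasingly_weighted_average(iterates):
    # iterates: sequence of parameter vectors w_1, ..., w_T produced by SGD
    W = np.stack(iterates)                        # shape (T, d)
    weights = np.arange(1, len(iterates) + 1.0)   # weight t for iterate w_t
    return weights @ W / weights.sum()

# Usage note: the same weighting can be maintained online, without storing
# all iterates, via avg <- avg + (2 / (t + 1)) * (w_t - avg).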