Related papers: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

URL: http://arxiv.org/abs/2010.00539v2
Date: Sun, 14 Feb 2021 02:20:24 GMT
Title: Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins
Authors: Spencer Frei and Yuan Cao and Quanquan Gu
Abstract summary: We show that gradient descent finds halfspaces with classification error $tilde O(mathsfOPT1/2) + varepsilon$ in $mathrmpoly(d,1/varepsilon)$ time and sample complexity.
Score: 92.7662890047311
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity for a broad class of distributions that includes log-concave isotropic distributions as a subclass. Along the way we answer a question recently posed by Ji et al. (2020) on how the tail behavior of a loss function can affect sample complexity and runtime guarantees for gradient descent.

Related papers

The informativeness of the gradient revisited [4.178980693837599]
We give a general bound on the variance in terms of a parameter related to the pairwise independence of the target function class.<n>In addition to the theoretical analysis, we present experiments to understand better the nature of recent deep learning-based attacks on Learning with Errors.
arXiv Detail & Related papers (2025-05-28T09:23:37Z)
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes [29.466981306355066]
We show that gradient descent with a fixed learning rate $eta$ can only find local minima that represent smooth functions. We also prove a nearly-optimal MSE bound of $widetildeO(n-4/5)$ within the strict interior of the support of the $n$ data points.
arXiv Detail & Related papers (2024-06-10T22:57:27Z)
Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals [43.0867217287089]
We study the task of agnostically learning halfspaces under the Gaussian distribution. We prove a near-optimal computational hardness result for this task.
arXiv Detail & Related papers (2023-02-13T16:46:23Z)
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails. We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
Provable Robustness of Adversarial Training for Learning Halfspaces with Noise [95.84614821570283]
We analyze the properties of adversarial learning adversarially robust halfspaces in the presence of label noise. To the best of our knowledge, this is the first work to show that adversarial training prov yields classifiers in noise.
arXiv Detail & Related papers (2021-04-19T16:35:38Z)
Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
Gradient Methods Never Overfit On Separable Data [31.714928102950584]
We show that standard gradient methods never overfit on separable data. We present non-asymptotic bounds on the number of margin violations over the dataset.
arXiv Detail & Related papers (2020-06-30T18:01:46Z)
Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise. We give an algorithm that achieves misclassification error $epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss. For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$. For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.