Proper Learning, Helly Number, and an Optimal SVM Bound
- URL: http://arxiv.org/abs/2005.11818v1
- Date: Sun, 24 May 2020 18:11:57 GMT
- Title: Proper Learning, Helly Number, and an Optimal SVM Bound
- Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, and Nikita Zhivotovskiy
- Abstract summary: We characterize classes for which the optimal sample complexity can be achieved by a proper learning algorithm.
We show that the dual Helly number is bounded if and only if there is a proper learner that obtains the optimal joint dependence on $\epsilon$ and $\delta$.
- Score: 29.35254938542589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classical PAC sample complexity bounds are stated for any Empirical Risk
Minimizer (ERM) and contain an extra logarithmic factor $\log(1/{\epsilon})$
which is known to be necessary for ERM in general. It has been recently shown
by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC
class C is achieved by a particular improper learning algorithm, which outputs
a specific majority-vote of hypotheses in C. This leaves the question of when
this bound can be achieved by proper learning algorithms, which are restricted
to always output a hypothesis from C.
In this paper we aim to characterize the classes for which the optimal sample
complexity can be achieved by a proper learning algorithm. We identify that
these classes can be characterized by the dual Helly number, which is a
combinatorial parameter that arises in discrete geometry and abstract
convexity. In particular, under general conditions on C, we show that the dual
Helly number is bounded if and only if there is a proper learner that obtains
the optimal joint dependence on $\epsilon$ and $\delta$.
As further implications of our techniques we resolve a long-standing open
problem posed by Vapnik and Chervonenkis (1974) on the performance of the
Support Vector Machine by proving that the sample complexity of SVM in the
realizable case is $\Theta((n/{\epsilon})+(1/{\epsilon})\log(1/{\delta}))$,
where $n$ is the dimension. This gives the first optimal PAC bound for
Halfspaces achieved by a proper learning algorithm, and moreover is
computationally efficient.
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Information-Computation Tradeoffs for Learning Margin Halfspaces with
Random Classification Noise [50.64137465792738]
We study the problem of PAC learning $\gamma$-margin halfspaces with Random Classification Noise.
We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms.
arXiv Detail & Related papers (2023-06-28T16:33:39Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.