Optimal PAC Bounds Without Uniform Convergence
- URL: http://arxiv.org/abs/2304.09167v1
- Date: Tue, 18 Apr 2023 17:57:31 GMT
- Title: Optimal PAC Bounds Without Uniform Convergence
- Authors: Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, Nikita
Zhivotovskiy
- Abstract summary: We provide optimal high probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments.
Our framework converts the leave-one-out error of permutation invariant predictors into high probability risk bounds.
Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal.
- Score: 11.125968799758436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In statistical learning theory, determining the sample complexity of
realizable binary classification for VC classes was a long-standing open
problem. The results of Simon and Hanneke established sharp upper bounds in
this setting. However, the reliance of their argument on the uniform
convergence principle limits its applicability to more general learning
settings such as multiclass classification. In this paper, we address this
issue by providing optimal high probability risk bounds through a framework
that surpasses the limitations of uniform convergence arguments.
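For context, the standard realizable PAC notions at play here, stated in textbook form (this is background notation, not taken from the paper):
```latex
% Realizable (\varepsilon, \delta)-PAC learning: a learner A learns a class H if,
% for every distribution D and every target h^* \in H, given an i.i.d. sample
% S = ((x_1, h^*(x_1)), \dots, (x_n, h^*(x_n))),
\Pr_{S \sim D^n}\!\left[ \operatorname{err}_D\!\big(A(S)\big) > \varepsilon \right] \le \delta,
\qquad \operatorname{err}_D(h) = \Pr_{x \sim D}\!\left[ h(x) \neq h^*(x) \right].
% The sample complexity is the smallest n = n(\varepsilon, \delta) for which this holds.
```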
Our framework converts the leave-one-out error of permutation invariant
predictors into high probability risk bounds. As an application, by adapting
the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we
propose an algorithm that achieves an optimal PAC bound for binary
classification. Specifically, our result shows that certain aggregations of
one-inclusion graph algorithms are optimal, addressing a variant of a classic
question posed by Warmuth.
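To make the leave-one-out quantity concrete, here is a minimal self-contained sketch (an illustration, not the paper's algorithm): it computes the LOO error of a permutation-invariant learner, with a hypothetical 1-nearest-neighbour predictor standing in. The paper's framework converts exactly this kind of estimate into a high-probability risk bound.
```python
# Minimal sketch: leave-one-out (LOO) error of a permutation-invariant
# predictor. The 1-NN learner below is only an illustrative stand-in; the
# framework in the paper applies to permutation-invariant predictors generally.

def nn_predict(train, x):
    """1-nearest-neighbour prediction: permutation invariant, since the
    output does not depend on the order of the training sample."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def loo_error(sample):
    """Hold out each example in turn, train on the rest, test on it."""
    mistakes = sum(
        nn_predict(sample[:i] + sample[i + 1:], x) != y
        for i, (x, y) in enumerate(sample)
    )
    return mistakes / len(sample)

if __name__ == "__main__":
    sample = [(0.1, 0), (0.2, 0), (0.35, 0), (0.9, 1), (1.1, 1), (0.55, 1)]
    print(f"LOO error: {loo_error(sample):.3f}")
```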
We further instantiate our framework in three settings where uniform
convergence is provably suboptimal. For multiclass classification, we prove an
optimal risk bound that scales with the one-inclusion hypergraph density of the
class, addressing the suboptimality of the analysis of Daniely and
Shalev-Shwartz. For partial hypothesis classification, we determine the optimal
sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman,
and Moran. For realizable bounded regression with absolute loss, we derive an
optimal risk bound that relies on a modified version of the scale-sensitive
dimension, refining the results of Bartlett and Long. Our rates surpass
standard uniform convergence-based results due to the smaller complexity
measure in our risk bound.
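For reference, the optimal high-probability rate for realizable binary classification over a VC class of dimension d, which the proposed algorithm achieves up to universal constants (this form is standard in the literature):
```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% the learner's output \hat{h} satisfies
\operatorname{err}_D(\hat{h}) \;\le\; O\!\left( \frac{d + \log(1/\delta)}{n} \right).
```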
Related papers
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- The One-Inclusion Graph Algorithm is not Always Optimal [11.125968799758436]
We provide an in-expectation optimal one-inclusion graph algorithm for Vapnik-Chervonenkis classes.
We show that the same poor high-probability performance is inherited by several recent prediction strategies.
arXiv Detail & Related papers (2022-12-19T06:49:39Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective [30.524043513721168]
We develop a general recipe for analyzing the convergence of iterative algorithms for a class of regression models.
Deterministically, we accurately capture both the convergence rate of the algorithm and the eventual error floor in the finite-sample regime.
We show sharp convergence rates for both higher-order algorithms based on alternating updates and first-order algorithms based on subgradient descent.
arXiv Detail & Related papers (2021-09-20T21:48:19Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with an unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Improved Algorithms for Agnostic Pool-based Active Classification [20.12178157010804]
We consider active learning for binary classification in the agnostic pool-based setting.
Our algorithm is superior to state-of-the-art active learning algorithms on image classification datasets.
arXiv Detail & Related papers (2021-05-13T18:24:30Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.