The One-Inclusion Graph Algorithm is not Always Optimal
- URL: http://arxiv.org/abs/2212.09270v1
- Date: Mon, 19 Dec 2022 06:49:39 GMT
- Title: The One-Inclusion Graph Algorithm is not Always Optimal
- Authors: Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, Nikita Zhivotovskiy
- Abstract summary: We provide an in-expectation optimal one-inclusion graph algorithm for any practically interesting Vapnik-Chervonenkis class.
We show that the same poor high-probability performance is inherited by several recent prediction strategies.
- Score: 11.125968799758436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth
achieves an optimal in-expectation risk bound in the standard PAC
classification setup. In one of the first COLT open problems, Warmuth
conjectured that this prediction strategy always implies an optimal high
probability bound on the risk, and hence is also an optimal PAC algorithm. We
refute this conjecture in the strongest sense: for any practically interesting
Vapnik-Chervonenkis class, we provide an in-expectation optimal one-inclusion
graph algorithm whose high probability risk bound cannot go beyond that implied
by Markov's inequality. Our construction of these poorly performing
one-inclusion graph algorithms uses Varshamov-Tenengolts error correcting
codes.
Our negative result has several implications. First, it shows that the same
poor high-probability performance is inherited by several recent prediction
strategies based on generalizations of the one-inclusion graph algorithm.
Second, our analysis shows yet another statistical problem that enjoys an
estimator that is provably optimal in expectation via a leave-one-out argument,
but fails in the high-probability regime. This discrepancy occurs despite the
boundedness of the binary loss for which arguments based on concentration
inequalities often provide sharp high probability risk bounds.
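The quantitative gap behind the refutation is worth spelling out: an in-expectation bound $\mathbb{E}[R] \le C\,d/n$ combined with Markov's inequality, $\Pr[R \ge t] \le \mathbb{E}[R]/t$, only gives $R \le C\,d/(n\delta)$ with probability $1-\delta$, whereas the optimal PAC rate scales as $(d+\log(1/\delta))/n$. As a small aside on the combinatorial ingredient, the following is a minimal sketch of the Varshamov-Tenengolts code family named in the abstract, assuming only its standard definition; the paper's construction of poorly performing one-inclusion graph algorithms from these codes is not reproduced here.

```python
# A minimal sketch of the Varshamov-Tenengolts (VT) code family, assuming
# the standard definition VT_a(n) = {x in {0,1}^n : sum_i i*x_i = a mod n+1}.
# This only illustrates the code family, not the paper's construction.
from itertools import product

def vt_checksum(x):
    """The VT checksum sum_{i=1}^{n} i * x_i of a binary word x."""
    return sum(i * bit for i, bit in enumerate(x, start=1))

def in_vt_code(x, a=0):
    """True iff x lies in the Varshamov-Tenengolts code VT_a(n)."""
    return vt_checksum(x) % (len(x) + 1) == a

def vt_code(n, a=0):
    """Enumerate VT_a(n) by brute force (small n only)."""
    return [x for x in product((0, 1), repeat=n) if in_vt_code(x, a)]

if __name__ == "__main__":
    # VT_0(n) corrects a single deletion and has size at least 2^n / (n+1).
    for n in range(2, 9):
        print(n, len(vt_code(n)), round(2 ** n / (n + 1), 2))
```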
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
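To make the empirical-gap idea concrete, here is a hedged sketch of a generic fixed-budget elimination loop that rejects arms whose empirical gap to the current leader exceeds a threshold. This is not the paper's sred algorithm; the `reject_gap` parameter and the one-pull-per-arm round structure are invented placeholders for illustration.

```python
# A generic empirical-gap rejection loop, in the spirit of (but not
# identical to) the adaptive rejection idea described above.
import random

def empirical_gap_elimination(pull, n_arms, budget, reject_gap):
    """pull(a) -> one stochastic reward from arm a; returns a surviving arm."""
    active = set(range(n_arms))
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    while budget >= len(active) and len(active) > 1:
        for a in list(active):  # one pull per active arm per round
            sums[a] += pull(a)
            counts[a] += 1
            budget -= 1
        means = {a: sums[a] / counts[a] for a in active}
        leader = max(means, key=means.get)
        # Reject any arm whose empirical gap to the leader is large.
        for a in list(active):
            if a != leader and means[leader] - means[a] > reject_gap:
                active.discard(a)
    return max(active, key=lambda a: sums[a] / counts[a])

if __name__ == "__main__":
    random.seed(1)
    mus = [0.2, 0.5, 0.9]
    best = empirical_gap_elimination(
        lambda a: random.gauss(mus[a], 1.0),
        n_arms=3, budget=3000, reject_gap=0.25)
    print("identified arm:", best)
```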
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+\alpha$ Moments [10.889739958035536]
We introduce a new definitional framework to analyze the fine-grained optimality of algorithms.
We show that median-of-means is neighborhood optimal, up to constant factors.
It is open to find a neighborhood-separated estimator without constant factor slackness.
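Since this entry turns on the median-of-means estimator, a minimal self-contained sketch of that classical estimator may help; choosing the block count $k$ on the order of $\log(1/\delta)$ gives sub-Gaussian-style deviations under only a finite-variance assumption. The block count and toy data below are illustrative choices, not taken from the paper.

```python
# A minimal sketch of the classical median-of-means estimator, assuming
# i.i.d. samples: average k disjoint blocks, return the median block mean.
import random
import statistics

def median_of_means(xs, k):
    """Median-of-means estimate of E[X] from samples xs using k blocks."""
    m = len(xs) // k  # block size; a remainder of < k samples is dropped
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(block_means)

if __name__ == "__main__":
    random.seed(0)
    # Heavy-ish tails: the plain empirical mean is more fragile here.
    xs = [random.paretovariate(2.5) for _ in range(10_000)]
    print("empirical mean :", sum(xs) / len(xs))
    print("median of means:", median_of_means(xs, k=15))
```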
arXiv Detail & Related papers (2023-11-21T18:50:38Z)
- Zero-Regret Performative Prediction Under Inequality Constraints [5.513958040574729]
This paper studies performative prediction under inequality constraints.
We develop a robust primal-dual framework that requires only approximate gradients up to a certain accuracy.
We then propose an adaptive primal-dual algorithm for location families.
arXiv Detail & Related papers (2023-09-22T04:54:26Z)
- Optimal PAC Bounds Without Uniform Convergence [11.125968799758436]
We provide optimal high probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments.
Our framework converts the leave-one-out error of permutation invariant predictors into high probability risk bounds.
Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal.
arXiv Detail & Related papers (2023-04-18T17:57:31Z)
- Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is asymptotically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
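For reference, here is a hedged sketch of the augmented inverse probability weighting (AIPW) estimator of an arm's mean reward, the estimator family named in the title. The function signature is invented for illustration; the paper's sampling rule and plug-in estimators are not reproduced.

```python
# A sketch of the AIPW estimate of arm a's mean reward: pi_t is the
# probability with which arm `a` could be drawn in round t, and mu_hat_t is
# any plug-in estimate of the arm's mean built from rounds before t.
def aipw_arm_mean(rounds, a):
    """rounds: iterable of (arm, reward, pi_t, mu_hat_t) tuples."""
    total, n = 0.0, 0
    for arm, reward, pi_t, mu_hat_t in rounds:
        indicator = 1.0 if arm == a else 0.0
        # Augmented IPW term: unbiased correction around the plug-in model.
        total += indicator * (reward - mu_hat_t) / pi_t + mu_hat_t
        n += 1
    return total / n
```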
arXiv Detail & Related papers (2022-01-12T13:38:33Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
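To fix ideas, the following is a minimal sketch of SGD with gradient norm clipping, the primitive these stepsize rules are built around; the clipping level and the constant stepsize below are placeholders rather than the paper's rules.

```python
# A minimal sketch of SGD with norm clipping under heavy-tailed noise.
import numpy as np

def clip(g, lam):
    """Rescale g to norm at most lam (no-op when already small enough)."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

def clipped_sgd(x0, stoch_grad, stepsizes, lam):
    """Run clipped SGD: x_{t+1} = x_t - eta_t * clip(g_t, lam)."""
    x = np.asarray(x0, dtype=float)
    for eta in stepsizes:
        x = x - eta * clip(stoch_grad(x), lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy heavy-tailed noise on the gradient of f(x) = ||x||^2 / 2.
    grad = lambda x: x + rng.standard_t(df=3, size=x.shape)
    print(clipped_sgd(np.ones(5) * 10, grad, stepsizes=[0.1] * 500, lam=5.0))
```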
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by the recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
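As a rough illustration of what a recursive one-over-t average of past stochastic gradients can look like (in the spirit of ROOT-SGD; the paper's exact update and stepsizes may differ), consider the sketch below, where each round re-evaluates the gradient at the previous iterate with the fresh sample so the running estimate is corrected rather than merely averaged.

```python
# An illustrative reading of a recursive one-over-t gradient average;
# not the paper's exact update.
import numpy as np

def root_sgd_like(x0, stoch_grad, eta, T, rng):
    x_prev, x = None, np.asarray(x0, dtype=float)
    v = None
    for t in range(1, T + 1):
        sample = rng.standard_normal(x.shape)  # shared noise for both evals
        g = stoch_grad(x, sample)
        if v is None:
            v = g
        else:
            # v_t = g_t + (1 - 1/t) * (v_{t-1} - grad at previous iterate)
            v = g + (1.0 - 1.0 / t) * (v - stoch_grad(x_prev, sample))
        x_prev, x = x, x - eta * v
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = lambda x, xi: x + 0.5 * xi  # noisy gradient of ||x||^2 / 2
    print(root_sgd_like(np.ones(3) * 5, grad, eta=0.05, T=2000, rng=rng))
```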
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
- The estimation error of general first order methods [12.472245917779754]
We consider two families of estimation problems: high-dimensional regression and low-dimensional matrix estimation.
We derive lower bounds on the estimation error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge.
These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms.
arXiv Detail & Related papers (2020-02-28T18:13:47Z)