Fast Rates for Nonparametric Online Learning: From Realizability to
Learning in Games
- URL: http://arxiv.org/abs/2111.08911v1
- Date: Wed, 17 Nov 2021 05:24:21 GMT
- Title: Fast Rates for Nonparametric Online Learning: From Realizability to
Learning in Games
- Authors: Constantinos Daskalakis and Noah Golowich
- Abstract summary: We propose a proper learning algorithm which gets a near-optimal mistake bound in terms of the sequential fat-shattering dimension of the hypothesis class.
This result answers a question as to whether proper learners could achieve near-optimal mistake bounds.
For the real-valued (regression) setting, the optimal mistake bound was not even known for improper learners.
- Score: 36.969021834291745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study fast rates of convergence in the setting of nonparametric online
regression, namely where regret is defined with respect to an arbitrary
function class which has bounded complexity. Our contributions are two-fold:
- In the realizable setting of nonparametric online regression with the
absolute loss, we propose a randomized proper learning algorithm which gets a
near-optimal mistake bound in terms of the sequential fat-shattering dimension
of the hypothesis class. In the setting of online classification with a class
of Littlestone dimension $d$, our bound reduces to $d \cdot {\rm poly} \log T$.
This result answers a question as to whether proper learners could achieve
near-optimal mistake bounds; previously, even for online classification, the
best known mistake bound was $\tilde O( \sqrt{dT})$. Further, for the
real-valued (regression) setting, the optimal mistake bound was not even known
for improper learners, prior to this work.
- Using the above result, we exhibit an independent learning algorithm for
general-sum binary games of Littlestone dimension $d$, for which each player
achieves regret $\tilde O(d^{3/4} \cdot T^{1/4})$. This result generalizes
analogous results of Syrgkanis et al. (2015) who showed that in finite games
the optimal regret can be accelerated from $O(\sqrt{T})$ in the adversarial
setting to $O(T^{1/4})$ in the game setting.
To establish the above results, we introduce several new techniques,
including: a hierarchical aggregation rule to achieve the optimal mistake bound
for real-valued classes, a multi-scale extension of the proper online
realizable learner of Hanneke et al. (2021), an approach to show that the
output of such nonparametric learning algorithms is stable, and a proof that
the minimax theorem holds in all online learnable games.
Related papers
- Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games [1.9161424760743742]
We propose an online model selection algorithm for the average reward RL setting.
We show that the additional cost of model selection scales only linearly in $M$, the number of model classes.
We also show that the exponential dependency on $m*$ is inevitable by proving a lower bound on the learner's regret.
arXiv Detail & Related papers (2024-11-09T05:03:10Z) - Agnostic Smoothed Online Learning [5.167069404528051]
We propose an algorithm to guarantee sublinear regret for smoothed online learning without prior knowledge of $mu$.
R-Cover has adaptive regret $tilde O(sqrtdT/sigma)$ for function classes with dimension $d$, which is optimal up to logarithmic factors.
arXiv Detail & Related papers (2024-10-07T15:25:21Z) - Rate-Preserving Reductions for Blackwell Approachability [72.03309261614991]
Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent.
We show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization.
arXiv Detail & Related papers (2024-06-10T23:23:52Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL)
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z) - Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of textitsmooth and strongly monotone games and study optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $tildeTheta(nsqrtT)$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linear-dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying certain uniform convexity.
arXiv Detail & Related papers (2021-05-19T00:47:59Z) - Optimal Dynamic Regret in Exp-Concave Online Learning [28.62891856368132]
We consider the problem of the Zinkevich (2003)-style dynamic regret minimization in online learning with exp-contrivial losses.
We show that whenever improper learning is allowed, a Strongly Adaptive online learner achieves the dynamic regret of $tilde O(d3.5n1/3C_n2/3 vee dlog n)$ where $C_n$ is the total variation (a.k.a. path length) of the an arbitrary sequence of comparators that may not be known to the learner ahead of time.
arXiv Detail & Related papers (2021-04-23T21:36:51Z) - Online Learning with Imperfect Hints [72.4277628722419]
We develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints.
Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case.
arXiv Detail & Related papers (2020-02-11T23:06:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.