Expected Worst Case Regret via Stochastic Sequential Covering
- URL: http://arxiv.org/abs/2209.04417v1
- Date: Fri, 9 Sep 2022 17:31:46 GMT
- Title: Expected Worst Case Regret via Stochastic Sequential Covering
- Authors: Changlong Wu, Mohsen Heidari, Ananth Grama, Wojciech Szpankowski
- Abstract summary: We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets.
For such minimax regrets we establish tight bounds via a novel concept of stochastic global sequential covering.
We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.
- Score: 14.834625066344582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of sequential prediction and online minimax regret with
stochastically generated features under a general loss function. We introduce a
notion of expected worst case minimax regret that generalizes and encompasses
prior known minimax regrets. For such minimax regrets we establish tight upper
bounds via a novel concept of stochastic global sequential covering. We show
that for a hypothesis class of VC-dimension $\mathsf{VC}$ and $i.i.d.$
generated features of length $T$, the cardinality of the stochastic global
sequential covering can be upper bounded with high probability (whp) by
$e^{O(\mathsf{VC} \cdot \log^2 T)}$. We then improve this bound by introducing
a new complexity measure called the Star-Littlestone dimension, and show that
classes with Star-Littlestone dimension $\mathsf{SL}$ admit a stochastic global
sequential covering of order $e^{O(\mathsf{SL} \cdot \log T)}$. We further
establish upper bounds for real valued classes with finite fat-shattering
numbers. Finally, by applying information-theoretic tools of the fixed design
minimax regrets, we provide lower bounds for the expected worst case minimax
regret. We demonstrate the effectiveness of our approach by establishing tight
bounds on the expected worst case minimax regrets for logarithmic loss and
general mixable losses.
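For orientation, a hedged sketch of the central quantity (the precise definitions are in the paper): with features $x^T = (x_1, \dots, x_T)$ drawn from a stochastic source, labels $y_t$ chosen adversarially, and predictions $\hat{y}_t$ allowed to depend on $x^t$ and $y^{t-1}$, one natural reading of the expected worst case minimax regret for a class $\mathcal{H}$ and loss $\ell$ is
$$
\tilde{r}_T(\mathcal{H}, \ell) \;=\; \inf_{\hat{y}} \, \mathbb{E}_{x^T} \left[ \sup_{y^T} \left( \sum_{t=1}^{T} \ell(\hat{y}_t, y_t) \;-\; \inf_{h \in \mathcal{H}} \sum_{t=1}^{T} \ell(h(x_t), y_t) \right) \right],
$$
which interpolates between the fully adversarial regret (supremum over $x^T$ in place of the expectation) and the fixed design regret ($x^T$ revealed in advance).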
Related papers
- Rate-Preserving Reductions for Blackwell Approachability [72.03309261614991]
Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent.
We show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization.
arXiv Detail & Related papers (2024-06-10T23:23:52Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Lower Bounds for Learning in Revealing POMDPs [88.23337313766355]
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting.
For multi-step revealing POMDPs, we show that the latent state-space dependence is at least $\Omega(S^{1.5})$ in the sample complexity.
arXiv Detail & Related papers (2023-02-02T18:59:30Z) - Online Learning in Dynamically Changing Environments [11.731001328350983]
We study the problem of online learning and online regret when samples are drawn from a general unknown non-stationary process.
We prove a tight (up to a $\sqrt{\log T}$ factor) bound of $O(\sqrt{KT \cdot \mathsf{VC}(\mathcal{H}) \log T})$ on the expected worst case regret of any finite VC-dimensional class.
We extend these results to general smooth adversary processes with unknown reference measure.
arXiv Detail & Related papers (2023-01-31T21:10:03Z) - Generalized Bandit Regret Minimizer Framework in Imperfect Information
Extensive-Form Game [9.933208900617174]
We consider the problem in the interactive bandit-feedback setting where we don't know the dynamics of the IIEG.
To learn an NE, the regret minimizer is required to estimate the full-feedback loss gradient $\ell^t$ by $v(z^t)$ and minimize the regret.
We present a novel method, SIX-OMD, to learn an approximate NE. It is model-free and substantially improves the best existing convergence rate from the order of $O(\sqrt{XB/T} + \sqrt{YC/T})$ to $O(
arXiv Detail & Related papers (2022-03-11T13:45:42Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - First-Order Regret in Reinforcement Learning with Linear Function
Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z) - Tight Bounds on Minimax Regret under Logarithmic Loss via
Self-Concordance [37.0414602993676]
We show that for any expert class with (sequential) metric entropy $\mathcal{O}(\gamma^{-p})$ at scale $\gamma$, the minimax regret is $\mathcal{O}(n^{p/(p+1)})$.
As an application of our techniques, we resolve the minimax regret for nonparametric Lipschitz classes of experts.
arXiv Detail & Related papers (2020-07-02T14:47:33Z)
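Both the main paper and the last entry above bound minimax regret under logarithmic loss. As a purely illustrative toy (not the paper's construction), the sketch below runs the classical Bayesian mixture forecaster over a finite class of $N$ constant experts; for log loss its cumulative regret against the best expert in hindsight is at most $\log N$ on every label sequence, the finite-class analogue of the covering-based bounds discussed above. All names and parameters are hypothetical choices for the demo.

```python
import numpy as np

# Toy illustration (not from the paper): Bayesian mixture forecaster under
# logarithmic loss over a finite expert class, with regret <= log N.

rng = np.random.default_rng(0)
T, N = 200, 8
# Each "expert" predicts a fixed probability of label 1 (a toy hypothesis class).
expert_probs = np.linspace(0.1, 0.9, N)

log_weights = np.zeros(N)          # uniform prior over experts (log scale)
expert_loss = np.zeros(N)          # cumulative log loss of each expert
mixture_loss = 0.0                 # cumulative log loss of the mixture forecaster

for t in range(T):
    y = rng.integers(0, 2)         # labels may be arbitrary; random here for the demo
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    p = float(w @ expert_probs)    # posterior predictive probability of y_t = 1
    mixture_loss += -np.log(p if y == 1 else 1.0 - p)
    per_expert = -np.log(np.where(y == 1, expert_probs, 1.0 - expert_probs))
    expert_loss += per_expert
    log_weights -= per_expert      # Bayesian update = exponential weights with eta = 1

regret = mixture_loss - expert_loss.min()
print(f"cumulative regret: {regret:.3f}  (log N = {np.log(N):.3f})")
```

Running it prints a cumulative regret below $\log N \approx 2.08$, and the guarantee is unchanged if the random labels are replaced by any adversarial sequence.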
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.