Minimax Optimal Quantile and Semi-Adversarial Regret via
Root-Logarithmic Regularizers
- URL: http://arxiv.org/abs/2110.14804v1
- Date: Wed, 27 Oct 2021 22:38:52 GMT
- Title: Minimax Optimal Quantile and Semi-Adversarial Regret via
Root-Logarithmic Regularizers
- Authors: Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco
Orabona, Daniel M. Roy
- Abstract summary: Quantile (and, more generally, KL) regret bounds relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.
More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic (i.i.d.).
We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge.
- Score: 31.102181563065844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantile (and, more generally, KL) regret bounds, such as those achieved by
NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal
of competing against the best individual expert to only competing against a
majority of experts on adversarial data. More recently, the semi-adversarial
paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of
adversarial online learning by considering data that may be neither fully
adversarial nor stochastic (i.i.d.). We achieve the minimax optimal regret in
both paradigms using FTRL with separate, novel, root-logarithmic regularizers,
both of which can be interpreted as yielding variants of NormalHedge. We extend
existing KL regret upper bounds, which hold uniformly over target
distributions, to possibly uncountable expert classes with arbitrary priors;
provide the first full-information lower bounds for quantile regret on finite
expert classes (which are tight); and provide an adaptively minimax optimal
algorithm for the semi-adversarial paradigm that adapts to the true, unknown
constraint faster, leading to uniformly improved regret bounds over existing
methods.
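Both regularizers are said to yield variants of NormalHedge. For intuition, here is a minimal sketch of the classic NormalHedge weight update of Chaudhuri, Freund, and Hsu (2009), not the paper's new FTRL algorithms; the bisection tolerance and the uniform fallback are implementation choices of this sketch.
```python
import numpy as np

def normalhedge_weights(regrets):
    """One weight update of classic NormalHedge (Chaudhuri, Freund, Hsu 2009).

    regrets: cumulative regrets R_i of the learner to each expert i.
    Returns p with p_i proportional to ([R_i]_+ / c) * exp([R_i]_+^2 / (2c)),
    where c > 0 solves mean_i exp([R_i]_+^2 / (2c)) = e.
    """
    r = np.maximum(np.asarray(regrets, dtype=float), 0.0)
    n = len(r)
    if not np.any(r > 0.0):
        return np.full(n, 1.0 / n)  # no expert is ahead: play uniform

    def avg_potential(c):
        with np.errstate(over="ignore"):  # overflow to inf is fine while bracketing
            return np.mean(np.exp(r ** 2 / (2.0 * c)))

    # avg_potential is strictly decreasing in c; bracket the root, then bisect.
    lo, hi = 1e-12, 1.0
    while avg_potential(hi) > np.e:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if avg_potential(mid) > np.e else (lo, mid)
    c = hi

    w = (r / c) * np.exp(r ** 2 / (2.0 * c))
    return w / w.sum()
```
The learner predicts with the p-weighted mixture of expert predictions and, after observing losses, adds (learner loss minus expert loss) to each R_i.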
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
This work studies a variant of the multi-armed bandit problem that amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level while keeping cumulative regret small.
The proposed Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
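The name suggests confidence indices in the KL-UCB family; below is a minimal sketch of the textbook Bernoulli KL-UCB index for context, not the paper's Double KL-UCB itself, and the exploration level log(t) is an assumption of this sketch.
```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), by bisection
    (KL(mean, q) is increasing in q on [mean, 1], so bisection applies)."""
    level = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bernoulli_kl(mean, mid) <= level else (lo, mid)
    return lo
```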
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints [29.514323697659613]
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints.
Our goal is to design best-of-both-worlds algorithms that perform well under both stochastic and adversarial constraints.
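For context, the primal-dual template that such work departs from maintains a multiplier vector λ_t for the constraints and, in assumed notation, alternates a Lagrangian best response with projected dual ascent on the observed violations:
```latex
x_t \approx \arg\max_{x \in \mathcal{X}} \; \hat f_t(x) - \lambda_t^\top \hat g_t(x),
\qquad
\lambda_{t+1} \;=\; \big[\lambda_t + \eta\, g_t(x_t)\big]_+ .
```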
arXiv Detail & Related papers (2024-05-25T08:09:36Z)
- Accelerated Rates between Stochastic and Adversarial Online Convex Optimization [2.628557920905129]
We establish novel regret bounds for online convex optimization in a setting that interpolates between i.i.d. and fully adversarial losses.
In the fully i.i.d. case, our regret bounds match the rates one would expect from results in acceleration, and we also recover the optimally accelerated rates via online-to-batch conversion.
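The online-to-batch conversion mentioned above is the standard reduction sketched below; the `predict`/`update` learner interface is a hypothetical stand-in, not an API from the paper.
```python
import numpy as np

def online_to_batch(learner, stoch_grad, T):
    """Run an online learner on T i.i.d. losses and return the average
    iterate, whose expected excess risk is at most E[regret] / T."""
    iterates = []
    for _ in range(T):
        x = learner.predict()           # current online iterate
        learner.update(stoch_grad(x))   # feed back a stochastic gradient at x
        iterates.append(x)
    return np.mean(iterates, axis=0)
```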
arXiv Detail & Related papers (2023-03-06T16:41:57Z)
- A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models [33.36787620121057]
We prove a new generalization bound showing that, for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under a continuous loss control the test error under Moreau envelopes of that loss.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021).
We show that applying our bound with the localized Gaussian width will generally be sharp for empirical risk minimizers.
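For reference, the Moreau envelope of a loss ℓ with parameter λ > 0 is the standard infimal convolution (a definition, not the paper's bound):
```latex
\ell_\lambda(x) \;=\; \inf_{y} \Big\{ \ell(y) + \tfrac{1}{2\lambda} (x - y)^2 \Big\}.
```
It lower-bounds ℓ, is smooth even when ℓ is not, and increases pointwise to ℓ as λ → 0.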
arXiv Detail & Related papers (2022-10-21T16:16:55Z)
- A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary.
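One common way to formalize the two objectives, in notation assumed here rather than taken from the paper, uses rewards f_t and constraint functions g_{t,i}, with g_{t,i}(x) ≤ 0 meaning constraint i is satisfied:
```latex
\mathrm{Reg}_T \;=\; \max_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \;-\; \sum_{t=1}^{T} f_t(x_t),
\qquad
\mathrm{Viol}_T \;=\; \max_{i \in [m]} \Big[ \sum_{t=1}^{T} g_{t,i}(x_t) \Big]_+ .
```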
arXiv Detail & Related papers (2022-09-15T16:59:19Z)
- Algorithm for Constrained Markov Decision Process with Linear Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed that integrates two ingredients: an entropy regularized policy optimizer and Vaidya's dual optimizer.
The proposed approach is shown to converge (with linear rate) to the global optimum.
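In assumed notation, the entropy regularized ingredient turns the constrained MDP into a regularized saddle-point problem over policies π and multipliers λ ≥ 0:
```latex
\max_{\pi} \; \min_{\lambda \ge 0} \;\; V_r(\pi) \;-\; \lambda^\top \big( V_c(\pi) - b \big) \;+\; \tau\, \mathcal{H}(\pi),
```
where V_r is the expected discounted reward, V_c collects the discounted costs with budgets b, and H(π) is a policy entropy term with temperature τ; the exact form used in the paper may differ.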
arXiv Detail & Related papers (2022-06-03T16:26:38Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in a reproducing kernel Hilbert space (RKHS).
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization [16.536558038560695]
We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set.
The Hedge algorithm was recently shown to be simultaneously minimax optimal for i.i.d. and adversarial data.
We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes, and prove that one can adaptively obtain minimax regret at all levels.
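The Hedge update with a deterministic learning rate η, referenced above, is the textbook exponential-weights rule (a sketch; the paper's adaptive scheme is not reproduced here):
```python
import numpy as np

def hedge_weights(cum_losses, eta):
    """Exponential weights (Hedge): p_i proportional to exp(-eta * L_i),
    where L_i is expert i's cumulative loss so far."""
    logits = -eta * np.asarray(cum_losses, dtype=float)
    logits -= logits.max()  # shift by the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()
```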
arXiv Detail & Related papers (2020-07-13T17:54:34Z)
- Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z)
- Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs [48.44657553192801]
We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
Our approach relies on a simple increasing learning rate schedule, together with the help of logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality.
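For reference, the classical Freedman inequality that the strengthened version builds on states that for a martingale difference sequence X_1, ..., X_T with |X_t| ≤ b and predictable variance V_T = Σ_t E[X_t² | F_{t-1}],
```latex
\Pr\!\left[ \sum_{t=1}^{T} X_t \ge a \;\;\text{and}\;\; V_T \le v \right] \;\le\; \exp\!\left( \frac{-a^2}{2\,(v + b\,a/3)} \right)
```
for any a, v > 0; the paper's strengthened variant is not reproduced here.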
arXiv Detail & Related papers (2020-06-14T22:09:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.