Bias no more: high-probability data-dependent regret bounds for
adversarial bandits and MDPs
- URL: http://arxiv.org/abs/2006.08040v2
- Date: Thu, 29 Oct 2020 22:40:07 GMT
- Title: Bias no more: high-probability data-dependent regret bounds for
adversarial bandits and MDPs
- Authors: Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang
- Abstract summary: We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
Our approach relies on a simple increasing learning rate schedule, together with the help of logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality.
- Score: 48.44657553192801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a new approach to obtaining high probability regret bounds for
online learning with bandit feedback against an adaptive adversary. While
existing approaches all require carefully constructing optimistic and biased
loss estimators, our approach uses standard unbiased estimators and relies on a
simple increasing learning rate schedule, together with the help of
logarithmically homogeneous self-concordant barriers and a strengthened
Freedman's inequality.
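To make this paragraph concrete, below is a minimal sketch of these ingredients in the simplest special case: a K-armed adversarial bandit run with standard importance-weighted (unbiased) estimators, online mirror descent over the log-barrier (a logarithmically homogeneous self-concordant barrier for the positive orthant), and a per-arm increasing learning rate. The function names, threshold rule, and constants (initial eta, kappa) are illustrative assumptions, not the paper's exact algorithm or tuning.

import numpy as np

def iw_estimator(loss, arm, p):
    # Standard unbiased importance-weighted estimator:
    # hat_ell[i] = loss * 1{i == arm} / p[i], so E[hat_ell] = ell.
    hat = np.zeros_like(p)
    hat[arm] = loss / p[arm]
    return hat

def log_barrier_step(p, hat_ell, eta):
    # One mirror-descent step with regularizer psi(p) = sum_i (1/eta[i]) * log(1/p[i]).
    # First-order conditions give 1/p_new[i] = 1/p[i] + eta[i] * (hat_ell[i] - z),
    # where the normalizer z is found by bisection so that p_new sums to 1.
    c = 1.0 / p + eta * hat_ell          # fixed part of the update
    z_hi = np.min(c / eta) - 1e-12       # keeps every denominator positive
    z_lo = z_hi - 1.0
    total = lambda z: np.sum(1.0 / (c - eta * z))
    while total(z_lo) > 1.0:             # expand left until the root is bracketed
        z_lo -= (z_hi - z_lo)
    for _ in range(60):                  # total(z) is increasing, so bisection applies
        z_mid = 0.5 * (z_lo + z_hi)
        z_lo, z_hi = (z_mid, z_hi) if total(z_mid) < 1.0 else (z_lo, z_mid)
    p_new = 1.0 / (c - eta * z_lo)
    return p_new / p_new.sum()           # guard against bisection round-off

def bandit_round(p, eta, rho, observe_loss, kappa=1.1):
    # One round with the increasing learning rate schedule: whenever arm i's
    # probability drops below its running threshold 1/rho[i], double the
    # threshold and multiply eta[i] by kappa (an illustrative constant).
    arm = np.random.choice(len(p), p=p)
    loss = observe_loss(arm)             # bandit feedback: only this loss is seen
    p = log_barrier_step(p, iw_estimator(loss, arm, p), eta)
    grow = 1.0 / p > rho
    rho[grow] = 2.0 / p[grow]
    eta[grow] *= kappa
    return p, eta, rho

# Toy run: 3 arms, losses drawn at random purely for illustration.
K = 3
p, eta, rho = np.full(K, 1.0 / K), np.full(K, 0.1), np.full(K, 2.0 * K)
for _ in range(1000):
    p, eta, rho = bandit_round(p, eta, rho, lambda a: np.random.rand())

Increasing the learning rate may look counter-intuitive, but roughly speaking it compensates for the large variance the unbiased estimator incurs on low-probability arms, which is the role the abstract assigns to this schedule in place of biased optimistic estimators.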
Besides its simplicity, our approach enjoys several advantages. First, the
obtained high-probability regret bounds are data-dependent and could be much
smaller than the worst-case bounds, which resolves an open problem asked by Neu
(2015). Second, resolving another open problem of Bartlett et al. (2008) and
Abernethy and Rakhlin (2009), our approach leads to the first general and
efficient algorithm with a high-probability regret bound for adversarial linear
bandits, while previous methods are either inefficient or only applicable to
specific action sets. Finally, our approach can also be applied to learning
adversarial Markov Decision Processes and provides the first algorithm with a
high-probability small-loss bound for this problem.
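For context on the concentration ingredient, one common statement of the classical Freedman inequality is recalled below; the constants follow a standard form, and this is not the paper's strengthened version.

% Classical Freedman inequality (one common form).
% X_1, ..., X_T is a martingale difference sequence with |X_t| \le b, and
% V_T = \sum_{t=1}^{T} \mathbb{E}[X_t^2 \mid \mathcal{F}_{t-1}] is the predictable variance.
\[
  \Pr\!\left[\, \sum_{t=1}^{T} X_t \ge a \ \text{ and }\ V_T \le v \,\right]
  \;\le\; \exp\!\left( \frac{-a^2}{2v + 2ab/3} \right)
  \qquad \text{for all } a, v > 0.
\]

A data-dependent regret bound needs control in terms of the random variance V_T itself rather than a fixed threshold v; roughly, that is what the strengthened version is for.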
Related papers
- Agnostic Multi-Robust Learning Using ERM [19.313739782029185]
A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially many perturbations that an adversary might make to a test-time natural example.
In contrast, the attacker only needs to find one successful perturbation.
We introduce a novel multi-group setting and, within it, a multi-robust learning problem.
arXiv Detail & Related papers (2023-03-15T21:30:14Z)
- Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers [31.102181563065844]
Quantile (and, more generally, KL) regret bounds relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.
More recently, the semi-adversarial paradigm (Bilodeau, Negrea, and Roy 2020) provides an alternative relaxation of adversarial online learning by considering data that may be neither fully adversarial nor stochastic (i.i.d.).
We achieve the minimax optimal regret in both paradigms using FTRL with separate, novel, root-logarithmic regularizers, both of which can be interpreted as yielding variants of NormalHedge.
arXiv Detail & Related papers (2021-10-27T22:38:52Z)
- Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees (this divergence is recalled after the list).
arXiv Detail & Related papers (2021-07-05T19:20:34Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of its one-class nature: only positive interactions are observed.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z)
- Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in a reproducing kernel Hilbert space (RKHS).
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
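For reference, the triangular discrimination mentioned in the first-order contextual bandits entry above is a standard f-divergence; the definition below is the usual one and is not specific to that paper's results.

% Triangular discrimination between distributions P and Q on a countable domain.
\[
  \Delta(P, Q) \;=\; \sum_{x} \frac{\bigl(P(x) - Q(x)\bigr)^{2}}{P(x) + Q(x)}.
\]

Up to constant factors it is equivalent to the squared Hellinger distance, which is one reason it interacts well with logarithmic-loss analyses.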