Beyond UCB: Optimal and Efficient Contextual Bandits with Regression
Oracles
- URL: http://arxiv.org/abs/2002.04926v2
- Date: Tue, 23 Jun 2020 10:44:25 GMT
- Title: Beyond UCB: Optimal and Efficient Contextual Bandits with Regression
Oracles
- Authors: Dylan J. Foster and Alexander Rakhlin
- Abstract summary: We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge in contextual bandits is to develop flexible,
general-purpose algorithms with computational requirements no worse than
classical supervised learning tasks such as classification and regression.
Algorithms based on regression have shown promising empirical success, but
theoretical guarantees have remained elusive except in special cases. We
provide the first universal and optimal reduction from contextual bandits to
online regression. We show how to transform any oracle for online regression
with a given value function class into an algorithm for contextual bandits with
the induced policy class, with no overhead in runtime or memory requirements.
We characterize the minimax rates for contextual bandits with general,
potentially nonparametric function classes, and show that our algorithm is
minimax optimal whenever the oracle obtains the optimal rate for regression.
Compared to previous results, our algorithm requires no distributional
assumptions beyond realizability, and works even when contexts are chosen
adversarially.
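The action-selection rule at the heart of this reduction (known as SquareCB) converts the online regressor's predicted rewards into a sampling distribution via inverse gap weighting. A minimal sketch, where the predicted rewards are assumed to come from any online regression oracle and `gamma` is a hypothetical name for the exploration parameter:

```python
import random


def inverse_gap_weighting(rewards, gamma):
    """Turn predicted rewards for K actions into a sampling
    distribution via inverse gap weighting (SquareCB-style).

    rewards: list of predicted rewards, one per action.
    gamma: exploration parameter; larger gamma -> closer to greedy.
    """
    k = len(rewards)
    best = max(range(k), key=lambda a: rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            # Probability shrinks as the predicted reward gap grows.
            probs[a] = 1.0 / (k + gamma * (rewards[best] - rewards[a]))
    probs[best] = 1.0 - sum(probs)  # Remaining mass to the greedy action.
    return probs


def sample_action(rewards, gamma, rng=random):
    """Draw an action from the inverse-gap-weighted distribution."""
    probs = inverse_gap_weighting(rewards, gamma)
    return rng.choices(range(len(rewards)), weights=probs)[0]
```

Because each round only requires one oracle prediction and this closed-form weighting, the reduction adds no overhead in runtime or memory beyond the regression oracle itself.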
Related papers
- Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles
We study the contextual bandit with knapsacks (CBwK) problem, where each action yields a random reward but also incurs a random, vector-valued resource consumption.
We propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression.
arXiv Detail & Related papers (2022-10-21T09:28:53Z)
- Smoothed Online Learning is as Easy as Statistical Learning
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
We study the $K$-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z)
- Adapting to Misspecification in Contextual Bandits
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}\,T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
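The triangular discrimination mentioned in this summary is a standard f-divergence between distributions, $\Delta(p, q) = \sum_i (p_i - q_i)^2 / (p_i + q_i)$; a small illustrative sketch of computing it:

```python
def triangular_discrimination(p, q):
    """Triangular discrimination between two discrete distributions:
    sum of (p_i - q_i)^2 / (p_i + q_i), skipping terms where
    both probabilities are zero (the term is zero by convention)."""
    return sum((pi - qi) ** 2 / (pi + qi)
               for pi, qi in zip(p, q) if pi + qi > 0)
```

It is symmetric, vanishes when the distributions coincide, and is maximized (at 2) when they have disjoint support.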
arXiv Detail & Related papers (2021-07-05T19:20:34Z)
- Adapting to misspecification in contextual bandits with offline regression oracles
We propose a family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy.
Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level.
arXiv Detail & Related papers (2021-02-26T00:15:04Z)
- Fast OSCAR and OWL Regression via Safe Screening Rules
Ordered weighted $L_1$ (OWL) regularized regression is a regression approach for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression, which exploits the order of the primal solution despite its order structure being unknown in advance.
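For context, the OWL penalty applies a non-increasing weight sequence to the coordinates of the solution sorted by absolute value; a minimal sketch of evaluating it (function name hypothetical):

```python
def owl_penalty(x, weights):
    """Ordered weighted L1 penalty: sum_i weights[i] * |x|_(i),
    where |x|_(1) >= |x|_(2) >= ... are the absolute values of x
    sorted in descending order. weights is assumed non-increasing
    (e.g. OSCAR-style weights), which makes the penalty convex."""
    abs_sorted = sorted((abs(v) for v in x), reverse=True)
    return sum(w * v for w, v in zip(weights, abs_sorted))
```

With all weights equal, this reduces to the ordinary $L_1$ penalty; the descending weights are what encourage grouping of correlated coefficients.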
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
- Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
We design a fast and simple algorithm that achieves the statistically optimal regret with only $O(\log T)$ calls to an offline regression oracle.
Our results provide the first universal and optimal reduction from contextual bandits to offline regression.
arXiv Detail & Related papers (2020-03-28T04:16:52Z)
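The $O(\log T)$ oracle-call count in the last entry is characteristic of epoch-based designs that retrain the offline oracle only on a doubling schedule rather than every round; a minimal sketch of such a schedule (names hypothetical, not the paper's actual implementation):

```python
def epoch_starts(horizon):
    """Rounds at which an offline regression oracle would be
    re-invoked under a doubling epoch schedule: 1, 2, 4, 8, ...
    This yields O(log T) oracle calls over a horizon of T rounds."""
    t, starts = 1, []
    while t <= horizon:
        starts.append(t)
        t *= 2
    return starts
```

Between consecutive epoch starts the learner reuses the previously fitted predictor, so the total computational cost is dominated by logarithmically many oracle invocations.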
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.