Adapting to misspecification in contextual bandits with offline
regression oracles
- URL: http://arxiv.org/abs/2102.13240v1
- Date: Fri, 26 Feb 2021 00:15:04 GMT
- Title: Adapting to misspecification in contextual bandits with offline
regression oracles
- Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, and Susan Athey
- Abstract summary: We propose a family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy.
Our algorithm requires only an offline regression oracle to ensure regret guarantees that gracefully degrade in terms of a measure of the average misspecification level.
- Score: 7.312170216336086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computationally efficient contextual bandits are often based on estimating a
predictive model of rewards given contexts and arms using past data. However,
when the reward model is not well-specified, the bandit algorithm may incur
unexpected regret, so recent work has focused on algorithms that are robust to
misspecification. We propose a simple family of contextual bandit algorithms
that adapt to misspecification error by reverting to a good safe policy when
there is evidence that misspecification is causing a regret increase. Our
algorithm requires only an offline regression oracle to ensure regret
guarantees that gracefully degrade in terms of a measure of the average
misspecification level. Compared to prior work, we attain similar regret
guarantees, but we do not rely on a master algorithm, and do not require more
robust oracles like online or constrained regression oracles (e.g., Foster et
al. (2020a); Krishnamurthy et al. (2020)). This allows us to design algorithms
for more general function approximation classes.
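To make the mechanism concrete, here is a minimal, hypothetical sketch of the revert-to-a-safe-policy pattern the abstract describes. The oracle (per-arm least squares), the safe policy (uniform exploration), the in-sample squared-error check, and all names and thresholds are illustrative assumptions, not the paper's actual algorithm or misspecification test:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_offline_regression(X, A, R, n_arms):
    """Hypothetical offline regression oracle: per-arm least squares."""
    d = X.shape[1]
    coefs = np.zeros((n_arms, d))
    for a in range(n_arms):
        mask = A == a
        if mask.sum() >= d:
            coefs[a], *_ = np.linalg.lstsq(X[mask], R[mask], rcond=None)
    return coefs

def run(contexts, reward_fn, n_arms, check_every=100, tol=0.05):
    """Play greedily on a fitted linear model, but revert to a uniform
    'safe' policy once the model's in-sample squared error (a crude
    stand-in for a misspecification test) exceeds tol."""
    d = contexts.shape[1]
    X_hist, A_hist, R_hist = [], [], []
    coefs = np.zeros((n_arms, d))
    reverted = False
    for t, x in enumerate(contexts):
        if reverted or t < n_arms:          # safe play / forced exploration
            a = int(rng.integers(n_arms))
        else:
            a = int(np.argmax(coefs @ x))   # greedy on the reward model
        r = reward_fn(x, a) + 0.1 * rng.standard_normal()
        X_hist.append(x); A_hist.append(a); R_hist.append(r)
        if (t + 1) % check_every == 0 and not reverted:
            X, A, R = np.asarray(X_hist), np.asarray(A_hist), np.asarray(R_hist)
            coefs = fit_offline_regression(X, A, R, n_arms)
            mse = np.mean((np.einsum('ij,ij->i', coefs[A], X) - R) ** 2)
            if mse > tol:                   # evidence of misspecification
                reverted = True
    return reverted

# Toy run: the true reward is nonlinear, so the linear model is misspecified.
ctx = rng.standard_normal((2000, 3))
print(run(ctx, lambda x, a: float(np.tanh(x[a])), n_arms=3))
```

In the paper, the decision to revert is driven by regret guarantees tied to the average misspecification level rather than a raw error threshold; this sketch only conveys the overall shape of the strategy.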
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large -- or even continuous -- action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z)
- Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits [27.09804256642197]
We introduce a large family of algorithms inspired by the Frank-Wolfe algorithm.
We construct new explicit algorithms for a broad class of problems.
We explain this phenomenon on an insightful toy problem.
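As background on the approach referenced above: the classical Frank-Wolfe (conditional gradient) step that this family of algorithms takes inspiration from is the following standard update (textbook material, not taken from the paper):

```latex
% Frank-Wolfe step over a convex feasible set D: linearize the
% objective at x_t, solve a linear subproblem, then move a fraction
% gamma_t of the way toward the minimizer.
s_t \in \arg\min_{s \in \mathcal{D}} \langle s, \nabla f(x_t) \rangle,
\qquad
x_{t+1} = x_t + \gamma_t (s_t - x_t), \quad \gamma_t \in [0, 1].
```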
arXiv Detail & Related papers (2021-10-18T09:36:36Z)
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
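For reference, the triangular discrimination between two discrete distributions $p$ and $q$ is the following standard information-theoretic quantity (textbook background, not taken from the paper itself):

```latex
% Triangular discrimination (Vincze-Le Cam distance); it is equivalent
% to the squared Hellinger distance up to constant factors.
D_\triangle(p, q) = \sum_{a} \frac{\bigl(p(a) - q(a)\bigr)^2}{p(a) + q(a)}
```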
arXiv Detail & Related papers (2021-07-05T19:20:34Z)
- Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks [81.13338949407205]
Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks and can fail completely in the presence of attacks.
Existing robust bandit algorithms only work for the non-contextual setting under the attack of rewards.
We provide the first robust bandit algorithm for linear contextual bandit setting under a fully adaptive and omniscient attack.
arXiv Detail & Related papers (2021-06-05T22:20:34Z)
- Tractable contextual bandits beyond realizability [15.40239357726009]
We present a tractable bandit algorithm that is not sensitive to the realizability assumption.
Under realizability, our algorithm ensures the same regret guarantees achieved by realizability-based algorithms.
arXiv Detail & Related papers (2020-10-25T01:36:04Z)
- Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered Weighted $L_1$ (OWL) regularized regression is a regression approach for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
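As background, the OWL penalty applies non-increasing weights to the sorted coefficient magnitudes, with OSCAR as the special case of linearly decaying weights (standard definitions, not specific to this paper's screening rules):

```latex
% OWL penalty with weights w_1 >= w_2 >= ... >= w_p >= 0, where
% |beta|_{[i]} is the i-th largest entry of beta in absolute value.
\Omega_w(\beta) = \sum_{i=1}^{p} w_i |\beta|_{[i]},
\qquad
\text{OSCAR:}\; w_i = \lambda_1 + \lambda_2 (p - i).
```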
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
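The reduction in this paper (Foster and Rakhlin's SquareCB) selects actions by inverse-gap weighting; a minimal sketch of that selection rule follows, assuming a finite action set (variable names are illustrative):

```python
import numpy as np

def inverse_gap_weighting(y_hat, gamma):
    """Inverse-gap weighting (Abe & Long; used by SquareCB): turn
    predicted rewards y_hat (shape: n_arms) into a sampling
    distribution that explores each arm in proportion to how close
    its prediction is to the best arm's."""
    K = len(y_hat)
    best = int(np.argmax(y_hat))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (y_hat[best] - y_hat[a]))
    p[best] = 1.0 - p.sum()    # remaining mass goes to the greedy arm
    return p

# Example: a larger gamma concentrates mass on the greedy arm.
print(inverse_gap_weighting(np.array([0.1, 0.5, 0.4]), gamma=50.0))
```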
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.