Robust Contextual Linear Bandits
- URL:
- Date: Wed, 26 Oct 2022 05:18:09 GMT
- Title: Robust Contextual Linear Bandits
- Authors: Rong Zhu and Branislav Kveton
- Abstract summary: This paper studies a common form of misspecification, an inter-arm heterogeneity that is not captured by context.
We develop two efficient bandit algorithms for our setting: a UCB algorithm called RoLinUCB and a posterior-sampling algorithm called RoLinTS.
- Score: 19.85979744859435
- License:
- Abstract: Model misspecification is a major consideration in applications of
statistical methods and machine learning. However, it is often neglected in
contextual bandits. This paper studies a common form of misspecification, an
inter-arm heterogeneity that is not captured by context. To address this issue,
we assume that the heterogeneity arises due to arm-specific random variables,
which can be learned. We call this setting a robust contextual bandit. The
arm-specific variables explain the unknown inter-arm heterogeneity, and we
incorporate them in the robust contextual estimator of the mean reward and its
uncertainty. We develop two efficient bandit algorithms for our setting: a UCB
algorithm called RoLinUCB and a posterior-sampling algorithm called RoLinTS. We
analyze both algorithms and bound their $n$-round Bayes regret. Our experiments
show that RoLinTS is comparably statistically efficient to the classic methods
when the misspecification is low, more robust when the misspecification is
high, and significantly more computationally efficient than its naive
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Dual Instrumental Method for Confounded Kernelized Bandits [0.0]
The contextual bandit problem is a framework with wide applications in various fields.
We propose a confounded bandit problem where the noise becomes a latent confounder that affects both contexts and rewards.
We show that a dual instrumental variable regression can correctly identify the true reward function.
arXiv Detail & Related papers (2022-09-07T15:25:57Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits [4.760079434948198]
We show that specialized algorithms that exploit such parametric information are prone to inconsistent learning performance when the parameter is misspecified.
Our key contributions are: (i) We establish fundamental performance limits of statistically robust MAB algorithms under the fixed-budget pure exploration setting, and (ii) We propose two classes of algorithms that are twofoldly near-optimal.
arXiv Detail & Related papers (2020-08-28T13:43:12Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z) - Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two sample multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.