Lifting the Information Ratio: An Information-Theoretic Analysis of
Thompson Sampling for Contextual Bandits
- URL: http://arxiv.org/abs/2205.13924v1
- Date: Fri, 27 May 2022 12:04:07 GMT
- Title: Lifting the Information Ratio: An Information-Theoretic Analysis of
Thompson Sampling for Contextual Bandits
- Authors: Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz
- Abstract summary: We adapt the information-theoretic perspective of Russo and Van Roy [2016] to the contextual setting by introducing a new concept of information ratio.
This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof.
An interesting special case is that of logistic bandits with d-dimensional parameters, K actions, and Lipschitz logits, for which we provide a $\widetilde{O}(\sqrt{dKT})$ regret upper-bound that does not depend on the smallest slope of the sigmoid link function.
- Score: 17.470829701201435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the Bayesian regret of the renowned Thompson Sampling algorithm in
contextual bandits with binary losses and adversarially-selected contexts. We
adapt the information-theoretic perspective of Russo and Van Roy [2016] to the
contextual setting by introducing a new concept of information ratio based on
the mutual information between the unknown model parameter and the observed
loss. This allows us to bound the regret in terms of the entropy of the prior
distribution through a remarkably simple proof, and with no structural
assumptions on the likelihood or the prior. The extension to priors with
infinite entropy only requires a Lipschitz assumption on the log-likelihood. An
interesting special case is that of logistic bandits with d-dimensional
parameters, K actions, and Lipschitz logits, for which we provide a
$\widetilde{O}(\sqrt{dKT})$ regret upper-bound that does not depend on the
smallest slope of the sigmoid link function.
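As a rough illustrative sketch (not the paper's algorithm or analysis), Thompson Sampling in this contextual setting draws a parameter from the posterior each round and plays the action that looks best under the sampled model. The toy version below assumes a finite prior support of M candidate parameters so that the posterior is an exact discrete Bayes update; the dimensions, the candidate set, and the Bernoulli-loss simulation are all hypothetical choices made only to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy setup: finite prior support of M candidate parameters,
# so the posterior is an exact discrete Bayes update. The paper's analysis
# needs no such finiteness; this only keeps the sketch short.
d, K, T, M = 5, 4, 500, 50
thetas = rng.normal(size=(M, d))     # candidate d-dimensional parameters
theta_star = thetas[7]               # true parameter, inside the support
log_post = np.full(M, -np.log(M))    # log posterior, starts at the uniform prior

for t in range(T):
    X = rng.normal(size=(K, d))      # K contexts (could be chosen adversarially)
    # Thompson step 1: sample a model from the current posterior ...
    w = np.exp(log_post - log_post.max())
    i = rng.choice(M, p=w / w.sum())
    # ... step 2: play the action with the smallest predicted loss under it.
    a = int(np.argmin(sigmoid(X @ thetas[i])))
    # Observe a binary loss with mean sigmoid(<x_a, theta*>).
    loss = rng.binomial(1, sigmoid(X[a] @ theta_star))
    # Exact Bayes update: multiply by each candidate's Bernoulli likelihood.
    p = sigmoid(X[a] @ thetas.T)
    log_post += np.log(p) if loss == 1 else np.log(1.0 - p)
    log_post -= np.logaddexp.reduce(log_post)  # renormalize
```

The regret analysis in the paper bounds how fast this posterior concentrates via the mutual information between the parameter and the observed losses.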
Related papers
- Optimistic Information Directed Sampling [15.704243709119726]
We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class.
We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy and the worst-case theory of Foster, Kakade, Qian, and Rakhlin (2021) based on the decision-estimation coefficient.
arXiv Detail & Related papers (2024-02-23T16:19:32Z)
- Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis [4.297070083645049]
We investigate a contextual linear bandit problem where the agent observes a noisy, corrupted version of the true context.
Our objective is to design an action policy that approximates that of an oracle.
arXiv Detail & Related papers (2024-01-21T18:57:38Z)
- Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards [44.025369660607645]
We study the performance of the Thompson Sampling algorithm for Contextual Bandit problems.
We introduce new bounds on the lifted information ratio that hold for sub-Gaussian rewards.
arXiv Detail & Related papers (2023-04-26T14:40:01Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter, and variational inference in place of MCMC.
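As a loose illustration of the prior used in that entry (with hypothetical hyperparameters, and not the paper's variational-inference algorithm), a spike-and-slab draw sets each coordinate to exactly zero with probability 1 - pi and draws it from a Gaussian "slab" otherwise, which encodes sparsity directly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hyperparameters: dimension, inclusion probability, slab scale.
d, inclusion_prob, slab_sd = 100, 0.05, 1.0

# Spike-and-slab draw: gamma marks the few "active" coordinates;
# inactive coordinates are exactly zero (the "spike" at the origin).
gamma = rng.binomial(1, inclusion_prob, size=d)
theta = gamma * rng.normal(0.0, slab_sd, size=d)
```

With a small inclusion probability, most coordinates of `theta` are exactly zero, matching the sparse high-dimensional regime the entry describes.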
arXiv Detail & Related papers (2022-11-11T02:23:39Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
- On Thompson Sampling for Smoother-than-Lipschitz Bandits [6.929312022493406]
We provide the first bounds on the regret of Thompson Sampling for continuum armed bandits under weak conditions.
Our bounds are realised by analysis of the eluder dimension.
We derive a new bound on the eluder dimension for classes of functions with Lipschitz derivatives.
arXiv Detail & Related papers (2020-01-08T00:46:13Z)
- Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.