Related papers: Diffusion Approximations for Thompson Sampling

Diffusion Approximations for Thompson Sampling

URL: http://arxiv.org/abs/2105.09232v1
Date: Wed, 19 May 2021 16:28:01 GMT
Title: Diffusion Approximations for Thompson Sampling
Authors: Lin Fan, Peter W. Glynn
Abstract summary: We show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and random ODEs. Our weak convergence theory covers both the classical multi-armed and linear bandit settings. Our theory is developed from first-principles and can also be adapted to analyze other sampling-based bandit algorithms.
Score: 9.384123241346382
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the behavior of Thompson sampling from the perspective of weak convergence. In the regime where the gaps between arm means scale as $1/\sqrt{n}$ with the time horizon $n$, we show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and random ODEs. As $n \to \infty$, we show that the dynamics converge weakly to solutions of the corresponding SDEs and random ODEs. (Recently, Wager and Xu (arXiv:2101.09855) independently proposed this regime and developed similar SDE and random ODE approximations.) Our weak convergence theory covers both the classical multi-armed and linear bandit settings, and can be used, for instance, to obtain insight about the characteristics of the regret distribution when there is information sharing among arms, as well as the effects of variance estimation, model mis-specification and batched updates in bandit learning. Our theory is developed from first-principles and can also be adapted to analyze other sampling-based bandit algorithms.

Related papers

VITS : Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. We propose a new algorithm, Varational Inference Thompson sampling VITS, based on Gaussian Variational Inference. We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of round as traditional TS for linear contextual bandit.
arXiv Detail & Related papers (2023-07-19T17:53:22Z)
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL) We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family. We propose a Thompson sampling, termed Expulli, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem. Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals. We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
Thompson Sampling for Unimodal Bandits [21.514495320038712]
We propose a Thompson Sampling algorithm for emphunimodal bandits, where the expected reward is unimodal over the partially ordered arms. For Gaussian rewards, the regret of our algorithm is $mathcalO(log T)$, which is far better than standard Thompson Sampling algorithms.
arXiv Detail & Related papers (2021-06-15T14:40:34Z)
Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring. We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $mathrmO(log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
MOTS: Minimax Optimal Thompson Sampling [89.2370817955411]
It has remained an open problem whether Thompson sampling can match the minimax lower bound $Omega(sqrtKT)$ for $K$-armed bandit problems. We propose a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(sqrtKT)$ for finite time horizon $T$, as well as the optimal regret bound for Gaussian rewards when $T$ approaches infinity.
arXiv Detail & Related papers (2020-03-03T21:24:39Z)
On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice. It suffers from a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
Ensemble Sampling [18.85309520133554]
This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. We establish a theoretical basis that supports the approach and present computational results that offer further insight.
arXiv Detail & Related papers (2017-05-20T19:36:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.