Asymptotic Convergence of Thompson Sampling
- URL: http://arxiv.org/abs/2011.03917v1
- Date: Sun, 8 Nov 2020 07:36:49 GMT
- Title: Asymptotic Convergence of Thompson Sampling
- Authors: Cem Kalkanli, Ayfer Ozgur
- Abstract summary: Thompson sampling has been shown to be an effective policy across a variety of online learning tasks.
We prove a convergence result for Thompson sampling under the assumption of a sub-linear Bayesian regret.
Our results rely on the martingale structure inherent in Thompson sampling.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling has been shown to be an effective policy across a variety
of online learning tasks. Many works have analyzed the finite time performance
of Thompson sampling, and proved that it achieves a sub-linear regret under a
broad range of probabilistic settings. However, its asymptotic behavior remains
mostly underexplored. In this paper, we prove an asymptotic convergence result
for Thompson sampling under the assumption of a sub-linear Bayesian regret, and
show that the actions of a Thompson sampling agent provide a strongly
consistent estimator of the optimal action. Our results rely on the martingale
structure inherent in Thompson sampling.
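To ground the claim, here is a minimal, self-contained sketch of Thompson sampling for a Bernoulli bandit (illustrative code, not from the paper): tracking how often the agent plays the truly optimal arm gives an empirical view of the strong-consistency statement above. The arm means, horizon, and priors are arbitrary illustrative choices.

```python
# Minimal sketch: Bernoulli Thompson sampling with Beta(1, 1) priors.
# The fraction of rounds spent on the optimal arm should approach 1,
# matching the consistency result described in the abstract.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # arm 2 is optimal (illustrative)
K, T = len(true_means), 50_000
alpha, beta = np.ones(K), np.ones(K)     # Beta posterior parameters
optimal_arm = int(np.argmax(true_means))
optimal_plays = 0

for t in range(T):
    theta = rng.beta(alpha, beta)        # one posterior sample per arm
    a = int(np.argmax(theta))            # play the arm with the largest sample
    r = rng.random() < true_means[a]     # Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r
    optimal_plays += (a == optimal_arm)

print(f"fraction of optimal plays over T={T}: {optimal_plays / T:.3f}")
```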
Related papers
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits [0.0]
We show that Thompson sampling can maintain its performance even if it receives delayed feedback in batches.
We propose an adaptive scheme that reduces the number of batches to $\Theta(\log T)$ while maintaining the same performance.
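The adaptive scheme in the paper chooses batch boundaries from the data; the sketch below is a simpler, hypothetical variant that doubles the batch length each time, which also yields $\Theta(\log T)$ batches. Within a batch the posterior is frozen and all feedback is applied only at the batch's end.

```python
# Hedged sketch: Thompson sampling with batched (delayed) feedback.
# Doubling the batch length gives Theta(log T) batches over horizon T;
# the paper's adaptive scheme is data-dependent, this one is not.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.4, 0.6])
K, T = len(true_means), 10_000
alpha, beta = np.ones(K), np.ones(K)

t, batch_len, n_batches = 0, 1, 0
while t < T:
    cur = min(batch_len, T - t)
    pulls = np.zeros(K, dtype=int)
    succ = np.zeros(K, dtype=int)
    for _ in range(cur):
        theta = rng.beta(alpha, beta)    # posterior is frozen inside a batch
        a = int(np.argmax(theta))
        pulls[a] += 1
        succ[a] += rng.random() < true_means[a]
    alpha += succ                        # feedback arrives at the batch end
    beta += pulls - succ
    t += cur
    batch_len *= 2                       # geometric growth
    n_batches += 1

print(f"T={T} rounds processed in {n_batches} batches")
```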
arXiv Detail & Related papers (2021-10-01T01:28:40Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- On the Suboptimality of Thompson Sampling in High Dimensions [7.198362232890585]
We demonstrate that Thompson Sampling is sub-optimal for semi-bandits.
We show that Thompson Sampling can indeed perform very poorly in high dimensions.
arXiv Detail & Related papers (2021-02-10T15:44:43Z)
- Policy Gradient Optimization of Thompson Sampling Policies [3.3345263849085582]
We study the use of policy gradient algorithms to optimize over a class of generalized Thompson sampling policies.
We show that direct policy search on top of Thompson sampling automatically corrects for some of the algorithm's known shortcomings.
arXiv Detail & Related papers (2020-06-30T03:27:22Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
- Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS).
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB).
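For concreteness, here is a hedged sketch of the CTS idea for the simplest combinatorial structure: choosing m of K Bernoulli base arms per round with semi-bandit feedback. The top-m oracle stands in for a problem-specific combinatorial solver, and all constants are illustrative.

```python
# Hedged sketch of Combinatorial Thompson Sampling (CTS): sample every
# base arm from its posterior, let a combinatorial oracle (here: top-m)
# pick the super-arm, then update each played base arm individually
# using the semi-bandit feedback.
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.2, 0.4, 0.5, 0.7, 0.8])
K, m, T = len(true_means), 2, 20_000
alpha, beta = np.ones(K), np.ones(K)

for t in range(T):
    theta = rng.beta(alpha, beta)              # one sample per base arm
    super_arm = np.argsort(theta)[-m:]         # oracle: top-m sampled means
    rewards = rng.random(m) < true_means[super_arm]
    alpha[super_arm] += rewards                # per-arm semi-bandit update
    beta[super_arm] += 1 - rewards

print("posterior means:", (alpha / (alpha + beta)).round(2))
```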
arXiv Detail & Related papers (2020-06-11T17:12:11Z)
- MOTS: Minimax Optimal Thompson Sampling [89.2370817955411]
It has remained an open problem whether Thompson sampling can match the minimax lower bound $\Omega(\sqrt{KT})$ for $K$-armed bandit problems.
We propose a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step.
We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(\sqrt{KT})$ for finite time horizon $T$, as well as the optimal regret bound for Gaussian rewards when $T$ approaches infinity.
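The precise clipping threshold in MOTS is specified in the paper; the sketch below only conveys the mechanism, using an assumed UCB-style threshold of a similar flavor for Gaussian rewards. Treat the constants and the threshold form as placeholders.

```python
# Purely illustrative sketch of the MOTS mechanism: draw a Thompson
# sample per arm, then clip it from above at a confidence-style
# threshold so over-optimistic samples cannot trigger excess
# exploration. The threshold form below is an assumption, not the
# paper's exact choice.
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.0, 0.2, 0.5])
K, T = len(true_means), 20_000
counts = np.ones(K)                        # one forced initial pull per arm
sums = rng.normal(true_means, 1.0)

for t in range(K, T):
    mu_hat = sums / counts
    theta = rng.normal(mu_hat, np.sqrt(1.0 / counts))  # vanilla TS sample
    log_plus = np.maximum(np.log(T / (K * counts)), 0.0)
    tau = mu_hat + np.sqrt(4.0 * log_plus / counts)    # assumed threshold
    a = int(np.argmax(np.minimum(theta, tau)))         # clipped sample
    r = rng.normal(true_means[a], 1.0)
    counts[a] += 1
    sums[a] += r

print("pull counts:", counts.astype(int))
```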
arXiv Detail & Related papers (2020-03-03T21:24:39Z)
- On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
However, it suffers from a significant computational limitation: the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
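As a concrete illustration of the MCMC idea, here is a hedged sketch that replaces the exact posterior draw with a short unadjusted Langevin chain, for arms with a logistic reward model and standard normal priors. The step size and chain length are illustrative assumptions, not the tuned choices from the paper.

```python
# Hedged sketch: approximate Thompson sampling where each posterior draw
# comes from a short unadjusted Langevin algorithm (ULA) chain. Each arm
# has reward Bernoulli(sigmoid(phi)) with prior phi ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def langevin_sample(successes, failures, n_steps=50, step=0.05):
    """ULA on log p(phi | data): prior gradient plus likelihood gradient."""
    phi = 0.0
    n = successes + failures
    for _ in range(n_steps):
        grad = -phi + successes - n * sigmoid(phi)
        phi += 0.5 * step * grad + np.sqrt(step) * rng.normal()
    return phi

true_means = np.array([0.3, 0.6])
K, T = len(true_means), 2_000
succ, fail = np.zeros(K), np.zeros(K)

for t in range(T):
    theta = [sigmoid(langevin_sample(succ[i], fail[i])) for i in range(K)]
    a = int(np.argmax(theta))
    r = rng.random() < true_means[a]
    succ[a] += r
    fail[a] += 1 - r

print("pulls per arm:", (succ + fail).astype(int))
```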
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
- Ensemble Sampling [18.85309520133554]
This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks.
We establish a theoretical basis that supports the approach and present computational results that offer further insight.
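A hedged sketch of the ensemble idea for a K-armed Gaussian bandit: M models start from independent prior draws, each trains on its own noise-perturbed copy of the rewards, and each round one model is chosen uniformly and its greedy arm is played. The ensemble size and noise scales are illustrative assumptions.

```python
# Hedged sketch of ensemble sampling: an ensemble of perturbed models
# stands in for the posterior, so "sampling" reduces to picking one
# model at random and acting greedily under it.
import numpy as np

rng = np.random.default_rng(5)
true_means = np.array([0.1, 0.4, 0.6])
K, M, T = len(true_means), 10, 10_000
sigma = 1.0                                   # reward noise scale

prior_draws = rng.normal(0.0, 1.0, (M, K))    # independent prior samples
pert_sums = np.zeros((M, K))                  # perturbed reward sums
counts = np.zeros(K)

for t in range(T):
    m = rng.integers(M)                       # pick one ensemble member
    est = (prior_draws[m] + pert_sums[m]) / (1.0 + counts)  # ridge-style mean
    a = int(np.argmax(est))                   # act greedily under that model
    r = rng.normal(true_means[a], sigma)
    counts[a] += 1
    # Every model sees the reward plus its own fresh Gaussian perturbation.
    pert_sums[:, a] += r + rng.normal(0.0, sigma, M)

print("pulls per arm:", counts.astype(int))
```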
arXiv Detail & Related papers (2017-05-20T19:36:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.