On the Suboptimality of Thompson Sampling in High Dimensions
- URL: http://arxiv.org/abs/2102.05502v1
- Date: Wed, 10 Feb 2021 15:44:43 GMT
- Title: On the Suboptimality of Thompson Sampling in High Dimensions
- Authors: Raymond Zhang and Richard Combes
- Abstract summary: We demonstrate that Thompson Sampling is sub-optimal for semi-bandits.
We show that Thompson Sampling can indeed perform very poorly in high dimensions.
- Score: 7.198362232890585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we consider Thompson Sampling for combinatorial semi-bandits.
We demonstrate that, perhaps surprisingly, Thompson Sampling is sub-optimal for
this problem in the sense that its regret scales exponentially in the ambient
dimension, and its minimax regret scales almost linearly. This phenomenon
occurs under a wide variety of assumptions including both non-linear and linear
reward functions. We also show that including a fixed amount of forced
exploration to Thompson Sampling does not alleviate the problem. We complement
our theoretical results with numerical results and show that in practice
Thompson Sampling indeed can perform very poorly in high dimensions.
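For readers unfamiliar with the setting, the following is a minimal sketch (not the authors' code) of the kind of Thompson Sampling routine the paper analyzes: Bernoulli Thompson Sampling for a linear-reward combinatorial semi-bandit, where an action selects m of d items and a per-item outcome is observed for each selected item. All variable names and parameter values below are illustrative assumptions.

```python
# Minimal sketch: Thompson Sampling for a combinatorial semi-bandit with
# linear rewards and Bernoulli per-item feedback. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

d, m, T = 20, 5, 10_000                      # ambient dimension, items per action, horizon
true_means = rng.uniform(0.2, 0.8, size=d)   # unknown Bernoulli means (hidden from the learner)

alpha = np.ones(d)                           # Beta(1, 1) prior on each item's mean
beta = np.ones(d)

best_value = np.sort(true_means)[-m:].sum()  # value of the best action (for regret accounting)
regret = 0.0

for t in range(T):
    theta = rng.beta(alpha, beta)                    # one posterior sample per item
    action = np.argsort(theta)[-m:]                  # play the m items with the largest samples
    feedback = rng.binomial(1, true_means[action])   # semi-bandit feedback: one outcome per selected item
    alpha[action] += feedback                        # conjugate posterior update, observed items only
    beta[action] += 1 - feedback
    regret += best_value - true_means[action].sum()

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

The paper's result is that the regret of precisely this kind of posterior-sampling routine can scale exponentially in the ambient dimension d, and that adding a fixed amount of forced exploration does not remove the problem.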
Related papers
- Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo [7.968641076961955]
We propose an approximate Thompson sampling strategy utilizing underdamped Langevin Monte Carlo.
Under standard smoothness and log-concavity conditions, we study the accelerated posterior concentration and sampling.
Our algorithm is empirically validated through synthetic experiments in high-dimensional bandit problems.
arXiv Detail & Related papers (2024-01-22T02:54:58Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning [17.860102738896096]
We present a theoretical analysis of Thompson Sampling, with a focus on frequentist regret bounds.
We show that the standard Thompson Sampling is not aggressive enough in exploring new actions, leading to suboptimality in some pessimistic situations.
We show that the theoretical framework can be used to derive Bayesian regret bounds for standard Thompson Sampling, and frequentist regret bounds for Feel-Good Thompson Sampling.
arXiv Detail & Related papers (2021-10-02T20:10:40Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- Asymptotic Convergence of Thompson Sampling [0.0]
Thompson sampling has been shown to be an effective policy across a variety of online learning tasks.
We prove a convergence result for Thompson sampling under the assumption of sub-linear Bayesian regret.
Our results rely on the martingale structure inherent in Thompson sampling.
arXiv Detail & Related papers (2020-11-08T07:36:49Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
- MOTS: Minimax Optimal Thompson Sampling [89.2370817955411]
It has remained an open problem whether Thompson sampling can match the minimax lower bound $\Omega(\sqrt{KT})$ for $K$-armed bandit problems.
We propose a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step.
We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(\sqrt{KT})$ for finite time horizon $T$, as well as the optimal regret bound for Gaussian rewards when $T$ approaches infinity.
arXiv Detail & Related papers (2020-03-03T21:24:39Z)
- On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
However, it suffers from a significant computational limitation arising from the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
- Ensemble Sampling [18.85309520133554]
This paper develops ensemble sampling, which aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks.
We establish a theoretical basis that supports the approach and present computational results that offer further insight; a minimal sketch of the idea follows this list.
arXiv Detail & Related papers (2017-05-20T19:36:36Z)
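As a complement to the last entry, here is a minimal sketch of the ensemble sampling idea for a simple K-armed Gaussian bandit: maintain a small ensemble of perturbed models as a tractable stand-in for the exact posterior, and at each round act greedily with respect to one model drawn uniformly at random. The Gaussian setup, the unit-variance prior and noise, and all parameter values are illustrative assumptions rather than the cited paper's exact construction.

```python
# Minimal sketch of ensemble sampling for a K-armed Gaussian bandit.
# Each ensemble member combines an independent prior draw with its own
# independently perturbed copy of the observed rewards. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

K, M, T = 10, 30, 5_000                        # arms, ensemble size, horizon
true_means = rng.normal(0.0, 1.0, size=K)      # unknown arm means (hidden from the learner)

prior_samples = rng.normal(0.0, 1.0, size=(M, K))  # one N(0, 1) prior draw per model and arm
perturbed_sums = np.zeros((M, K))                  # per-model running sums of perturbed rewards
counts = np.zeros(K)                               # number of pulls per arm

for t in range(T):
    model = rng.integers(M)                        # pick one ensemble member uniformly at random
    estimates = (prior_samples[model] + perturbed_sums[model]) / (1.0 + counts)
    arm = int(np.argmax(estimates))                # act greedily w.r.t. the sampled model
    reward = rng.normal(true_means[arm], 1.0)
    perturbed_sums[:, arm] += reward + rng.normal(0.0, 1.0, size=M)  # each model adds its own noise
    counts[arm] += 1

print("most-played arm:", int(np.argmax(counts)), "| true best arm:", int(np.argmax(true_means)))
```

Drawing one ensemble member per round plays the role of sampling from the posterior in exact Thompson sampling, which is why the approach remains tractable for complex models such as neural networks.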
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.