Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
- URL: http://arxiv.org/abs/2211.05964v1
- Date: Fri, 11 Nov 2022 02:23:39 GMT
- Title: Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
- Authors: Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari
- Abstract summary: This work provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
- Score: 17.11922027966447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the stochastic linear contextual bandit problem with
high-dimensional features. We analyze the Thompson sampling (TS) algorithm,
using special classes of sparsity-inducing priors (e.g. spike-and-slab) to
model the unknown parameter, and provide a nearly optimal upper bound on the
expected cumulative regret. To the best of our knowledge, this is the first
work that provides theoretical guarantees of Thompson sampling in
high-dimensional and sparse contextual bandits. For faster computation, we use a
spike-and-slab prior to model the unknown parameter and variational inference
instead of MCMC to approximate the posterior distribution. Extensive
simulations demonstrate improved performance of our proposed algorithm over
existing ones.
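The abstract describes the standard Thompson sampling loop: at each round, sample the unknown parameter from (an approximation of) its posterior and pull the arm that looks best under that sample. The sketch below shows this loop for a linear contextual bandit with a plain Gaussian (ridge) conjugate posterior; it is a minimal reference point only and does not implement the paper's spike-and-slab prior or its variational posterior approximation. The helpers `contexts_fn` and `reward_fn` are illustrative assumptions.

```python
# Minimal Thompson sampling loop for a linear contextual bandit.
# NOTE: uses a Gaussian (ridge) conjugate posterior for clarity; the paper
# instead places a sparsity-inducing spike-and-slab prior on theta and
# approximates its posterior with variational inference (not shown here).
import numpy as np

def linear_ts(contexts_fn, reward_fn, d, T, v=1.0, lam=1.0, seed=0):
    """contexts_fn(t) -> (K, d) array of candidate arm features at round t;
    reward_fn(x) -> noisy scalar reward for the chosen feature vector x."""
    rng = np.random.default_rng(seed)
    B = lam * np.eye(d)            # posterior precision matrix
    f = np.zeros(d)                # running sum of r_t * x_t
    rewards = []
    for t in range(T):
        X = contexts_fn(t)
        cov = np.linalg.inv(B)
        mu = cov @ f                                            # posterior mean
        theta_tilde = rng.multivariate_normal(mu, v**2 * cov)   # posterior sample
        a = int(np.argmax(X @ theta_tilde))                     # act greedily w.r.t. the sample
        x = X[a]
        r = reward_fn(x)
        B += np.outer(x, x)                                     # rank-one precision update
        f += r * x
        rewards.append(r)
    return np.array(rewards)
```

In the high-dimensional sparse setting, the Gaussian posterior above is what the paper replaces: the spike-and-slab prior concentrates posterior mass on a few coordinates, and variational inference keeps the per-round posterior update tractable without MCMC.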
Related papers
- VITS : Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
We propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian variational inference.
We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits.
arXiv Detail & Related papers (2023-07-19T17:53:22Z) - Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z) - A Provably Efficient Model-Free Posterior Sampling Method for Episodic
Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or lack worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits; a minimal Langevin posterior-sampling sketch appears after this list.
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Diffusion Approximations for Thompson Sampling [4.390757904176221]
We show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and ODEs.
Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem.
arXiv Detail & Related papers (2021-05-19T16:28:01Z) - Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
Computationally, however, it suffers from a significant limitation arising from the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
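Two of the entries above (Langevin Monte Carlo for Contextual Bandits and On Thompson Sampling with Langevin Algorithms) replace the exact posterior draw in the Thompson sampling step with approximate sampling via Langevin dynamics. The sketch below illustrates the idea with an unadjusted Langevin sampler targeting the Gaussian posterior of a linear reward model; the step size, iteration count, and lack of a warm start are illustrative assumptions, not the settings used in those papers.

```python
# Minimal sketch: drawing the Thompson sampling posterior sample with
# unadjusted Langevin dynamics (ULA) instead of an exact sampler.
# Target: log p(theta | X, r) = -||r - X @ theta||^2 / (2 * noise_sd^2) - lam * ||theta||^2 / 2 + const
import numpy as np

def langevin_posterior_sample(X, r, lam=1.0, noise_sd=1.0,
                              step=1e-3, n_steps=500, theta0=None, seed=0):
    """X: (n, d) observed features; r: (n,) observed rewards.
    Returns one approximate sample from the Gaussian posterior via ULA."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = np.zeros(d) if theta0 is None else np.array(theta0, dtype=float)
    for _ in range(n_steps):
        # Gradient of the log-posterior (likelihood term plus ridge prior term).
        grad = X.T @ (r - X @ theta) / noise_sd**2 - lam * theta
        # Langevin update: gradient step plus injected Gaussian noise.
        theta += step * grad + np.sqrt(2.0 * step) * rng.standard_normal(d)
    return theta
```

Inside a bandit loop, this sampler would be called once per round in place of the exact posterior draw; warm-starting from the previous round's sample, as done in LMC-TS-style algorithms, typically lets a small number of Langevin steps suffice.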
This list is automatically generated from the titles and abstracts of the papers on this site.