Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
- URL: http://arxiv.org/abs/2211.05964v1
- Date: Fri, 11 Nov 2022 02:23:39 GMT
- Title: Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
- Authors: Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari
- Abstract summary: This work provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
- Score: 17.11922027966447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the stochastic linear contextual bandit problem with
high-dimensional features. We analyze the Thompson sampling (TS) algorithm,
using special classes of sparsity-inducing priors (e.g. spike-and-slab) to
model the unknown parameter, and provide a nearly optimal upper bound on the
expected cumulative regret. To the best of our knowledge, this is the first
work that provides theoretical guarantees of Thompson sampling in
high-dimensional and sparse contextual bandits. For faster computation, we use a
spike-and-slab prior to model the unknown parameter and variational inference
instead of MCMC to approximate the posterior distribution. Extensive
simulations demonstrate improved performance of our proposed algorithm over
existing ones.
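The abstract describes the standard Thompson sampling loop: at each round, sample the unknown parameter from (an approximation of) its posterior and pull the arm that looks best under that sample. The sketch below shows this loop for a linear contextual bandit with a plain Gaussian (ridge) conjugate posterior; it is a minimal reference point only and does not implement the paper's spike-and-slab prior or its variational posterior approximation. The helpers `contexts_fn` and `reward_fn` are illustrative assumptions.

```python
# Minimal Thompson sampling loop for a linear contextual bandit.
# NOTE: uses a Gaussian (ridge) conjugate posterior for clarity; the paper
# instead places a sparsity-inducing spike-and-slab prior on theta and
# approximates its posterior with variational inference (not shown here).
import numpy as np

def linear_ts(contexts_fn, reward_fn, d, T, v=1.0, lam=1.0, seed=0):
    """contexts_fn(t) -> (K, d) array of candidate arm features at round t;
    reward_fn(x) -> noisy scalar reward for the chosen feature vector x."""
    rng = np.random.default_rng(seed)
    B = lam * np.eye(d)            # posterior precision matrix
    f = np.zeros(d)                # running sum of r_t * x_t
    rewards = []
    for t in range(T):
        X = contexts_fn(t)
        cov = np.linalg.inv(B)
        mu = cov @ f                                            # posterior mean
        theta_tilde = rng.multivariate_normal(mu, v**2 * cov)   # posterior sample
        a = int(np.argmax(X @ theta_tilde))                     # act greedily w.r.t. the sample
        x = X[a]
        r = reward_fn(x)
        B += np.outer(x, x)                                     # rank-one precision update
        f += r * x
        rewards.append(r)
    return np.array(rewards)
```

In the high-dimensional sparse setting, the Gaussian posterior above is what the paper replaces: the spike-and-slab prior concentrates posterior mass on a few coordinates, and variational inference keeps the per-round posterior update tractable without MCMC.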
Related papers
- VITS : Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
We propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian variational inference.
We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits.
arXiv Detail & Related papers (2023-07-19T17:53:22Z) - Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z) - A Provably Efficient Model-Free Posterior Sampling Method for Episodic
Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or lack worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits; a minimal Langevin posterior-sampling sketch appears after this list.
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Diffusion Approximations for Thompson Sampling [4.390757904176221]
We show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and ODEs.
Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem.
arXiv Detail & Related papers (2021-05-19T16:28:01Z) - Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
Computationally, however, it suffers from a significant limitation arising from the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue.
arXiv Detail & Related papers (2020-02-23T22:35:29Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
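Two of the entries above (Langevin Monte Carlo for Contextual Bandits and On Thompson Sampling with Langevin Algorithms) replace the exact posterior draw in the Thompson sampling step with approximate sampling via Langevin dynamics. The sketch below illustrates the idea with an unadjusted Langevin sampler targeting the Gaussian posterior of a linear reward model; the step size, iteration count, and lack of a warm start are illustrative assumptions, not the settings used in those papers.

```python
# Minimal sketch: drawing the Thompson sampling posterior sample with
# unadjusted Langevin dynamics (ULA) instead of an exact sampler.
# Target: log p(theta | X, r) = -||r - X @ theta||^2 / (2 * noise_sd^2) - lam * ||theta||^2 / 2 + const
import numpy as np

def langevin_posterior_sample(X, r, lam=1.0, noise_sd=1.0,
                              step=1e-3, n_steps=500, theta0=None, seed=0):
    """X: (n, d) observed features; r: (n,) observed rewards.
    Returns one approximate sample from the Gaussian posterior via ULA."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = np.zeros(d) if theta0 is None else np.array(theta0, dtype=float)
    for _ in range(n_steps):
        # Gradient of the log-posterior (likelihood term plus ridge prior term).
        grad = X.T @ (r - X @ theta) / noise_sd**2 - lam * theta
        # Langevin update: gradient step plus injected Gaussian noise.
        theta += step * grad + np.sqrt(2.0 * step) * rng.standard_normal(d)
    return theta
```

Inside a bandit loop, this sampler would be called once per round in place of the exact posterior draw; warm-starting from the previous round's sample, as done in LMC-TS-style algorithms, typically lets a small number of Langevin steps suffice.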
This list is automatically generated from the titles and abstracts of the papers on this site.