Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring
- URL: http://arxiv.org/abs/2006.09668v2
- Date: Thu, 10 Jun 2021 09:00:24 GMT
- Title: Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring
- Authors: Taira Tsuchiya, Junya Honda, Masashi Sugiyama
- Abstract summary: We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
- Score: 91.22679787578438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate finite stochastic partial monitoring, which is a general model
for sequential learning with limited feedback. While Thompson sampling is one
of the most promising algorithms on a variety of online decision-making
problems, its properties for stochastic partial monitoring have not been
theoretically investigated, and the existing algorithm relies on a heuristic
approximation of the posterior distribution. To mitigate these problems, we
present a novel Thompson-sampling-based algorithm, which enables us to exactly
sample the target parameter from the posterior distribution. In addition, we prove
that the new algorithm achieves the logarithmic problem-dependent expected
pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with
local observability. This result is the first regret bound of Thompson sampling
for partial monitoring, which also becomes the first logarithmic regret bound
of Thompson sampling for linear bandits.
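To make the exact-sampling idea concrete, below is a minimal sketch of Thompson sampling with an exact posterior draw in a conjugate Gaussian linear model. This is only an illustration of sampling the target parameter exactly from the posterior, not the paper's partial-monitoring algorithm; the action set, prior, and noise model are assumptions for the example.

```python
# Minimal sketch: Thompson sampling with an *exact* posterior draw,
# possible here because the Gaussian prior/likelihood pair is conjugate.
# Illustrative example only; not the paper's partial-monitoring algorithm.
import numpy as np

rng = np.random.default_rng(0)

d, n_actions, T = 3, 10, 1000
theta_true = rng.normal(size=d)            # unknown parameter (simulation only)
actions = rng.normal(size=(n_actions, d))  # fixed finite action set (assumed)

# Prior N(0, I) with unit observation noise keeps the posterior Gaussian.
A = np.eye(d)      # posterior precision matrix
b = np.zeros(d)    # precision-weighted posterior mean

for t in range(T):
    cov = np.linalg.inv(A)
    theta_sample = rng.multivariate_normal(cov @ b, cov)  # exact posterior draw
    x = actions[np.argmax(actions @ theta_sample)]        # act greedily on the sample
    reward = float(x @ theta_true) + rng.normal()         # noisy linear feedback
    A += np.outer(x, x)                                   # conjugate Bayesian update
    b += reward * x
```

Conjugacy is what makes the draw exact here: Gaussian prior plus Gaussian observations keeps the posterior Gaussian in closed form. Non-conjugate feedback models are what force the heuristic or approximate posterior sampling discussed in the related papers below.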
Related papers
- VITS : Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
We propose a new algorithm, Variational Inference Thompson Sampling (VITS), based on Gaussian Variational Inference.
We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits.
arXiv Detail & Related papers (2023-07-19T17:53:22Z)
- Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
arXiv Detail & Related papers (2022-11-11T02:23:39Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- Diffusion Approximations for Thompson Sampling [4.390757904176221]
We show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and ODEs.
Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem.
arXiv Detail & Related papers (2021-05-19T16:28:01Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
- On Thompson Sampling with Langevin Algorithms [106.78254564840844]
Thompson sampling for multi-armed bandit problems enjoys favorable performance in both theory and practice.
Computationally, however, it suffers from a significant limitation arising from the need to draw samples from posterior distributions at every iteration.
We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue (a generic sketch of the Langevin approach follows this list).
arXiv Detail & Related papers (2020-02-23T22:35:29Z)
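Both Langevin-based entries above replace the exact posterior draw in Thompson sampling with approximate MCMC sampling. Below is a generic sketch of that idea using the unadjusted Langevin algorithm (ULA); the step size, iteration count, and Gaussian model are illustrative assumptions, not the papers' specific algorithms.

```python
# Hedged sketch: approximate the per-round posterior draw of Thompson
# sampling with a few steps of the unadjusted Langevin algorithm (ULA).
import numpy as np

def ula_sample(grad_log_post, theta0, step=1e-2, n_steps=50, rng=None):
    """Approximate posterior draw via unadjusted Langevin dynamics."""
    if rng is None:
        rng = np.random.default_rng()
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=theta.shape)
        theta = theta + step * grad_log_post(theta) + np.sqrt(2.0 * step) * noise
    return theta

# Example log-posterior gradient: Gaussian likelihood y ~ N(X theta, I) with a
# standard normal prior, so the gradient has the closed form below.
def make_grad_log_post(X, y):
    return lambda theta: X.T @ (y - X @ theta) - theta

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=20)
theta_draw = ula_sample(make_grad_log_post(X, y), np.zeros(3), rng=rng)
```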