Diffusion Approximations for Thompson Sampling
- URL: http://arxiv.org/abs/2105.09232v1
- Date: Wed, 19 May 2021 16:28:01 GMT
- Title: Diffusion Approximations for Thompson Sampling
- Authors: Lin Fan, Peter W. Glynn
- Abstract summary: We show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and random ODEs.
Our weak convergence theory covers both the classical multi-armed and linear bandit settings.
Our theory is developed from first-principles and can also be adapted to analyze other sampling-based bandit algorithms.
- Score: 9.384123241346382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the behavior of Thompson sampling from the perspective of weak
convergence. In the regime where the gaps between arm means scale as
$1/\sqrt{n}$ with the time horizon $n$, we show that the dynamics of Thompson
sampling evolve according to discrete versions of SDEs and random ODEs. As $n
\to \infty$, we show that the dynamics converge weakly to solutions of the
corresponding SDEs and random ODEs. (Recently, Wager and Xu (arXiv:2101.09855)
independently proposed this regime and developed similar SDE and random ODE
approximations.) Our weak convergence theory covers both the classical
multi-armed and linear bandit settings, and can be used, for instance, to
obtain insight about the characteristics of the regret distribution when there
is information sharing among arms, as well as the effects of variance
estimation, model mis-specification and batched updates in bandit learning. Our
theory is developed from first-principles and can also be adapted to analyze
other sampling-based bandit algorithms.
Related papers
- Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z) - VITS : Variational Inference Thompson Sampling for contextual bandits [10.028119153832346]
We introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits.
We propose a new algorithm, Varational Inference Thompson sampling VITS, based on Gaussian Variational Inference.
We show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of round as traditional TS for linear contextual bandit.
arXiv Detail & Related papers (2023-07-19T17:53:22Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - A Geometric Perspective on Diffusion Models [60.69328526215776]
We inspect the ODE-based sampling of a popular variance-exploding SDE and reveal several intriguing structures of its sampling dynamics.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential
Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling, termed Expulli, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - Improving Robustness and Uncertainty Modelling in Neural Ordinary
Differential Equations [0.2538209532048866]
We propose a novel approach to model uncertainty in NODE by considering a distribution over the end-time $T$ of the ODE solver.
We also propose, adaptive latent time NODE (ALT-NODE), which allow each data point to have a distinct posterior distribution over end-times.
We demonstrate the effectiveness of the proposed approaches in modelling uncertainty and robustness through experiments on synthetic and several real-world image classification data.
arXiv Detail & Related papers (2021-12-23T16:56:10Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Random Effect Bandits [22.322646330965476]
We study regret in multi-armed bandits, a classical online learning problem.
Our experiments show that ReUCB can outperform Thompson sampling in various scenarios.
arXiv Detail & Related papers (2021-06-23T07:15:31Z) - Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $mathrmO(log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.