The Randomized Elliptical Potential Lemma with an Application to Linear
Thompson Sampling
- URL: http://arxiv.org/abs/2102.07987v1
- Date: Tue, 16 Feb 2021 07:30:04 GMT
- Title: The Randomized Elliptical Potential Lemma with an Application to Linear
Thompson Sampling
- Authors: Nima Hamidi, Mohsen Bayati
- Abstract summary: We introduce a randomized version of the well-known elliptical potential lemma that is widely used in the analysis of algorithms in sequential learning and decision-making problems such as linear bandits.
Our randomized elliptical potential lemma relaxes the Gaussian assumption on the observation noise and on the prior distribution of the problem parameters.
- Score: 10.939683083130616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this note, we introduce a randomized version of the well-known elliptical
potential lemma that is widely used in the analysis of algorithms in sequential
learning and decision-making problems such as stochastic linear bandits. Our
randomized elliptical potential lemma relaxes the Gaussian assumption on the
observation noise and on the prior distribution of the problem parameters. We
then use this generalization to prove an improved Bayesian regret bound for
Thompson sampling for stochastic linear bandits with changing action sets,
where the prior and noise distributions are general. This bound is minimax optimal
up to constants.
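For concreteness, here is a minimal sketch of Thompson sampling for a stochastic linear
bandit with changing action sets, written for the classical conjugate Gaussian prior/noise
case that the lemma is designed to move beyond; the environment, action-set generator, and
hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linear_thompson_sampling(T=1000, d=5, noise_sd=0.1, prior_var=1.0, seed=0):
    """Thompson sampling for a stochastic linear bandit with a Gaussian prior
    N(0, prior_var * I) and Gaussian observation noise.  Illustrative setting
    only: the paper's regret bound covers more general prior and noise
    distributions than the conjugate case used here."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(0.0, np.sqrt(prior_var), size=d)   # unknown parameter

    B = np.eye(d) / prior_var   # posterior precision matrix
    b = np.zeros(d)             # precision-weighted posterior mean
    regret = 0.0
    for _ in range(T):
        # Changing action set: a fresh batch of unit-norm actions each round.
        A = rng.normal(size=(20, d))
        A /= np.linalg.norm(A, axis=1, keepdims=True)

        # Sample a parameter from the current posterior and act greedily on it.
        Sigma = np.linalg.inv(B)
        theta_tilde = rng.multivariate_normal(Sigma @ b, Sigma)
        a = A[np.argmax(A @ theta_tilde)]

        # Observe a noisy reward, then apply the conjugate Gaussian update.
        r = a @ theta_star + rng.normal(0.0, noise_sd)
        B += np.outer(a, a) / noise_sd**2
        b += r * a / noise_sd**2

        regret += np.max(A @ theta_star) - a @ theta_star
    return regret

print("cumulative regret:", round(linear_thompson_sampling(), 2))
```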
Related papers
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
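As a toy instance of this construction, a likelihood-ratio confidence sequence for a
Gaussian mean with known unit variance, using a prequential plug-in estimator in the
numerator; the grid, level, and Gaussian model are assumptions made for the example,
not the paper's general setting.

```python
import numpy as np
from scipy.stats import norm

def lr_confidence_sequence(x, alpha=0.05, grid=np.linspace(-3, 3, 601)):
    """Running likelihood-ratio confidence sets for a Gaussian mean (unit
    variance).  The numerator uses the running sample mean as a plug-in
    estimator, so the ratio is a nonnegative supermartingale under the true
    mean and Ville's inequality gives time-uniform coverage.  Toy instance
    only, not the cited paper's general construction."""
    log_lr = np.zeros_like(grid)           # log likelihood ratio per candidate mean
    mean_hat, n, sets = 0.0, 0, []
    for xt in x:
        # Numerator: density under the running estimate; denominator: candidate mean.
        log_lr += norm.logpdf(xt, loc=mean_hat) - norm.logpdf(xt, loc=grid)
        n += 1
        mean_hat += (xt - mean_hat) / n    # update the plug-in estimate after using it
        sets.append(grid[log_lr < np.log(1.0 / alpha)])
    return sets

rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=200)
final_set = lr_confidence_sequence(data)[-1]
print("confidence set at t=200:", final_set.min(), "to", final_set.max())
```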
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
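A minimal sketch of the sparsity-inducing ingredient, drawing a parameter vector from a
spike-and-slab prior; the mixture weights and scales below are illustrative, and the
variational-inference machinery the paper pairs this prior with is not shown.

```python
import numpy as np

def sample_spike_and_slab(d, sparsity=0.02, slab_sd=1.0, spike_sd=1e-3, rng=None):
    """Draw one parameter vector from a spike-and-slab prior: each coordinate
    is active with probability `sparsity` (slab: wide Gaussian) and nearly
    zero otherwise (spike: narrow Gaussian).  Illustrative only; the cited
    paper combines such a prior with variational inference inside Thompson
    sampling rather than drawing from the prior directly."""
    rng = rng or np.random.default_rng()
    active = rng.random(d) < sparsity
    scale = np.where(active, slab_sd, spike_sd)
    return rng.normal(0.0, scale), active

theta, support = sample_spike_and_slab(d=1000, rng=np.random.default_rng(1))
print("active coordinates:", int(support.sum()))
```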
arXiv Detail & Related papers (2022-11-11T02:23:39Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
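A generic unadjusted Langevin sketch of the posterior-sampling step for a linear-Gaussian
reward model, where the log-posterior gradient is available in closed form; the step size,
step count, and Gaussian likelihood are illustrative assumptions rather than the paper's
configuration.

```python
import numpy as np

def lmc_posterior_sample(A, r, noise_var=0.01, prior_var=1.0,
                         step=1e-4, n_steps=500, rng=None):
    """One Langevin Monte Carlo draw from the posterior of a linear reward
    model: gradient of the log-posterior = data-fit term + Gaussian-prior
    term, followed by injected Gaussian noise.  A generic ULA sketch in the
    spirit of LMC-TS, not the paper's tuned algorithm."""
    rng = rng or np.random.default_rng()
    theta = np.zeros(A.shape[1])
    for _ in range(n_steps):
        grad = A.T @ (r - A @ theta) / noise_var - theta / prior_var
        theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.normal(size=theta.shape)
    return theta

# Inside a Thompson-sampling loop, this draw replaces exact posterior sampling.
rng = np.random.default_rng(0)
A_hist = rng.normal(size=(50, 3))
r_hist = A_hist @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=50)
print(np.round(lmc_posterior_sample(A_hist, r_hist, rng=rng), 3))
```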
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Optimal Online Generalized Linear Regression with Stochastic Noise and
Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
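A minimal follow-the-regularized-leader sketch for online linear regression with squared
loss and an L2 regularizer, where the FTRL iterate has a closed ridge-regression form;
this shows the template the analysis builds on, not the paper's variance-aware bound or
its heteroscedastic-bandit application.

```python
import numpy as np

def ftrl_online_regression(stream, d, reg=1.0):
    """Follow-the-regularized-leader for online linear regression: each round
    plays the minimizer of (cumulative squared loss + L2 regularizer), which
    is the ridge solution of the data seen so far.  Minimal FTRL sketch only."""
    G = reg * np.eye(d)   # running sum of x x^T plus the regularizer
    h = np.zeros(d)       # running sum of y * x
    preds = []
    for x, y in stream:
        w = np.linalg.solve(G, h)   # FTRL iterate based on past rounds only
        preds.append(w @ x)         # predict before the label is revealed
        G += np.outer(x, x)         # then fold in the new example
        h += y * x
    return preds

rng = np.random.default_rng(0)
w_star = np.array([0.3, -1.0, 0.7])
stream = [(x, x @ w_star + 0.1 * rng.normal()) for x in rng.normal(size=(100, 3))]
print("last prediction:", round(ftrl_online_regression(stream, d=3)[-1], 3))
```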
arXiv Detail & Related papers (2022-02-28T08:25:26Z) - SNIPS: Solving Noisy Inverse Problems Stochastically [25.567566997688044]
We introduce a novel algorithm dubbed SNIPS, which draws samples from the posterior distribution of any linear inverse problem.
Our solution incorporates ideas from Langevin dynamics and Newton's method, and exploits a pre-trained minimum mean squared error (MMSE) Gaussian denoiser.
We show that the samples produced are sharp, detailed and consistent with the given measurements, and their diversity exposes the inherent uncertainty in the inverse problem being solved.
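A toy stand-in for the idea, sampling from the posterior of a linear inverse problem
y = Hx + noise with Langevin dynamics, where an analytic Gaussian prior replaces the
pre-trained MMSE denoiser so the prior score is available in closed form; SNIPS itself
uses annealed Langevin dynamics, a learned denoiser, and an SVD of H, none of which
appear in this sketch.

```python
import numpy as np

def langevin_linear_inverse(y, H, noise_sd=0.1, prior_sd=1.0,
                            step=2e-3, n_steps=5000, rng=None):
    """Posterior sampling for y = Hx + noise by unadjusted Langevin dynamics.
    The score of the posterior splits into a data-fit term and a prior term;
    here the prior is Gaussian so its score is -x / prior_sd**2, standing in
    for the learned MMSE denoiser used by SNIPS.  Illustration only."""
    rng = rng or np.random.default_rng()
    x = rng.normal(size=H.shape[1])
    for _ in range(n_steps):
        score = H.T @ (y - H @ x) / noise_sd**2 - x / prior_sd**2
        x = x + 0.5 * step * score + np.sqrt(step) * rng.normal(size=x.shape)
    return x

rng = np.random.default_rng(0)
H = rng.normal(size=(20, 50)) / np.sqrt(50)   # under-determined measurement operator
x_true = rng.normal(size=50)
y = H @ x_true + 0.1 * rng.normal(size=20)
print("first coordinates of one posterior sample:",
      np.round(langevin_linear_inverse(y, H, rng=rng)[:5], 2))
```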
arXiv Detail & Related papers (2021-05-31T13:33:21Z) - Stochastic Online Convex Optimization. Application to probabilistic time
series forecasting [0.0]
We prove that algorithms such as the Online Newton Step and a scale-free version of Bernstein Online Aggregation achieve best-known rates in unbounded settings.
Our fast-rate regret bounds are any-time valid.
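For reference, a plain Online Newton Step sketch for squared loss, with the projection
onto a bounded domain omitted for brevity; it illustrates the second-order online update
the summary refers to, not the scale-free or Bernstein Online Aggregation variants
analyzed in the paper.

```python
import numpy as np

def online_newton_step(stream, d, gamma=0.5, eps=1.0):
    """Online Newton Step: accumulate gradient outer products in A and move
    the iterate along A^{-1} g each round.  Squared loss, no projection step;
    a generic ONS sketch rather than the cited paper's algorithms."""
    A = eps * np.eye(d)
    w = np.zeros(d)
    total_loss = 0.0
    for x, y in stream:
        total_loss += 0.5 * (w @ x - y) ** 2
        g = (w @ x - y) * x                      # gradient of the squared loss
        A += np.outer(g, g)
        w = w - np.linalg.solve(A, g) / gamma    # Newton-style correction
    return w, total_loss

rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0])
stream = [(x, x @ w_star + 0.05 * rng.normal()) for x in rng.normal(size=(500, 2))]
w_final, loss = online_newton_step(stream, d=2)
print("final iterate:", np.round(w_final, 3))
```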
arXiv Detail & Related papers (2021-02-01T09:49:15Z) - Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
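A dense-kernel toy of pathwise conditioning via Matheron's rule: draw one joint prior
sample at the training and test inputs, then correct it with the residual between the
observed targets and the noisy prior values at the training inputs; the kernel, noise
level, and data are illustrative, and the paper's cheap approximations of both terms
are not shown.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def pathwise_posterior_sample(X, y, X_star, noise_sd=0.1, rng=None):
    """One GP posterior sample via Matheron's rule:
        f*|y  =  f*  +  K_{*X} (K_{XX} + noise)^{-1} (y - f_X - eps),
    where (f_X, f*) is a joint draw from the prior and eps is observation
    noise.  Dense toy of the pathwise-conditioning idea."""
    rng = rng or np.random.default_rng()
    Z = np.vstack([X, X_star])
    K = rbf(Z, Z) + 1e-6 * np.eye(len(Z))
    f = np.linalg.cholesky(K) @ rng.normal(size=len(Z))    # joint prior draw
    f_train, f_test = f[: len(X)], f[len(X):]
    eps = noise_sd * rng.normal(size=len(X))
    K_xx = rbf(X, X) + noise_sd**2 * np.eye(len(X))
    update = rbf(X_star, X) @ np.linalg.solve(K_xx, y - f_train - eps)
    return f_test + update                                 # posterior sample at X_star

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(15, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=15)
X_star = np.linspace(-2, 2, 5)[:, None]
print(np.round(pathwise_posterior_sample(X, y, X_star, rng=rng), 3))
```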
arXiv Detail & Related papers (2020-11-08T17:09:37Z) - Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
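A toy posterior-sampling loop for a two-outcome partial-monitoring game, in which only a
signal (not the incurred loss) is observed and a grid posterior over the outcome
probability is updated through the signal likelihood; the game matrices and the update
rule are assumptions made for this illustration, not the algorithm or analysis of the
paper.

```python
import numpy as np

def ts_partial_monitoring(T=2000, p_true=0.3, seed=0):
    """Thompson-sampling-style play in a toy partial-monitoring game: sample
    an outcome probability from a grid posterior, play the action with the
    smallest expected loss under the sample, then update the posterior with
    the likelihood of the observed signal.  Illustration only."""
    rng = np.random.default_rng(seed)
    L = np.array([[0.0, 1.0],    # loss matrix: rows = actions, cols = outcomes
                  [0.5, 0.5],
                  [1.0, 0.0]])
    S = np.array([[0, 1],        # signal matrix: actions 0 and 2 reveal the outcome,
                  [0, 0],        # action 1 emits an uninformative signal
                  [0, 1]])
    grid = np.linspace(0.01, 0.99, 99)
    log_post = np.zeros_like(grid)           # log posterior over P(outcome = 1)
    cum_loss = 0.0
    for _ in range(T):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        p_sample = rng.choice(grid, p=post)
        a = int(np.argmin(L @ np.array([1 - p_sample, p_sample])))
        outcome = int(rng.random() < p_true)
        cum_loss += L[a, outcome]
        sig = S[a, outcome]
        # Bayes update with the signal likelihood P(sig | action a, p).
        lik = (1 - grid) * (S[a, 0] == sig) + grid * (S[a, 1] == sig)
        log_post += np.log(lik + 1e-12)
    return cum_loss

print("cumulative loss over 2000 rounds:", round(ts_partial_monitoring(), 1))
```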
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
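As a generic illustration of working through a convex-concave saddle-point reformulation
with stochastic gradients, a descent-ascent sketch on a small strongly-convex-strongly-concave
toy problem; it is not the Wasserstein-barycenter algorithm of the paper, and the objective
below is an assumption chosen only so the saddle point has a closed form to compare against.

```python
import numpy as np

def stochastic_gda(A, b, lr=0.01, n_iter=50000, batch=4, rng=None):
    """Stochastic gradient descent-ascent on
        min_x max_y  0.5*||x||^2 + y^T A x - 0.5*||y||^2 - b^T y,
    sampling a minibatch of the rows of A each step.  With a constant step
    size the iterates hover near the saddle point; generic sketch only."""
    rng = rng or np.random.default_rng()
    m, d = A.shape
    x, y = np.zeros(d), np.zeros(m)
    for _ in range(n_iter):
        rows = rng.integers(0, m, size=batch)               # stochastic rows of A
        gx = x + (m / batch) * A[rows].T @ y[rows]           # unbiased gradient in x
        gy = -y - b
        gy[rows] += (m / batch) * (A[rows] @ x)              # unbiased gradient in y
        x -= lr * gx
        y += lr * gy
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 10)) / np.sqrt(30)
b = rng.normal(size=30)
x_gda = stochastic_gda(A, b, rng=rng)
x_exact = np.linalg.solve(np.eye(10) + A.T @ A, A.T @ b)    # closed-form saddle point
print("gda:  ", np.round(x_gda[:4], 2))
print("exact:", np.round(x_exact[:4], 2))
```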
arXiv Detail & Related papers (2020-06-11T19:40:38Z)