On Thompson Sampling for Smoother-than-Lipschitz Bandits
- URL: http://arxiv.org/abs/2001.02323v2
- Date: Wed, 26 Feb 2020 12:06:42 GMT
- Title: On Thompson Sampling for Smoother-than-Lipschitz Bandits
- Authors: James A. Grant and David S. Leslie
- Abstract summary: We provide the first bounds on the regret of Thompson Sampling for continuum armed bandits under weak conditions.
Our bounds are realised by analysis of the eluder dimension.
We derive a new bound on the eluder dimension for classes of functions with Lipschitz derivatives.
- Score: 6.929312022493406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson Sampling is a well established approach to bandit and reinforcement
learning problems. However, its use in continuum armed bandit problems has
received relatively little attention. We provide the first bounds on the regret
of Thompson Sampling for continuum armed bandits under weak conditions on the
function class containing the true function and sub-exponential observation
noise. Our bounds are realised by analysis of the eluder dimension, a recently
proposed measure of the complexity of a function class, which has been
demonstrated to be useful in bounding the Bayesian regret of Thompson Sampling
for simpler bandit problems under sub-Gaussian observation noise. We derive a
new bound on the eluder dimension for classes of functions with Lipschitz
derivatives, and generalise previous analyses in multiple regards.
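The abstract concerns Thompson Sampling for continuum armed bandits. As a rough illustration only, the sketch below runs Thompson Sampling on a discretised arm set with an independent Gaussian posterior per arm; the reward function `true_f`, the arm grid, and the per-arm model are assumptions made for illustration and are a simplification of the paper's setting, which places a prior over a smooth function class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical smooth reward function on [0, 1] (illustrative, not from the paper).
def true_f(x):
    return np.sin(3 * x) * np.exp(-x)

arms = np.linspace(0.0, 1.0, 50)   # discretised arm set
noise_sd = 0.1                     # Gaussian observation noise
tau_lik = 1.0 / noise_sd ** 2      # observation precision

# Independent N(0, 1) prior per arm (a simplification of the paper's
# prior over a function class with Lipschitz derivatives).
mu = np.zeros_like(arms)           # posterior means
prec = np.ones_like(arms)          # posterior precisions

for t in range(500):
    # Thompson step: draw one posterior sample per arm, play the argmax.
    sample = rng.normal(mu, 1.0 / np.sqrt(prec))
    a = int(np.argmax(sample))
    reward = true_f(arms[a]) + rng.normal(0.0, noise_sd)
    # Conjugate Gaussian posterior update for the played arm.
    mu[a] = (prec[a] * mu[a] + tau_lik * reward) / (prec[a] + tau_lik)
    prec[a] += tau_lik
```

The posterior draw in the Thompson step is what drives exploration: arms with few observations retain wide posteriors and are occasionally sampled high, while well-observed arms concentrate around their empirical means.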
Related papers
- Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis [4.297070083645049]
We investigate a contextual linear bandit problem where the agent observes a noisy, corrupted version of the true context.
Our objective is to design an action policy that can approximate that of an oracle.
arXiv Detail & Related papers (2024-01-21T18:57:38Z)
- Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards [44.025369660607645]
We study the performance of the Thompson Sampling algorithm for Contextual Bandit problems.
We introduce new bounds on the lifted information ratio that hold for sub-Gaussian rewards.
arXiv Detail & Related papers (2023-04-26T14:40:01Z)
- Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees [103.69464492445779]
We propose BanditSRL, a representation learning algorithm that learns a realizable representation with good spectral properties.
We prove that BanditSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available.
arXiv Detail & Related papers (2022-10-24T10:04:54Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning [17.860102738896096]
We present a theoretical analysis of Thompson Sampling, with a focus on frequentist regret bounds.
We show that the standard Thompson Sampling is not aggressive enough in exploring new actions, leading to suboptimality in some pessimistic situations.
We show that the theoretical framework can be used to derive Bayesian regret bounds for standard Thompson Sampling, and frequentist regret bounds for Feel-Good Thompson Sampling.
arXiv Detail & Related papers (2021-10-02T20:10:40Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.