An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
- URL: http://arxiv.org/abs/2502.14790v1
- Date: Thu, 20 Feb 2025 18:10:12 GMT
- Title: An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
- Authors: Alexander Terenin, Jeffrey Negrea
- Abstract summary: We develop an analysis of Thompson sampling for online learning under full feedback.
We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret.
- Score: 54.37047702755926
- Abstract: We develop an analysis of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling with a certain Gaussian process prior widely-used in the Bayesian optimization literature has a $\mathcal{O}(\beta\sqrt{T\log(1+\lambda)})$ rate against a $\beta$-bounded $\lambda$-Lipschitz adversary.
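To make the finite-expert case concrete, the sketch below illustrates Thompson sampling for prediction with expert advice where the prior is placed on the adversary's future loss vectors rather than on the experts. It assumes an i.i.d. Gaussian prior over per-round losses, so conditioning on observed rounds leaves the future marginal unchanged and the sampled completion acts as a random perturbation; all names (`thompson_sampling_experts`, `losses`, `prior_mean`) are hypothetical and this is a sketch under those assumptions, not the paper's exact construction.

```python
import numpy as np

def thompson_sampling_experts(losses, prior_mean=0.5, prior_std=1.0, seed=0):
    """Thompson sampling for prediction with expert advice (full information).

    The prior is over the adversary's future loss vectors: each round we sample
    a hypothetical completion of the loss sequence, then play the expert that
    would be best in hindsight against observed + sampled losses.

    losses: (T, K) array of per-round losses chosen by the adversary.
    Returns the sequence of chosen experts and the learner's total loss.
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    cum_observed = np.zeros(K)          # observed cumulative loss per expert
    choices, total = [], 0.0
    for t in range(T):
        # Sample the adversary's remaining T - t loss vectors from the prior.
        # With an i.i.d. Gaussian prior, conditioning on the past leaves the
        # future marginal unchanged, so the sample acts as a perturbation.
        sampled_future = rng.normal(prior_mean, prior_std, size=(T - t, K)).sum(axis=0)
        expert = int(np.argmin(cum_observed + sampled_future))
        choices.append(expert)
        total += losses[t, expert]
        cum_observed += losses[t]       # full feedback: see every expert's loss
    return choices, total
```

For the infinite-action extension discussed in the abstract, the sampled completion would instead be drawn from a Gaussian process over a continuous action space, with the minimization taken over that space.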
Related papers
- Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret [10.541541376305243]
We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$.
We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process.
arXiv Detail & Related papers (2024-05-29T03:24:56Z)
- Thompson Sampling for Infinite-Horizon Discounted Decision Processes [0.0]
We study the behavior of a sampling-based algorithm, called Thompson sampling.
By decomposing the standard (expected) regret, we develop a new metric, called the expected residual regret.
arXiv Detail & Related papers (2024-05-14T01:01:05Z)
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
- Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
arXiv Detail & Related papers (2022-11-11T02:23:39Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- A Regret-Variance Trade-Off in Online Learning [14.41667013062914]
We show how the variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
arXiv Detail & Related papers (2022-06-06T14:50:19Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- Asymptotic Convergence of Thompson Sampling [0.0]
Thompson sampling has been shown to be an effective policy across a variety of online learning tasks.
We prove a convergence result for Thompson sampling under the assumption of sub-linear Bayesian regret.
Our results rely on the martingale structure inherent in Thompson sampling.
arXiv Detail & Related papers (2020-11-08T07:36:49Z)
- Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)