Frequentist Regret Analysis of Gaussian Process Thompson Sampling via Fractional Posteriors
- URL: http://arxiv.org/abs/2602.14472v1
- Date: Mon, 16 Feb 2026 05:18:13 GMT
- Title: Frequentist Regret Analysis of Gaussian Process Thompson Sampling via Fractional Posteriors
- Authors: Somjit Roy, Prateek Jaiswal, Anirban Bhattacharya, Debdeep Pati, Bani K. Mallick
- Abstract summary: We show that the variance inflation commonly assumed in existing analyses of GP-TS can be interpreted as Thompson Sampling with respect to a fractional posterior. We derive a kernel-agnostic regret bound expressed in terms of the information gain parameter $\gamma_t$ and the posterior contraction rate $\varepsilon_t$. Overall, our analysis provides a unified and discretization-free regret framework for GP-TS that applies broadly across kernel classes.
- Score: 5.7099901327884695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study Gaussian Process Thompson Sampling (GP-TS) for sequential decision-making over compact, continuous action spaces and provide a frequentist regret analysis based on fractional Gaussian process posteriors, without relying on domain discretization as in prior work. We show that the variance inflation commonly assumed in existing analyses of GP-TS can be interpreted as Thompson Sampling with respect to a fractional posterior with tempering parameter $\alpha \in (0,1)$. We derive a kernel-agnostic regret bound expressed in terms of the information gain parameter $\gamma_t$ and the posterior contraction rate $\varepsilon_t$, and identify conditions on the Gaussian process prior under which $\varepsilon_t$ can be controlled. As special cases of our general bound, we recover regret of order $\tilde{\mathcal{O}}(T^{\frac{1}{2}})$ for the squared exponential kernel, $\tilde{\mathcal{O}}(T^{\frac{2\nu+3d}{2(2\nu+d)}})$ for the Matérn-$\nu$ kernel, and a bound of order $\tilde{\mathcal{O}}(T^{\frac{2\nu+3d}{2(2\nu+d)}})$ for the rational quadratic kernel. Overall, our analysis provides a unified and discretization-free regret framework for GP-TS that applies broadly across kernel classes.
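The correspondence the abstract describes has a concrete algorithmic form: for a Gaussian likelihood, raising the likelihood to a power $\alpha \in (0,1)$ (tempering) is equivalent to running the standard GP posterior update with the noise variance inflated to $\sigma^2/\alpha$, so "fractional-posterior" Thompson Sampling is ordinary GP-TS with an inflated posterior. The sketch below illustrates this on a 1-D grid; it is a minimal illustration of the idea, not the paper's method — the kernel choice, hyperparameters, and the grid discretization (which the paper explicitly avoids) are all simplifying assumptions made here for readability.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise=0.1, lengthscale=0.2):
    """Standard GP posterior mean and covariance on a grid of test points."""
    K = rbf_kernel(x_obs, x_obs, lengthscale) + noise**2 * np.eye(len(x_obs))
    Ks = rbf_kernel(x_grid, x_obs, lengthscale)
    Kss = rbf_kernel(x_grid, x_grid, lengthscale)
    L = np.linalg.cholesky(K)
    coef = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = Ks @ coef
    V = np.linalg.solve(L, Ks.T)
    cov = Kss - V.T @ V
    return mean, cov


def gp_ts_step(x_obs, y_obs, x_grid, alpha=0.5, noise=0.1, rng=None):
    """One GP-TS step using the alpha-fractional posterior.

    Tempering a Gaussian likelihood by alpha is equivalent to the
    standard GP update with noise variance noise**2 / alpha, which
    inflates the posterior covariance (the "variance inflation" of
    existing GP-TS analyses).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean, cov = gp_posterior(x_obs, y_obs, x_grid,
                             noise=noise / np.sqrt(alpha))
    # Jitter for numerical positive semi-definiteness before sampling.
    sample = rng.multivariate_normal(mean, cov + 1e-9 * np.eye(len(x_grid)))
    return x_grid[int(np.argmax(sample))]
```

As $\alpha \to 1$ the fractional posterior recovers the exact posterior; smaller $\alpha$ widens the sampled functions and hence encourages exploration, which is the mechanism the regret analysis exploits.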
Related papers
- Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits [54.220839560203096]
We present FGTS-VA, a variance-aware Feel-Good Thompson Sampling algorithm for contextual bandits with general reward functions and an optimal regret bound. With a new decoupling coefficient denoted $\mathrm{dc}$, FGTS-VA achieves regret $\tilde{O}(\sqrt{\mathrm{dc}\cdot\log|\mathcal{F}|})$, where $|\mathcal{F}|$ is the size of the model space. In the setting of contextual linear bandits, the regret bound of FGTS-VA matches that of UCB-based algorithms.
arXiv Detail & Related papers (2025-11-03T23:25:41Z) - Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization [3.6985338895569204]
We show that the Gaussian process upper confidence bound (GP-UCB) algorithm achieves $\tilde{O}(\sqrt{T})$ cumulative regret with high probability. Our analysis yields $O(\sqrt{T}\ln^2 T)$ regret under a squared exponential kernel.
arXiv Detail & Related papers (2025-06-02T07:38:58Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions learned by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors [12.43000662545423]
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve multi-armed bandit problems.
We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior instead of the standard posterior distribution.
arXiv Detail & Related papers (2023-09-12T16:15:33Z) - Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive regret upper bounds on the classes of Sobolev spaces $W_p^\beta(\mathcal{X})$, $p \geq 2$, $\beta > \frac{d}{p}$.
The upper bounds are supported by a minimax regret analysis, which reveals that in the cases $\beta > \frac{d}{2}$ or $p = \infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Nonparametric approximation of conditional expectation operators [0.3655021726150368]
We investigate the approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[f(Y) \mid X = x]$ under minimal assumptions.
We prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2020-12-23T19:06:12Z) - On Information Gain and Regret Bounds in Gaussian Process Bandits [8.499604625449907]
Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback observations.
For the Matérn family of kernels, lower bounds on $\gamma_T$, and regret under the frequentist setting, are known.
arXiv Detail & Related papers (2020-09-15T10:15:29Z) - Kernel-Based Reinforcement Learning: A Finite-Time Analysis [53.47210316424326]
We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards.
We empirically validate our approach in continuous MDPs with sparse rewards.
arXiv Detail & Related papers (2020-04-12T12:23:46Z) - Tight Regret Bounds for Bayesian Optimization in One Dimension [47.51554144092745]
We consider the problem of Bayesian optimization (BO) in one dimension, under a Gaussian process prior and Gaussian sampling noise. We provide a theoretical analysis showing that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $\Omega(\sqrt{T})$ and $O(\sqrt{T}\log T)$.
arXiv Detail & Related papers (2018-05-30T03:33:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information (including all summaries) and is not responsible for any consequences of its use.