Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
- URL: http://arxiv.org/abs/2502.06363v1
- Date: Mon, 10 Feb 2025 11:29:27 GMT
- Title: Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
- Authors: Shogo Iwazaki, Shion Takeno
- Abstract summary: We study the Gaussian process (GP) bandit problem, whose goal is to minimize regret under an unknown reward function.
We show a new upper bound on the maximum posterior variance, which improves the dependence on the noise variance parameter of the GP.
We show that MVR- and PE-based algorithms achieve noise variance-dependent regret upper bounds, which match our regret lower bound.
- Score: 6.379833644595456
- Abstract: We study the Gaussian process (GP) bandit problem, whose goal is to minimize regret under an unknown reward function lying in some reproducing kernel Hilbert space (RKHS). The maximum posterior variance analysis is vital in analyzing near-optimal GP bandit algorithms such as maximum variance reduction (MVR) and phased elimination (PE). Therefore, we first show a new upper bound on the maximum posterior variance, which improves the dependence on the noise variance parameter of the GP. By leveraging this result, we refine MVR and PE to obtain (i) a nearly optimal regret upper bound in the noiseless setting and (ii) regret upper bounds that are optimal with respect to the RKHS norm of the reward function. Furthermore, as another application of our proposed bound, we analyze the GP bandit under the time-varying noise variance setting, which is the kernelized extension of the linear bandit with heteroscedastic noise. For this problem, we show that MVR- and PE-based algorithms achieve noise variance-dependent regret upper bounds, which match our regret lower bound.
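The maximum variance reduction step that drives these analyses is simple to state concretely: at each round, query the point whose current GP posterior variance is largest. Below is a minimal sketch over a finite candidate set, assuming an RBF kernel; the function names and the regularization parameter `lam` are illustrative, not taken from the paper.

```python
import numpy as np

def rbf(A, B, ell=0.2):
    # Squared-exponential kernel matrix between the rows of A and B.
    return np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / ell**2)

def posterior_variance(X_obs, X_cand, lam=1e-2):
    # sigma_t^2(x) = k(x, x) - k_t(x)^T (K_t + lam I)^{-1} k_t(x)
    if len(X_obs) == 0:
        return np.ones(len(X_cand))  # RBF prior variance is 1
    K = rbf(X_obs, X_obs) + lam * np.eye(len(X_obs))
    k = rbf(X_obs, X_cand)
    return np.diag(rbf(X_cand, X_cand)) - np.einsum("ij,ij->j", k, np.linalg.solve(K, k))

# MVR loop: always query the currently most uncertain candidate.
rng = np.random.default_rng(0)
X_cand = rng.uniform(0, 1, size=(200, 1))
X_obs = np.empty((0, 1))
for t in range(10):
    X_obs = np.vstack([X_obs, X_cand[np.argmax(posterior_variance(X_obs, X_cand))]])
```

The paper's contribution concerns how tightly the maximum of this posterior variance can be bounded as a function of the noise variance parameter, which is what enables the nearly optimal treatment of the noiseless setting.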
Related papers
- On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise [2.250251490529229]
The convergence analysis of BO algorithms has focused on the cumulative regret under both the Bayesian and frequentist settings for the objective.
We establish new pointwise bounds on the prediction error of GP regression under the frequentist setting with Gaussian noise.
We prove improved convergence rates of the cumulative regret bounds for both GP-UCB and GP-TS.
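For context, the GP-UCB selection rule these rates concern is a one-liner on top of GP regression. A minimal sketch over a finite candidate set, with an illustrative confidence multiplier `beta` (the theoretically justified schedules for beta are exactly what such analyses pin down):

```python
import numpy as np

def rbf(A, B, ell=0.2):
    # Squared-exponential kernel matrix between the rows of A and B.
    return np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / ell**2)

def gp_ucb_pick(X_obs, y_obs, X_cand, beta=2.0, lam=1e-2):
    # GP posterior mean/variance, then argmax of mean + sqrt(beta) * std.
    K = rbf(X_obs, X_obs) + lam * np.eye(len(X_obs))
    k = rbf(X_obs, X_cand)
    mean = k.T @ np.linalg.solve(K, y_obs)
    var = np.diag(rbf(X_cand, X_cand)) - np.einsum("ij,ij->j", k, np.linalg.solve(K, k))
    return int(np.argmax(mean + np.sqrt(beta * np.maximum(var, 0.0))))

rng = np.random.default_rng(1)
X_obs = rng.uniform(0, 1, (5, 1))
y_obs = np.sin(6 * X_obs[:, 0])
X_cand = np.linspace(0, 1, 101)[:, None]
print(gp_ucb_pick(X_obs, y_obs, X_cand))
```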
arXiv Detail & Related papers (2024-12-25T05:57:27Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
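In the tabular case, an uncertainty Bellman equation propagates a local uncertainty signal through the transition dynamics like a reward, but with a squared discount. A toy fixed-point iteration, where `P` and `local_u` are made-up stand-ins for the paper's model-based estimates:

```python
import numpy as np

# Toy 3-state Markov chain under a fixed policy.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
gamma = 0.9
local_u = np.array([0.5, 0.1, 0.3])  # per-state local uncertainty (illustrative)

# Fixed-point iteration for U = local_u + gamma^2 * P @ U.
U = np.zeros(3)
for _ in range(500):
    U = local_u + gamma**2 * P @ U
print(U)  # propagated uncertainty over values
```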
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
arXiv Detail & Related papers (2023-07-14T13:56:11Z)
- Small noise analysis for Tikhonov and RKHS regularizations [0.8133739801185272]
We establish a small noise analysis framework to assess the effects of norms in Tikhonov and RKHS regularizations.
This framework studies the convergence rates of regularized estimators in the small noise limit and reveals the potential instability of the conventional L2-regularizer.
A surprising insight is that over-smoothing via fractional RKHSs consistently yields optimal convergence rates.
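The estimator under study is a one-line computation once the regularizing norm is fixed. A minimal sketch for a linear inverse problem, where the identity penalty `C` and the schedule `lam = noise**2` are illustrative choices (the paper's point is precisely that the choice of norm governs behavior as the noise vanishes):

```python
import numpy as np

def tikhonov(A, y, lam, C=None):
    # argmin_x ||A x - y||^2 + lam * x^T C x  =>  (A^T A + lam C)^{-1} A^T y
    C = np.eye(A.shape[1]) if C is None else C
    return np.linalg.solve(A.T @ A + lam * C, A.T @ y)

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 10))
x_true = rng.normal(size=10)
for noise in [1e-1, 1e-2, 1e-3]:  # the small-noise limit
    y = A @ x_true + noise * rng.normal(size=50)
    x_hat = tikhonov(A, y, lam=noise**2)
    print(noise, np.linalg.norm(x_hat - x_true))
```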
arXiv Detail & Related papers (2023-05-18T15:50:33Z)
- Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty.
We propose a variant of EI with a standard incumbent defined via the GP predictive mean.
We show that our algorithm converges, and achieves a cumulative regret bound of $\mathcal{O}(\gamma_T \sqrt{T})$.
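The distinguishing detail is the incumbent: the best GP predictive mean rather than the best noisy observation. A minimal sketch of the acquisition value, with all numeric inputs illustrative:

```python
import math
import numpy as np

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, incumbent):
    # EI for maximization: E[max(f(x) - incumbent, 0)] with f(x) ~ N(mean, std^2).
    ei = np.zeros_like(mean)
    for i, (m, s) in enumerate(zip(mean, std)):
        if s > 0:
            z = (m - incumbent) / s
            ei[i] = (m - incumbent) * norm_cdf(z) + s * norm_pdf(z)
    return ei

mean = np.array([0.10, 0.40, 0.30])  # posterior means at candidates (illustrative)
std = np.array([0.50, 0.20, 0.40])   # posterior stds at candidates
incumbent = 0.35                     # max posterior mean over observed inputs
print(np.argmax(expected_improvement(mean, std, incumbent)))
```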
arXiv Detail & Related papers (2022-03-15T13:17:53Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
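Injecting anisotropic noise is a one-line change to the SGLD update. A toy sketch on a quadratic loss, where `Sigma_sqrt` stands in for the tuned covariance square root (per the summary above, the optimum is the square root of the expected gradient covariance; the values here are illustrative):

```python
import numpy as np

def sgld_step(theta, grad, eta, Sigma_sqrt, rng):
    # theta <- theta - eta * grad + sqrt(2 * eta) * Sigma_sqrt @ xi, xi ~ N(0, I)
    xi = rng.normal(size=theta.shape)
    return theta - eta * grad + np.sqrt(2 * eta) * Sigma_sqrt @ xi

H = np.diag([1.0, 10.0])          # toy quadratic loss 0.5 * theta^T H theta
Sigma_sqrt = np.diag([0.1, 0.3])  # anisotropic noise (illustrative values)
rng = np.random.default_rng(3)
theta = np.array([1.0, 1.0])
for _ in range(1000):
    theta = sgld_step(theta, H @ theta, eta=1e-2, Sigma_sqrt=Sigma_sqrt, rng=rng)
```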
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- Weighted Gaussian Process Bandits for Non-stationary Environments [30.49357960656324]
We develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression.
A key challenge is how to cope with infinite-dimensional feature maps.
We provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains.
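One standard way to realize weighted GP regression is a weighted kernel ridge problem, whose solution only changes the regularizer. A minimal sketch with exponentially decaying weights (the discount and this specific construction are illustrative, not necessarily WGP-UCB's exact formulation):

```python
import numpy as np

def rbf(A, B, ell=0.2):
    # Squared-exponential kernel matrix between the rows of A and B.
    return np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / ell**2)

def weighted_gp_mean(X, y, X_star, weights, lam=1e-2):
    # Weighted kernel ridge: alpha = (K + lam * W^{-1})^{-1} y, mean = k_*^T alpha.
    alpha = np.linalg.solve(rbf(X, X) + lam * np.diag(1.0 / weights), y)
    return rbf(X, X_star).T @ alpha

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, (30, 1))      # observations ordered oldest to newest
y = np.sin(6 * X[:, 0])
w = 0.9 ** np.arange(len(X))[::-1]  # newest observation gets weight 1
print(weighted_gp_mean(X, y, np.array([[0.5]]), w)[0])
```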
arXiv Detail & Related papers (2021-07-06T03:37:33Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- No-Regret Algorithms for Time-Varying Bayesian Optimization [0.0]
We adopt the general variation budget model to capture the time-varying environment.
We introduce two GP-UCB-type algorithms, R-GP-UCB and SW-GP-UCB.
Our results not only recover previous linear bandit results when a linear kernel is used, but also complement the previous regret analysis of time-varying Gaussian process bandits.
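Of the two, the sliding-window variant is the easier to sketch: fit the GP only on the last `window` observations so that data from before a reward drift is forgotten. A minimal illustrative sketch (the window size and `beta` are placeholders, not the paper's tuned values):

```python
import numpy as np

def rbf(A, B, ell=0.2):
    # Squared-exponential kernel matrix between the rows of A and B.
    return np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / ell**2)

def sw_gp_ucb_pick(X_hist, y_hist, X_cand, window=50, beta=2.0, lam=1e-2):
    # Keep only the last `window` observations, then take a standard GP-UCB step.
    X, y = X_hist[-window:], y_hist[-window:]
    K = rbf(X, X) + lam * np.eye(len(X))
    k = rbf(X, X_cand)
    mean = k.T @ np.linalg.solve(K, y)
    var = np.diag(rbf(X_cand, X_cand)) - np.einsum("ij,ij->j", k, np.linalg.solve(K, k))
    return int(np.argmax(mean + np.sqrt(beta * np.maximum(var, 0.0))))

rng = np.random.default_rng(5)
X_hist = rng.uniform(0, 1, (80, 1))
y_hist = np.sin(6 * X_hist[:, 0])
X_cand = np.linspace(0, 1, 101)[:, None]
print(sw_gp_ucb_pick(X_hist, y_hist, X_cand))
```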
arXiv Detail & Related papers (2021-02-11T22:35:32Z)
- On Signal-to-Noise Ratio Issues in Variational Inference for Deep Gaussian Processes [55.62520135103578]
We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues.
We show that our fix can lead to consistent improvements in the predictive performance of DGP models.
arXiv Detail & Related papers (2020-11-01T14:38:02Z)
- An Optimal Multistage Stochastic Gradient Method for Minimax Problems [8.615625517708324]
We study the minimax optimization problem in the smooth and strongly convex-strongly concave setting.
We first analyze the Gradient Descent Ascent (GDA) method with constant stepsize.
We propose a multistage variant of GDA that runs in multiple stages with a particular learning rate decay schedule.
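GDA itself is a pair of simultaneous gradient steps, and the multistage scheme reruns it with a decayed stepsize each stage. A toy sketch on a strongly convex-strongly concave quadratic (the stage count and decay factor are illustrative):

```python
# Saddle objective f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 (strongly convex in x,
# strongly concave in y), with its unique saddle point at (0, 0).
def gda(x, y, eta, steps):
    for _ in range(steps):
        gx, gy = x + y, x - y              # grad_x f and grad_y f
        x, y = x - eta * gx, y + eta * gy  # descend in x, ascend in y
    return x, y

# Multistage variant: rerun GDA, halving the stepsize at every stage.
x, y, eta = 5.0, -3.0, 0.2
for stage in range(5):
    x, y = gda(x, y, eta, steps=200)
    eta *= 0.5
print(x, y)  # approaches (0, 0)
```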
arXiv Detail & Related papers (2020-02-13T18:01:18Z)