On the Sublinear Regret of GP-UCB
- URL: http://arxiv.org/abs/2307.07539v2
- Date: Mon, 14 Aug 2023 17:22:21 GMT
- Title: On the Sublinear Regret of GP-UCB
- Authors: Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas
- Abstract summary: We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
- Score: 58.25014663727544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the kernelized bandit problem, a learner aims to sequentially compute the
optimum of a function lying in a reproducing kernel Hilbert space given only
noisy evaluations at sequentially chosen points. In particular, the learner
aims to minimize regret, which is a measure of the suboptimality of the choices
made. Arguably the most popular algorithm is the Gaussian Process Upper
Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple
linear estimator of the unknown function. Despite its popularity, existing
analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear
for many commonly used kernels such as the Matérn kernel. This has led to a
longstanding open question: are existing regret analyses for GP-UCB tight, or
can bounds be improved by using more sophisticated analytical techniques? In
this work, we resolve this open question and show that GP-UCB enjoys nearly
optimal regret. In particular, our results yield sublinear regret rates for the
Matérn kernel, improving over the state-of-the-art analyses and partially
resolving a COLT open problem posed by Vakili et al. Our improvements rely on a
key technical contribution -- regularizing kernel ridge estimators in
proportion to the smoothness of the underlying kernel $k$. Applying this key
idea together with a largely overlooked concentration result in separable
Hilbert spaces (for which we provide an independent, simplified derivation), we
are able to provide a tighter analysis of the GP-UCB algorithm.
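To make the algorithm described above concrete, here is a minimal, illustrative Python sketch of a generic GP-UCB loop: at each round the posterior (kernel ridge) mean and width are computed from the observed data, and the next query maximizes mean plus a scaled width. The Matérn-5/2 kernel, the 1-D grid domain, the toy objective, and the values of the confidence parameter `beta` and the regularization `lam` are assumptions for illustration only; in particular, the paper's contribution is a specific smoothness-dependent choice of the regularization level, which is only hinted at in a comment here, not reproduced.

```python
# Illustrative GP-UCB sketch (not the paper's exact construction).
import numpy as np

def matern52(x, y, lengthscale=0.2):
    """Matérn-5/2 kernel between two 1-D point sets (unit signal variance)."""
    r = np.abs(x[:, None] - y[None, :]) / lengthscale
    return (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)

def gp_ucb(f, domain, T=50, beta=2.0, lam=1.0, noise_std=0.1, seed=0):
    """Run T rounds of GP-UCB on a finite 1-D grid; return queries and rewards.

    The kernel ridge (GP posterior) estimator used for the UCB rule is
        mu_t(x)      = k_t(x)^T (K_t + lam I)^{-1} y_t
        sigma_t^2(x) = k(x, x) - k_t(x)^T (K_t + lam I)^{-1} k_t(x).
    beta is a fixed confidence parameter for illustration; the theory uses a
    time-dependent beta_t, and the paper ties lam to the kernel's smoothness.
    """
    rng = np.random.default_rng(seed)
    X, y = [], []
    for t in range(T):
        if not X:
            x_next = domain[rng.integers(len(domain))]  # first query: random
        else:
            Xa = np.array(X)
            K = matern52(Xa, Xa) + lam * np.eye(len(X))
            k_star = matern52(domain, Xa)                # (|domain|, t)
            alpha = np.linalg.solve(K, np.array(y))
            mu = k_star @ alpha
            v = np.linalg.solve(K, k_star.T)             # (t, |domain|)
            sigma2 = np.clip(1.0 - np.sum(k_star * v.T, axis=1), 0.0, None)
            x_next = domain[np.argmax(mu + beta * np.sqrt(sigma2))]
        X.append(x_next)
        y.append(f(x_next) + noise_std * rng.normal())
    return np.array(X), np.array(y)

if __name__ == "__main__":
    domain = np.linspace(0.0, 1.0, 200)
    f = lambda x: np.sin(6.0 * x)              # hypothetical test objective
    X, y = gp_ucb(f, domain, T=30, lam=0.5)    # lam=0.5 is a placeholder value
    print("best observed value:", y.max())
```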
Related papers
- Regret Analysis for Randomized Gaussian Process Upper Confidence Bound [9.967062483758632]
This paper analyzes a randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB).
In both regret analyses, IRGP-UCB achieves a sublinear regret upper bound without increasing the confidence parameter when the input domain is finite.
arXiv Detail & Related papers (2024-09-02T06:49:29Z)
- Regret Optimality of GP-UCB [12.323109084902228]
Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations.
We establish new upper bounds on both the simple and cumulative regret of GP-UCB when the objective function to optimize admits certain smoothness property.
With the same level of exploration, GP-UCB can simultaneously achieve optimality in both simple and cumulative regret.
arXiv Detail & Related papers (2023-12-03T13:20:08Z)
- Quantum Bayesian Optimization [64.58749619145908]
We introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm.
It is the first BO algorithm able to achieve a regret upper bound of $O(\mathrm{polylog}\, T)$, significantly smaller than the $\Omega(\sqrt{T})$ regret lower bound in the classical setting.
Thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm.
arXiv Detail & Related papers (2023-10-09T03:10:42Z)
- Randomized Gaussian Process Upper Confidence Bound with Tighter Bayesian Regret Bounds [9.89553853547974]
This study first generalizes the regret analysis of RGP-UCB to a wider class of distributions, including the Gamma distribution.
We propose improved RGP-UCB (IRGP-UCB) based on a two-parameter exponential distribution, which achieves tighter Bayesian regret bounds (see the sketch after this list).
We demonstrate the effectiveness of IRGP-UCB through extensive experiments.
arXiv Detail & Related papers (2023-02-03T02:48:48Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train a neural network (NN) and then select the query that maximizes the trained NN's output.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z)
- Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z)
- Weighted Gaussian Process Bandits for Non-stationary Environments [30.49357960656324]
We develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression.
A key challenge is how to cope with infinite-dimensional feature maps.
We provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains.
arXiv Detail & Related papers (2021-07-06T03:37:33Z)
- No-Regret Algorithms for Time-Varying Bayesian Optimization [0.0]
We adopt the general variation budget model to capture the time-varying environment.
We introduce two GP-UCB-type algorithms, R-GP-UCB and SW-GP-UCB, based on restarting and sliding windows, respectively.
Our results not only recover previous linear bandit results when a linear kernel is used, but also complement previous regret analyses of time-varying Gaussian process bandits.
arXiv Detail & Related papers (2021-02-11T22:35:32Z)
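Two entries above (RGP-UCB and IRGP-UCB) replace GP-UCB's deterministic confidence parameter with a randomly sampled one. As flagged in the IRGP-UCB entry, the following is a minimal hypothetical sketch of that idea, assuming the width is drawn from a shifted (two-parameter) exponential distribution; the exact parameterization, and the conditions under which this yields the stated regret bounds, belong to those papers and are not reproduced here.

```python
# Hypothetical sketch of a randomized UCB confidence width: instead of a
# deterministic, growing beta_t, draw it each round from a shifted
# (two-parameter) exponential distribution.
import numpy as np

def sample_confidence_width(rng, shift=1.0, scale=1.0):
    """Draw beta_t ~ shift + Exponential(scale); parameters are placeholders."""
    return shift + rng.exponential(scale)

def randomized_ucb_scores(mu, sigma, rng):
    """UCB acquisition values with a freshly sampled width for this round."""
    beta_t = sample_confidence_width(rng)
    return mu + np.sqrt(beta_t) * sigma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([0.1, 0.4, 0.3])      # posterior means on a toy 3-point domain
    sigma = np.array([0.5, 0.1, 0.3])   # posterior widths on the same domain
    print("next arm:", int(np.argmax(randomized_ucb_scores(mu, sigma, rng))))
```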
This list is automatically generated from the titles and abstracts of the papers on this site.