On Information Gain and Regret Bounds in Gaussian Process Bandits
- URL: http://arxiv.org/abs/2009.06966v3
- Date: Tue, 9 Mar 2021 22:46:52 GMT
- Title: On Information Gain and Regret Bounds in Gaussian Process Bandits
- Authors: Sattar Vakili, Kia Khezeli, Victor Picheny
- Abstract summary: Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback observations.
For the Mat\'ern family of kernels, lower bounds on $\gamma_T$, and regret under the frequentist setting, are known.
- Score: 8.499604625449907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consider the sequential optimization of an expensive to evaluate and possibly
non-convex objective function $f$ from noisy feedback, that can be considered
as a continuum-armed bandit problem. Upper bounds on the regret performance of
several learning algorithms (GP-UCB, GP-TS, and their variants) are known under
both a Bayesian (when $f$ is a sample from a Gaussian process (GP)) and a
frequentist (when $f$ lives in a reproducing kernel Hilbert space) setting. The
regret bounds often rely on the maximal information gain $\gamma_T$ between $T$
observations and the underlying GP (surrogate) model. We provide general bounds
on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel,
whose specialisation for commonly used kernels improves the existing bounds on
$\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$, under
numerous settings. For the Mat\'ern family of kernels, where lower bounds on
$\gamma_T$ and on regret under the frequentist setting are known, our results
close a significant polynomial-in-$T$ gap between the upper and lower bounds
(up to factors logarithmic in $T$).
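For a fixed set of observation points, the information gain in the abstract has the standard closed form $\frac{1}{2}\log\det(I + \sigma^{-2}K_T)$, where $K_T$ is the kernel matrix of the $T$ observations and $\sigma^2$ the noise variance; $\gamma_T$ is its maximum over point sets. A minimal sketch of this quantity for an RBF kernel (function names and parameter values are illustrative, not from the paper):

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential (RBF) kernel matrix between point sets X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def information_gain(X, noise_var=0.1, lengthscale=1.0):
    # I(y_T; f) = 1/2 * log det(I + sigma^-2 K_T) for observations at X.
    K = rbf_kernel(X, X, lengthscale)
    T = len(X)
    _, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)
    return 0.5 * logdet

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
print(information_gain(X))
```

Because the eigenvalues of a smooth kernel decay quickly, the log-determinant grows slowly with $T$, which is exactly the behaviour the paper's eigendecay-based bounds on $\gamma_T$ quantify.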
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature
and Bayesian Optimization [51.533164528799084]
We show that to estimate the normalizing constant within a small relative error, the level of difficulty depends on the value of $\lambda$.
We find that this pattern holds true even when the function evaluations are noisy.
arXiv Detail & Related papers (2024-01-11T07:45:09Z) - On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
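The kernel ridge estimator referred to above has the standard closed form $\hat f(x) = k(x, X)^\top (K + \lambda I)^{-1} y$. The sketch below shows this generic estimator with a fixed regularization level $\lambda$; the paper's actual contribution, scaling the regularization with kernel smoothness, is not reproduced here, and the kernel and parameter choices are illustrative:

```python
import numpy as np

def rbf(X, Y, ls=0.3):
    # Squared-exponential kernel between 1-D point sets X and Y.
    return np.exp(-0.5 * (X[:, None] - Y[None, :]) ** 2 / ls**2)

def kernel_ridge_predict(X_train, y_train, X_test, lam=1e-4, ls=0.3):
    # Kernel ridge estimator: f_hat(x) = k(x, X)^T (K + lam I)^{-1} y.
    K = rbf(X_train, X_train, ls)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)
    return rbf(X_test, X_train, ls) @ alpha

X = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * X)
pred = kernel_ridge_predict(X, y, X)
```

Larger $\lambda$ shrinks the estimate more aggressively toward zero, trading bias for stability, which is why tying $\lambda$ to the kernel's smoothness matters for the regret analysis.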
arXiv Detail & Related papers (2023-07-14T13:56:11Z) - Delayed Feedback in Kernel Bandits [16.11544965325835]
Black box optimisation of an unknown function is a ubiquitous problem in machine learning, academic research and industrial production.
We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with a regret bound of $\tilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[\tau]\Gamma_k(T))$.
This represents a significant improvement over the state-of-the-art regret bound of $\tilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T})$.
arXiv Detail & Related papers (2023-02-01T12:03:19Z) - A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian
Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
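A common way to realize an $O(\log\log T)$ batch count in batched bandit analyses is the doubling-exponent grid $t_i = \lfloor T^{1-2^{-i}}\rfloor$: each batch endpoint squares the fraction of the horizon covered. The sketch below is illustrative of that generic construction, not necessarily the schedule used in the paper:

```python
import math

def batch_endpoints(T):
    # Doubling-exponent batch schedule t_i = floor(T^(1 - 2^-i)),
    # giving B = O(log log T) batches over horizon T.
    B = math.ceil(math.log2(math.log2(T))) + 1
    ends = [math.floor(T ** (1 - 2.0 ** (-i))) for i in range(1, B)]
    ends.append(T)  # final batch runs to the horizon
    return ends
```

For example, `batch_endpoints(10000)` yields five endpoints, so the learner updates its policy only a handful of times even over a long horizon.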
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Optimal Order Simple Regret for Gaussian Process Bandits [6.84224661918159]
We show that the simple regret bound for a pure exploration algorithm is significantly tighter than the existing bounds.
We show that this regret bound is optimal up to logarithmic factors in cases where a matching lower bound is known.
arXiv Detail & Related papers (2021-08-20T16:49:32Z) - Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS [19.252319300590653]
A black-box function $f:\mathcal{X} \mapsto \mathbb{R}$ is optimized under the assumption that $f$ is Hölder smooth and has bounded norm in the RKHS associated with a given kernel $K$.
In this paper, we propose a new algorithm (LP-GP-UCB) where the usual GP surrogate model is augmented with Local Polynomial (LP) estimators of the Hölder smooth function $f$.
arXiv Detail & Related papers (2020-05-11T01:55:39Z) - Regret and Belief Complexity Trade-off in Gaussian Process Bandits via
Information Thresholding [42.669970064867556]
We show how to characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions.
We observe state of the art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
arXiv Detail & Related papers (2020-03-23T21:05:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.