Beyond Lengthscales: No-regret Bayesian Optimisation With Unknown
Hyperparameters Of Any Type
- URL: http://arxiv.org/abs/2402.01632v2
- Date: Tue, 13 Feb 2024 17:27:45 GMT
- Title: Beyond Lengthscales: No-regret Bayesian Optimisation With Unknown
Hyperparameters Of Any Type
- Authors: Juliusz Ziomek, Masaki Adachi, Michael A. Osborne
- Abstract summary: HE-GP-UCB is the first algorithm enjoying the no-regret property in the case of unknown hyperparameters of arbitrary form.
Our proof idea is novel and can easily be extended to other variants of Bayesian optimisation.
- Score: 21.28080111271296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian optimisation requires fitting a Gaussian process model, which in
turn requires specifying hyperparameters - most of the theoretical literature
assumes those hyperparameters are known. The commonly used maximum likelihood
estimator for hyperparameters of the Gaussian process is consistent only if the
data fills the space uniformly, which does not have to be the case in Bayesian
optimisation. Since no guarantees exist regarding the correctness of
hyperparameter estimation, and those hyperparameters can significantly affect
the Gaussian process fit, theoretical analysis of Bayesian optimisation with
unknown hyperparameters is very challenging. Previously proposed algorithms
with the no-regret property were only able to handle the special case of
unknown lengthscales or an unknown reproducing kernel Hilbert space norm, and applied only to
the frequentist case. We propose a novel algorithm, HE-GP-UCB, which is the
first algorithm enjoying the no-regret property in the case of unknown
hyperparameters of arbitrary form, and which supports both Bayesian and
frequentist settings. Our proof idea is novel and can easily be extended to
other variants of Bayesian optimisation. We show this by extending our
algorithm to the adversarially robust optimisation setting under unknown
hyperparameters. Finally, we empirically evaluate our algorithm on a set of toy
problems and show that it can outperform the maximum likelihood estimator.
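As a rough illustration of the setting (not the authors' HE-GP-UCB pseudocode), the sketch below runs a GP-UCB-style loop on a 1-D toy problem while maintaining a finite set of candidate lengthscales and discarding candidates whose marginal log-likelihood falls far behind the best one; the objective, kernel, candidate set, and elimination threshold are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale, variance=1.0):
    # Squared-exponential kernel on 1-D inputs.
    d = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d / lengthscale**2)

def gp_posterior(X, y, Xs, lengthscale, noise=1e-3):
    # GP posterior mean/variance on the grid Xs, plus the marginal
    # log-likelihood of the observed data under this lengthscale hypothesis.
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = rbf_kernel(Xs, Xs, lengthscale).diagonal() - np.sum(v**2, axis=0)
    mll = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)
    return mu, np.maximum(var, 1e-12), mll

f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x   # toy objective (maximised)
grid = np.linspace(-2.0, 2.0, 200)              # discretised search space
candidates = [0.1, 0.5, 2.0]                    # hypotheses for the unknown lengthscale
X, y = np.array([0.0]), np.array([f(0.0)])      # initial design

for t in range(20):
    stats = [gp_posterior(X, y, grid, ls) for ls in candidates]
    best_mll = max(s[2] for s in stats)
    # Discard hypotheses whose evidence lags far behind the current best one
    # (the threshold 10.0 is an arbitrary illustrative choice).
    keep = [i for i, s in enumerate(stats) if s[2] > best_mll - 10.0]
    candidates = [candidates[i] for i in keep]
    stats = [stats[i] for i in keep]
    # Optimistic query: maximise the UCB taken over all surviving hypotheses.
    ucb = np.max([m + np.sqrt(2.0 * v) for m, v, _ in stats], axis=0)
    x_next = grid[np.argmax(ucb)]
    X, y = np.append(X, x_next), np.append(y, f(x_next))

print("best query:", X[np.argmax(y)], "surviving lengthscales:", candidates)
```

The point of the sketch is that the optimistic query is taken over all surviving hyperparameter hypotheses, so the loop never commits to a single, possibly inconsistent, maximum likelihood estimate.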
Related papers
- Near-Optimal Algorithm for Non-Stationary Kernelized Bandits [6.379833644595456]
We study a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization.
We show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Matérn kernels.
We propose a novel near-optimal algorithm called restarting phased elimination with random permutation.
arXiv Detail & Related papers (2024-10-21T14:28:26Z) - Principled Preferential Bayesian Optimization [22.269732173306192]
We study the problem of preferential Bayesian optimization (BO).
We aim to optimize a black-box function with only preference feedback over a pair of candidate solutions.
An optimistic algorithm with an efficient computational method is then developed to solve the problem.
arXiv Detail & Related papers (2024-02-08T02:57:47Z) - Branch & Learn with Post-hoc Correction for Predict+Optimize with
Unknown Parameters in Constraints [5.762370982168012]
Post-hoc Regret is a loss function that takes into account the cost of correcting an unsatisfiable prediction.
We show how to compute Post-hoc Regret exactly for any optimization problem solvable by a recursion algorithm satisfying simple conditions.
arXiv Detail & Related papers (2023-03-12T16:23:58Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Regret Bounds for Expected Improvement Algorithms in Gaussian Process
Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty.
We propose a variant of EI with a standard incumbent defined via the GP predictive mean.
We show that our algorithm converges and achieves a cumulative regret bound of $\mathcal{O}(\gamma_T \sqrt{T})$ (a minimal sketch of this acquisition rule appears after this list).
arXiv Detail & Related papers (2022-03-15T13:17:53Z) - Metalearning Linear Bandits by Prior Update [7.519872646378836]
Fully Bayesian approaches assume that problem parameters are generated from a known prior, while in practice, such information is often lacking.
This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance.
In this work we prove, in the context of linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of an algorithm that uses the true prior.
arXiv Detail & Related papers (2021-07-12T11:17:01Z) - Bayesian decision-making under misspecified priors with applications to
meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z) - Towards Feature-Based Performance Regression Using Trajectory Data [0.9281671380673306]
Black-box optimization is a very active area of research, with many new algorithms being developed every year.
The variety of algorithms poses a meta-problem: which algorithm to choose for a given problem at hand?
Past research has shown that per-instance algorithm selection based on exploratory landscape analysis can be an efficient means of tackling this meta-problem.
arXiv Detail & Related papers (2021-02-10T10:19:13Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In large search spaces, the algorithm passes through several low-function-value regions before reaching the optimum of the function.
One approach to shortening this cold-start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space so that it expands around the high-probability region of the function optimum and shrinks around the low-probability regions (a minimal sketch of such a warp appears after this list).
arXiv Detail & Related papers (2020-03-27T06:18:49Z) - Time-varying Gaussian Process Bandit Optimization with Non-constant
Evaluation Time [93.6788993843846]
We propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time.
Our bound elucidates that a pattern of the evaluation time sequence can hugely affect the difficulty of the problem.
arXiv Detail & Related papers (2020-03-10T13:28:33Z)
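For the expected-improvement entry above, a minimal sketch of the acquisition rule, assuming a maximisation problem and dummy posterior values; the only detail taken from the abstract is that the incumbent is the best GP predictive mean rather than the best noisy observation.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, incumbent):
    # EI for maximisation: E[max(f(x) - incumbent, 0)] under a Gaussian posterior.
    z = (mu - incumbent) / np.maximum(sigma, 1e-12)
    return (mu - incumbent) * norm.cdf(z) + sigma * norm.pdf(z)

# Dummy posterior over three candidate points (illustrative numbers only).
mu = np.array([0.20, 0.50, 0.10])
sigma = np.array([0.30, 0.10, 0.40])
# Incumbent taken as the best GP posterior *mean* among already-queried points,
# rather than the best noisy observation.
incumbent = 0.40
print(expected_improvement(mu, sigma, incumbent))
```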
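For the space-warping entry above, a rough sketch of one way a prior over the optimum's location could warp a 1-D search space; the inverse-CDF construction, the Gaussian prior, and all numbers are illustrative assumptions, not the paper's method.

```python
import numpy as np
from scipy.stats import norm

prior = norm(loc=0.3, scale=0.1)        # assumed prior on the optimum's location
u = np.linspace(0.01, 0.99, 50)         # uniform grid in the warped coordinates
x = prior.ppf(u)                        # inverse-CDF map back to the original space

# The grid is dense near 0.3 (high prior probability) and sparse in the tails,
# which mirrors the "expand around high-probability, shrink around
# low-probability" effect described in the abstract.
print(np.round(np.diff(x)[:5], 4), np.round(np.diff(x)[22:27], 4))
```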