Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal
- URL: http://arxiv.org/abs/2410.10384v3
- Date: Fri, 22 Nov 2024 23:40:56 GMT
- Title: Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal
- Authors: Juliusz Ziomek, Masaki Adachi, Michael A. Osborne
- Abstract summary: We introduce Length scale Balancing (LB) - a novel approach aggregating multiple base surrogate models with varying length scales.
LB intermittently adds smaller length scale candidate values while retaining longer scales, balancing exploration and exploitation.
We show that LB is only $\log g(T)$ away from the oracle regret.
- Score: 18.93478528448966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian Optimization (BO) is widely used for optimising black-box functions but requires us to specify the length scale hyperparameter, which defines the smoothness of the functions the optimizer will consider. Most current BO algorithms choose this hyperparameter by maximizing the marginal likelihood of the observed data, albeit risking misspecification if the objective function is less smooth in regions we have not yet explored. The only prior solution addressing this problem with theoretical guarantees was A-GP-UCB, proposed by Berkenkamp et al. (2019). This algorithm progressively decreases the length scale, expanding the class of functions considered by the optimizer. However, A-GP-UCB lacks a stopping mechanism, leading to over-exploration and slow convergence. To overcome this, we introduce Length scale Balancing (LB) - a novel approach, aggregating multiple base surrogate models with varying length scales. LB intermittently adds smaller length scale candidate values while retaining longer scales, balancing exploration and exploitation. We formally derive a cumulative regret bound of LB and compare it with the regret of an oracle BO algorithm using the optimal length scale. Denoting the factor by which the regret bound of A-GP-UCB was away from oracle as $g(T)$, we show that LB is only $\log g(T)$ away from oracle regret. We also empirically evaluate our algorithm on synthetic and real-world benchmarks and show it outperforms A-GP-UCB, maximum likelihood estimation and MCMC.
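As a rough illustration of the mechanism described in the abstract, the sketch below maintains a growing set of length-scale candidates: smaller scales are added intermittently while longer ones are retained, each candidate defines its own base GP surrogate, and the next query maximizes a UCB score over all base surrogates. This is a minimal toy sketch in plain NumPy; the helper names, constants (beta, shrink, add_every), random candidate grid, and the simple "max UCB over all candidates" selection rule are illustrative assumptions, not the paper's actual LB aggregation procedure.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale):
    # Squared-exponential kernel; the length scale controls assumed smoothness.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, Xq, lengthscale, noise=1e-3):
    # GP posterior mean/std at query points Xq for one length-scale candidate.
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)  # unit prior variance for this kernel
    return mu, np.sqrt(var)

def lb_style_bo(f, bounds, T=30, beta=2.0, initial_ls=1.0, shrink=0.5, add_every=10):
    # Toy length-scale-balancing loop: keep a growing set of length-scale
    # candidates (longer scales retained, smaller ones added intermittently),
    # build one base GP surrogate per candidate, and query the point with the
    # largest UCB score across all base surrogates.
    rng = np.random.default_rng(0)
    dim = len(bounds)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(3, dim))  # small random initial design
    y = np.array([f(x) for x in X])
    lengthscales = [initial_ls]
    for t in range(1, T + 1):
        if t % add_every == 0:
            lengthscales.append(lengthscales[-1] * shrink)  # add a smaller scale, keep the old ones
        Xq = rng.uniform(bounds[:, 0], bounds[:, 1], size=(512, dim))  # random candidate grid
        best_ucb, best_x = -np.inf, None
        for ls in lengthscales:  # one base surrogate per length-scale candidate
            mu, sd = gp_posterior(X, y, Xq, ls)
            ucb = mu + np.sqrt(beta) * sd
            i = int(np.argmax(ucb))
            if ucb[i] > best_ucb:
                best_ucb, best_x = ucb[i], Xq[i]
        X = np.vstack([X, best_x])
        y = np.append(y, f(best_x))
    return X[np.argmax(y)], y.max()
```

For example, lb_style_bo(lambda x: -(x[0] - 1.0) ** 2, np.array([[0.0, 2.0]])) runs the loop on a one-dimensional toy objective and returns the best point observed.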
Related papers
- Nonlinear Dimensionality Reduction Techniques for Bayesian Optimization [0.9303501974597549]
We investigate nonlinear dimensionality reduction techniques that reduce the problem to a sequence of low-dimensional Latent-Space BO (LSBO) problems.
We propose some changes to their implementation, originally designed for tasks such as molecule generation, and reformulate the algorithm for broader optimisation purposes.
We then couple LSBO with Sequential Domain Reduction (SDR) directly in the latent space (SDR-LSBO), yielding an algorithm that narrows the latent search domains as evidence accumulates.
arXiv Detail & Related papers (2025-10-17T08:45:38Z) - Quantum Non-Linear Bandit Optimization [2.7718037613443127]
We study non-linear bandit optimization, where the learner maximizes a black-box function with access to a zeroth-order function oracle.
We propose the new Q-NLB-UCB algorithm, which uses a novel parametric function approximation technique.
We prove that the regret bound of Q-NLB-UCB is not only $O(\mathrm{polylog}\, T)$ but also input dimension-free.
arXiv Detail & Related papers (2025-03-04T21:53:33Z) - The Overlap Gap Property limits limit swapping in QAOA [0.0]
We show that the average-case value obtained by QAOA for Max-$q$-XORSAT for even $q \ge 4$ is bounded away from optimality.
Results suggest that even when sub-optimised, QAOA's performance on spin glasses matches that of classical algorithms.
arXiv Detail & Related papers (2024-04-09T07:45:06Z) - Nonconvex Stochastic Bregman Proximal Gradient Method for Nonconvex Composite Problems [9.202586157819693]
Stochastic gradient methods for nonconvex composite objective functions typically rely on the Lipschitz smoothness of the differentiable part.
We propose a better approximation model that handles non-Lipschitz gradients in nonconvex objectives.
We show it achieves optimal robustness in terms of step-size selection sensitivity.
arXiv Detail & Related papers (2023-06-26T08:54:46Z) - Model-based Causal Bayesian Optimization [78.120734120667]
We propose model-based causal Bayesian optimization (MCBO).
MCBO learns a full system model instead of only modeling intervention-reward pairs.
Unlike in standard Bayesian optimization, our acquisition function cannot be evaluated in closed form.
arXiv Detail & Related papers (2022-11-18T14:28:21Z) - Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z) - MBORE: Multi-objective Bayesian Optimisation by Density-Ratio Estimation [0.01652719262940403]
Optimisation problems often have multiple conflicting objectives that can be computationally and/or financially expensive to evaluate.
Mono-surrogate Bayesian optimisation (BO) is a popular model-based approach for optimising such black-box functions.
We extend previous work on BO by density-ratio estimation (BORE) to the multi-objective setting.
arXiv Detail & Related papers (2022-03-31T09:27:59Z) - Two-step Lookahead Bayesian Optimization with Inequality Constraints [21.703234193908038]
We propose a two-step lookahead constrained Bayesian optimization acquisition function (2-OPT-C) supporting both sequential and batch settings.
In numerical experiments, 2-OPT-C typically improves query efficiency by 2x or more over previous methods, and in some cases by 10x or more.
arXiv Detail & Related papers (2021-12-06T07:40:54Z) - Accounting for Gaussian Process Imprecision in Bayesian Optimization [0.0]
We study the effect of the Gaussian processes' prior specifications on classical BO's convergence.
We introduce PROBO as a generalization of BO that aims at rendering the method more robust towards prior mean parameter misspecification.
We test our approach against classical BO on a real-world problem from material science and observe PROBO to converge faster.
arXiv Detail & Related papers (2021-11-16T08:45:39Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for Gaussian process bandit optimisation.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces [63.22864716473051]
We propose a novel BO algorithm which expands (and shifts) the search space over iterations.
We show theoretically that for both our algorithms, the cumulative regret grows at sub-linear rates.
arXiv Detail & Related papers (2020-09-05T14:24:40Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, albeit only up to an error.
Rather than fixing the minibatch size or the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - UFO-BLO: Unbiased First-Order Bilevel Optimization [42.49533978193117]
We propose a new FOBLO-based unbiased estimate of outer-level gradients, enabling us to theoretically guarantee convergence.
Our findings are supported by experimental results on Omniglot and Mini-ImageNet, popular few-shot meta-learning benchmarks.
arXiv Detail & Related papers (2020-06-05T18:53:45Z) - Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification [119.41129787351092]
We introduce BBKB, the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches.
We show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation, achieving a near-constant per-step amortized cost.
arXiv Detail & Related papers (2020-02-23T17:43:29Z)