Related papers: Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization

Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization

URL: http://arxiv.org/abs/2511.13625v2
Date: Tue, 18 Nov 2025 15:38:51 GMT
Title: Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization
Authors: Kaichi Irie, Shuhei Watanabe, Masaki Onishi,
Abstract summary: We show how to efficiently find high-performing parameters by maximizing an acquisition function.<n>To address this problem, we propose to decouple the theoretically identical convergence of a QNN acquisition function.
Score: 10.36384630923661
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bayesian optimization (BO) efficiently finds high-performing parameters by maximizing an acquisition function, which models the promise of parameters. A major computational bottleneck arises in acquisition function optimization, where multi-start optimization (MSO) with quasi-Newton (QN) methods is required due to the non-convexity of the acquisition function. BoTorch, a widely used BO library, currently optimizes the summed acquisition function over multiple points, leading to the speedup of MSO owing to PyTorch batching. Nevertheless, this paper empirically demonstrates the suboptimality of this approach in terms of off-diagonal approximation errors in the inverse Hessian of a QN method, slowing down its convergence. To address this problem, we propose to decouple QN updates using a coroutine while batching the acquisition function calls. Our approach not only yields the theoretically identical convergence to the sequential MSO but also drastically reduces the wall-clock time compared to the previous approaches. Our approach is available in GPSampler in Optuna, effectively reducing its computational overhead.

Related papers

Iterative Interpolation Schedules for Quantum Approximate Optimization Algorithm [1.845978975395919]
We present an iterative method that exploits the smoothness of optimal parameter schedules by expressing them in a basis of functions.<n>We demonstrate our method achieves better performance with fewer optimization steps than current approaches.<n>For the largest LABS instance, we achieve near-optimal merit factors with schedules exceeding 1000 layers, an order of magnitude beyond previous methods.
arXiv Detail & Related papers (2025-04-02T12:53:21Z)
Global Optimization of Gaussian Process Acquisition Functions Using a Piecewise-Linear Kernel Approximation [5.23716716929969]
This paper investigates mixed-integer Kernel-MIQP as a paradigm for global acquisition function optimization.<n>We analyze the theoretical bounds of the proposed approximation and empirically demonstrate the framework.
arXiv Detail & Related papers (2024-10-22T10:56:52Z)
Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO) Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by abundant experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z)
Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory. We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures. We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
Batch Bayesian Optimization via Particle Gradient Flows [0.5735035463793008]
We show how to find global optima of objective functions which are only available as a black-box or are expensive to evaluate. We construct a new function based on multipoint expected probability which is over the space of probability measures.
arXiv Detail & Related papers (2022-09-10T18:10:15Z)
Iterative-Free Quantum Approximate Optimization Algorithm Using Neural Networks [20.051757447006043]
We propose a practical method that uses a simple, fully connected neural network to find better parameters tailored to a new given problem instance. Our method is consistently the fastest to converge while also the best final result.
arXiv Detail & Related papers (2022-08-21T14:05:11Z)
Two-step Lookahead Bayesian Optimization with Inequality Constraints [21.703234193908038]
We propose a two-step lookahead constrained Bayesian optimization acquisition function (2-OPT-C) supporting both sequential and batch settings. In numerical experiments, 2-OPT-C typically improves query efficiency by 2x or more over previous methods, and in some cases by 10x or more.
arXiv Detail & Related papers (2021-12-06T07:40:54Z)
STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous gradient descent (DASGD) We show that DASGD converges to a global optimal implementation model under same delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z)
An Efficient Batch Constrained Bayesian Optimization Approach for Analog Circuit Synthesis via Multi-objective Acquisition Ensemble [11.64233949999656]
We propose an efficient parallelizable Bayesian optimization algorithm via Multi-objective ACquisition function Ensemble (MACE) Our proposed algorithm can reduce the overall simulation time by up to 74 times compared to differential evolution (DE) for the unconstrained optimization problem when the batch size is 15. For the constrained optimization problem, our proposed algorithm can speed up the optimization process by up to 15 times compared to the weighted expected improvement based Bayesian optimization (WEIBO) approach, when the batch size is 15.
arXiv Detail & Related papers (2021-06-28T13:21:28Z)
Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variisy hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate Scale Quantum devices. We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Efficient Circuit Training" (PECT) Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope. Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with trained Gaussian processes trained on few data points. The approach also leads to significantly smaller and computationally cheaper sub solver for lower bounding. In total, we reduce time convergence by orders of orders of the proposed method.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In big search spaces the algorithm goes through several low function value regions before reaching the optimum of the function. One approach to subside this cold start phase is to use prior knowledge that can accelerate the optimisation. In this paper, we represent the prior knowledge about the function optimum through a prior distribution. The prior distribution is then used to warp the search space in such a way that space gets expanded around the high probability region of function optimum and shrinks around low probability region of optimum.
arXiv Detail & Related papers (2020-03-27T06:18:49Z)
Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level. We show minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.