Optimal Initialization of Batch Bayesian Optimization
- URL: http://arxiv.org/abs/2404.17997v1
- Date: Sat, 27 Apr 2024 20:16:58 GMT
- Title: Optimal Initialization of Batch Bayesian Optimization
- Authors: Jiuge Ren, David Sweet,
- Abstract summary: We propose a batch-design acquisition function that designs a batch by optimization rather than random sampling.
MTV minimizes the variance of the post-evaluation estimates of quality, integrated over the entire space of settings.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Field experiments and computer simulations are effective but time-consuming methods of measuring the quality of engineered systems at different settings. To reduce the total time required, experimenters may employ Bayesian optimization, which is parsimonious with measurements, and take measurements of multiple settings simultaneously, in a batch. In practice, experimenters use very few batches, thus, it is imperative that each batch be as informative as possible. Typically, the initial batch in a Batch Bayesian Optimization (BBO) is constructed from a quasi-random sample of settings values. We propose a batch-design acquisition function, Minimal Terminal Variance (MTV), that designs a batch by optimization rather than random sampling. MTV adapts a design criterion function from Design of Experiments, called I-Optimality, which minimizes the variance of the post-evaluation estimates of quality, integrated over the entire space of settings. MTV weights the integral by the probability that a setting is optimal, making it able to design not only an initial batch but all subsequent batches, as well. Applicability to both initialization and subsequent batches is novel among acquisition functions. Numerical experiments on test functions and simulators show that MTV compares favorably to other BBO methods.
Related papers
- Batch Bayesian Optimization via Expected Subspace Improvement [0.0]
We propose a simple and efficient approach to extend Bayesian optimization to batch evaluation.
Our proposed approach can achieve near-linear speedup when compared with the sequential Bayesian optimization algorithm.
arXiv Detail & Related papers (2024-11-25T09:14:09Z) - Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Scaling LLM Inference with Optimized Sample Compute Allocation [56.524278187351925]
We propose OSCA, an algorithm to find an optimal mix of different inference configurations.
Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration.
OSCA is also shown to be effective in agentic beyond single-turn tasks, achieving a better accuracy on SWE-Bench with 3x less compute than the default configuration.
arXiv Detail & Related papers (2024-10-29T19:17:55Z) - Mini-Batch Optimization of Contrastive Loss [13.730030395850358]
We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $binomNB$ mini-batches are selected.
We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches.
arXiv Detail & Related papers (2023-07-12T04:23:26Z) - Combining Multi-Fidelity Modelling and Asynchronous Batch Bayesian
Optimization [10.29946890434873]
This paper proposes an algorithm combining multi-fidelity and asynchronous batch methods.
We empirically study the algorithm behavior, and show it can outperform single-fidelity batch methods and multi-fidelity sequential methods.
As an application, we consider designing electrode materials for optimal performance in pouch cells using experiments with coin cells to approximate battery performance.
arXiv Detail & Related papers (2022-11-11T12:02:40Z) - Batch Bayesian Optimization via Particle Gradient Flows [0.5735035463793008]
We show how to find global optima of objective functions which are only available as a black-box or are expensive to evaluate.
We construct a new function based on multipoint expected probability which is over the space of probability measures.
arXiv Detail & Related papers (2022-09-10T18:10:15Z) - Surrogate modeling for Bayesian optimization beyond a single Gaussian
process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret.
arXiv Detail & Related papers (2022-05-27T16:43:10Z) - BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent
for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimize continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to achieve comparable results to gradient-based optimization.
arXiv Detail & Related papers (2022-05-23T11:10:19Z) - Fast Variational AutoEncoder with Inverted Multi-Index for Collaborative
Filtering [59.349057602266]
Variational AutoEncoder (VAE) has been extended as a representative nonlinear method for collaborative filtering.
We propose to decompose the inner-product-based softmax probability based on the inverted multi-index.
FastVAE can outperform the state-of-the-art baselines in terms of both sampling quality and efficiency.
arXiv Detail & Related papers (2021-09-13T08:31:59Z) - Trusted-Maximizers Entropy Search for Efficient Bayesian Optimization [39.824086260578646]
This paper presents a novel trusted-maximizers entropy search (TES) acquisition function.
It measures how much an input contributes to the information gain on a query over a finite set of trusted maximizers.
arXiv Detail & Related papers (2021-07-30T07:25:07Z) - Adaptive Learning of the Optimal Batch Size of SGD [52.50880550357175]
We propose a method capable of learning the optimal batch size adaptively throughout its iterations for strongly convex and smooth functions.
Our method does this provably, and in our experiments with synthetic and real data robustly exhibits nearly optimal behaviour.
We generalize our method to several new batch strategies not considered in the literature before, including a sampling suitable for distributed implementations.
arXiv Detail & Related papers (2020-05-03T14:28:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.