Pre-trained Gaussian Processes for Bayesian Optimization
- URL: http://arxiv.org/abs/2109.08215v6
- Date: Fri, 2 Aug 2024 20:13:29 GMT
- Title: Pre-trained Gaussian Processes for Bayesian Optimization
- Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani
- Abstract summary: We propose a new pre-training based BO framework named HyperBO.
We show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known.
- Score: 24.730678780782647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian optimization (BO) has become a popular strategy for global optimization of expensive real-world functions. Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process (GP) priors that specify initial beliefs on functions. However, even with expert knowledge, it is non-trivial to quantitatively define a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. We detail what pre-training entails for GPs using a KL divergence based loss function, and propose a new pre-training based BO framework named HyperBO. Theoretically, we show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known. To verify our approach in realistic setups, we collect a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art deep learning models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks.
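To make the pre-training idea concrete, below is a minimal illustrative sketch: shared GP mean and kernel hyperparameters are fit to data from several related tasks, and the resulting prior is then used in a standard expected-improvement BO loop on a new task. All function and variable names are hypothetical, the objective shown is a summed negative log marginal likelihood standing in for the paper's KL divergence based pre-training loss, and the authors' actual HyperBO implementation differs in detail.

```python
# Illustrative sketch only: pre-train a GP prior on data from related tasks,
# then use it for Bayesian optimization on a new task. Names are hypothetical;
# this is not the HyperBO code or its exact KL-based objective.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def rbf_kernel(X1, X2, log_ls, log_sf):
    """Squared-exponential kernel; X1 is (n, d), X2 is (m, d)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(2 * log_sf) * np.exp(-0.5 * d2 / np.exp(2 * log_ls))

def neg_log_marginal_likelihood(params, X, y):
    """Standard GP negative log marginal likelihood for one task."""
    mean_const, log_ls, log_sf, log_noise = params
    n = len(y)
    K = rbf_kernel(X, X, log_ls, log_sf) + np.exp(2 * log_noise) * np.eye(n)
    L = np.linalg.cholesky(K)
    r = y - mean_const
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    return 0.5 * r @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)

def pretrain_gp(tasks):
    """Fit shared mean/kernel parameters on data from similar functions.
    `tasks` is a list of (X_i, y_i) pairs; the loss sums the per-task NLL."""
    def total_loss(params):
        return sum(neg_log_marginal_likelihood(params, X, y) for X, y in tasks)
    init = np.array([0.0, 0.0, 0.0, -2.0])  # mean, log lengthscale, log signal, log noise
    return minimize(total_loss, init, method="L-BFGS-B").x

def gp_posterior(params, X_obs, y_obs, X_cand):
    """Posterior mean/std at candidate points under the pre-trained prior."""
    mean_const, log_ls, log_sf, log_noise = params
    K = rbf_kernel(X_obs, X_obs, log_ls, log_sf) + np.exp(2 * log_noise) * np.eye(len(y_obs))
    Ks = rbf_kernel(X_obs, X_cand, log_ls, log_sf)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - mean_const))
    mu = mean_const + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.exp(2 * log_sf) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def propose_next(params, X_obs, y_obs, X_cand):
    """Expected-improvement acquisition (for minimization) over a candidate grid."""
    mu, sigma = gp_posterior(params, X_obs, y_obs, X_cand)
    best = y_obs.min()
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return X_cand[np.argmax(ei)]
```

Usage would amount to `params = pretrain_gp([(X1, y1), (X2, y2), ...])` on the related tuning tasks, followed by repeated calls to `propose_next` on the new task's observations.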
Related papers
- Approximation-Aware Bayesian Optimization [34.56666383247348]
High-dimensional Bayesian optimization (BO) tasks often require 10,000 function evaluations before obtaining meaningful results.
We modify sparse variational Gaussian processes (SVGPs) to better align with the goals of BO.
Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem.
arXiv Detail & Related papers (2024-06-06T17:55:02Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest.
We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
arXiv Detail & Related papers (2023-07-25T09:45:47Z) - Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation [44.53678257757108]
We propose a new BO method that can sub-linearly converge to the objective function's global optimum.
Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process; a generic EXP3 sketch is given after this list.
We demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
arXiv Detail & Related papers (2023-06-12T03:35:45Z) - HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes [7.963551878308098]
HyperBO+ is a pre-training approach for hierarchical Gaussian processes.
We show that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
arXiv Detail & Related papers (2022-12-20T18:47:10Z) - Prior-mean-assisted Bayesian optimization application on FRIB Front-End tunning [61.78406085010957]
We exploit a neural network model trained over historical data as a prior mean of BO for FRIB Front-End tuning.
arXiv Detail & Related papers (2022-11-11T18:34:15Z) - Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z) - Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To make function sampling scalable, a random feature-based kernel approximation is used for each GP model.
Convergence of the proposed EGP-TS to the global optimum is further established via an analysis based on the notion of Bayesian regret.
arXiv Detail & Related papers (2022-05-27T16:43:10Z) - Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z) - Accounting for Gaussian Process Imprecision in Bayesian Optimization [0.0]
We study the effect of the Gaussian processes' prior specifications on classical BO's convergence.
We introduce PROBO as a generalization of BO that aims at rendering the method more robust to prior mean parameter misspecification.
We test our approach against classical BO on a real-world problem from material science and observe PROBO to converge faster.
arXiv Detail & Related papers (2021-11-16T08:45:39Z)
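As referenced in the entry on provably efficient BO with unknown GP hyperparameter estimation above, EXP3 is a standard adversarial multi-armed bandit algorithm. The sketch below shows a generic EXP3 loop for reference only; how that paper couples EXP3 with the BO process is not reproduced here, and the names are hypothetical.

```python
# Generic EXP3 sketch for adversarial multi-armed bandits (illustrative only).
import numpy as np

def exp3(n_arms, n_rounds, get_reward, gamma=0.1, seed=0):
    """Maintain arm weights, sample from a mixed distribution, and apply
    importance-weighted multiplicative updates. Rewards must lie in [0, 1]."""
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    pulls = []
    for _ in range(n_rounds):
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        reward = get_reward(arm)                      # observed reward in [0, 1]
        est = reward / probs[arm]                     # importance-weighted estimate
        weights[arm] *= np.exp(gamma * est / n_arms)  # multiplicative update
        pulls.append(arm)
    return pulls, weights
```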
This list is automatically generated from the titles and abstracts of the papers on this site.