Pre-training helps Bayesian optimization too
- URL: http://arxiv.org/abs/2207.03084v1
- Date: Thu, 7 Jul 2022 04:42:54 GMT
- Title: Pre-training helps Bayesian optimization too
- Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet,
Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani
- Abstract summary: We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that, on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
- Score: 49.28382118032923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian optimization (BO) has become a popular strategy for global
optimization of many expensive real-world functions. Contrary to a common
belief that BO is suited to optimizing black-box functions, it actually
requires domain knowledge on characteristics of those functions to deploy BO
successfully. Such domain knowledge often manifests in Gaussian process priors
that specify initial beliefs on functions. However, even with expert knowledge,
it is not an easy task to select a prior. This is especially true for
hyperparameter tuning problems on complex machine learning models, where
landscapes of tuning objectives are often difficult to comprehend. We seek an
alternative practice for setting these functional priors. In particular, we
consider the scenario where we have data from similar functions that allow us
to pre-train a tighter distribution a priori. To verify our approach in
realistic model training setups, we collected a large multi-task hyperparameter
tuning dataset by training tens of thousands of configurations of
near-state-of-the-art models on popular image and text datasets, as well as a
protein sequence dataset. Our results show that on average, our method is able
to locate good hyperparameters at least 3 times more efficiently than the best
competing methods.
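
The abstract above describes pre-training a tighter functional prior from tuning data on similar tasks. Below is a minimal sketch of that idea, not the authors' implementation: shared Gaussian process kernel hyperparameters are fit by maximizing the summed marginal likelihood over the related tasks, and the result replaces a hand-picked prior in a standard BO loop. The squared-exponential kernel and all function names here are illustrative assumptions.

```python
# Sketch only: pre-train shared GP hyperparameters on data from related
# tuning tasks, then reuse them as the surrogate prior on a new task.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X1, X2, log_length, log_amp):
    """Squared-exponential kernel with log-parameterized hyperparameters."""
    ell, amp = np.exp(log_length), np.exp(log_amp)
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return amp ** 2 * np.exp(-0.5 * sq_dists / ell ** 2)

def neg_log_marginal_likelihood(params, X, y):
    """Standard GP negative log marginal likelihood for one task."""
    log_length, log_amp, log_noise = params
    K = rbf_kernel(X, X, log_length, log_amp)
    K += (np.exp(log_noise) ** 2 + 1e-8) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2 * np.pi)

def pretrain_gp_prior(tasks, init=(0.0, 0.0, -2.0)):
    """tasks: list of (X, y) arrays from previously tuned, similar objectives.

    Returns kernel hyperparameters fit jointly to all tasks -- the
    'pre-trained' prior for BO on a new, similar tuning problem."""
    total_nll = lambda p: sum(neg_log_marginal_likelihood(p, X, y) for X, y in tasks)
    return minimize(total_nll, np.array(init), method="L-BFGS-B").x
```

An off-the-shelf BO loop on the new task would then build its GP surrogate with these pre-trained hyperparameters rather than with defaults chosen by hand.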
Related papers
- A General Framework for User-Guided Bayesian Optimization [51.96352579696041]
We propose ColaBO, the first Bayesian-principled framework for prior beliefs beyond the typical kernel structure.
We empirically demonstrate ColaBO's ability to substantially accelerate optimization when the prior information is accurate, and to retain approximately default performance when it is misleading.
arXiv Detail & Related papers (2023-11-24T18:27:26Z)
- Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces [7.963551878308098]
We introduce MPHD, a model pre-training method on heterogeneous domains.
MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces.
arXiv Detail & Related papers (2023-09-28T17:01:43Z)
- Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest.
We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
arXiv Detail & Related papers (2023-07-25T09:45:47Z)
- Agent-based Collaborative Random Search for Hyper-parameter Tuning and Global Function Optimization [0.0]
This paper proposes an agent-based collaborative technique for finding near-optimal values for an arbitrary set of hyperparameters in a machine learning model.
The behavior of the presented model, particularly its sensitivity to changes in its design parameters, is investigated in both machine learning and global function optimization applications.
arXiv Detail & Related papers (2023-03-03T21:10:17Z)
- HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes [7.963551878308098]
HyperBO+ is a pre-training approach for hierarchical Gaussian processes.
We show that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
arXiv Detail & Related papers (2022-12-20T18:47:10Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Consolidated learning -- a domain-specific model-free optimization strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In such settings, we are interested in the total optimization time rather than tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study of the XGBoost algorithm and a collection of predictive tasks extracted from the MIMIC-IV medical database.
arXiv Detail & Related papers (2022-01-27T21:38:53Z)
- Pre-trained Gaussian Processes for Bayesian Optimization [24.730678780782647]
We propose a new pre-training based BO framework named HyperBO.
We show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known.
arXiv Detail & Related papers (2021-09-16T20:46:26Z)
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose conservative objective models (COMs), which learn a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
- Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In large search spaces, the algorithm passes through several regions of low function value before reaching the optimum of the function.
One approach to shorten this cold-start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space so that it expands around the high-probability region of the function optimum and shrinks around the low-probability regions (a minimal sketch of this warping idea appears after this entry).
arXiv Detail & Related papers (2020-03-27T06:18:49Z)
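
As a companion to the space-warping summary above, here is a minimal sketch of one way to realize that idea (an assumption, not the paper's exact construction): a per-dimension truncated-normal prior on the optimizer's location, whose inverse CDF maps the unit cube into the original search box so that uniform search effort concentrates around the prior's high-probability region.

```python
# Sketch only: warp the search space with the CDF of a prior over the
# optimum's location, so high-prior regions get more search resolution.
import numpy as np
from scipy.stats import truncnorm

def make_warp(mu, sigma, low, high):
    """Build warp/unwarp maps between the unit cube and the original box.

    `warp` pushes a point u in [0, 1]^d through the inverse CDF of a
    truncated-normal prior on the optimum, expanding the space around the
    prior's high-probability region; `unwarp` is the inverse map."""
    a, b = (low - mu) / sigma, (high - mu) / sigma
    prior = truncnorm(a, b, loc=mu, scale=sigma)
    return prior.ppf, prior.cdf

# Usage: run any BO routine on the unit cube and evaluate the objective at
# warp(u); previously observed points x are mapped back via unwarp(x).
warp, unwarp = make_warp(mu=np.array([0.7, 0.2]),
                         sigma=np.array([0.1, 0.1]),
                         low=np.array([0.0, 0.0]),
                         high=np.array([1.0, 1.0]))
print(warp(np.array([0.5, 0.5])))  # close to the prior means [0.7, 0.2]
```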