Automatic prior selection for meta Bayesian optimization with a case
study on tuning deep neural network optimizers
- URL: http://arxiv.org/abs/2109.08215v1
- Date: Thu, 16 Sep 2021 20:46:26 GMT
- Title: Automatic prior selection for meta Bayesian optimization with a case
study on tuning deep neural network optimizers
- Authors: Zi Wang and George E. Dahl and Kevin Swersky and Chansoo Lee and Zelda
Mariet and Zack Nado and Justin Gilmer and Jasper Snoek and Zoubin Ghahramani
- Abstract summary: We propose a principled approach to solve such expensive hyperparameter tuning problems efficiently.
Key to the performance of BO is specifying and refining a distribution over functions, which is used to reason about the optima of the underlying function being optimized.
We verify our approach in realistic model training setups by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets.
- Score: 47.013395100497775
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of deep neural networks can be highly sensitive to the choice
of a variety of meta-parameters, such as optimizer parameters and model
hyperparameters. Tuning these well, however, often requires extensive and
costly experimentation. Bayesian optimization (BO) is a principled approach to
solve such expensive hyperparameter tuning problems efficiently. Key to the
performance of BO is specifying and refining a distribution over functions,
which is used to reason about the optima of the underlying function being
optimized. In this work, we consider the scenario where we have data from
similar functions that allows us to specify a tighter distribution a priori.
Specifically, we focus on the common but potentially costly task of tuning
optimizer parameters for training neural networks. Building on the meta BO
method from Wang et al. (2018), we develop practical improvements that (a)
boost its performance by leveraging tuning results on multiple tasks without
requiring observations for the same meta-parameter points across all tasks, and
(b) retain its regret bound for a special case of our method. As a result, we
provide a coherent BO solution for iterative optimization of continuous
optimizer parameters. To verify our approach in realistic model training
setups, we collected a large multi-task hyperparameter tuning dataset by
training tens of thousands of configurations of near-state-of-the-art models on
popular image and text datasets, as well as a protein sequence dataset. Our
results show that on average, our method is able to locate good hyperparameters
at least 3 times more efficiently than the best competing methods.
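The core recipe described above — estimate a GP prior from tuning data on related tasks, then run standard BO on the new task under that prior — can be illustrated with a short sketch. The snippet below is a minimal illustration and not the authors' implementation: it learns only a constant prior mean and an RBF lengthscale by maximizing the summed per-task log marginal likelihood with a crude grid search, and the toy quadratic "tuning curves" and expected-improvement loop are illustrative assumptions.

```python
# Minimal sketch of meta BO with a pre-trained GP prior (illustrative, not the paper's code).
import numpy as np
from scipy.stats import norm

def rbf(X1, X2, ls):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def log_marginal_likelihood(X, y, mean, ls, noise):
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    r = y - mean
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    return -0.5 * r @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(X) * np.log(2 * np.pi)

def pretrain_prior(tasks, ls_grid, noise=1e-3):
    # Constant mean: average observation across tasks; lengthscale: the value that
    # maximizes the summed per-task log marginal likelihood (grid search for brevity).
    mean = np.mean([y.mean() for _, y in tasks])
    best_ls = max(ls_grid, key=lambda ls: sum(
        log_marginal_likelihood(X, y, mean, ls, noise) for X, y in tasks))
    return mean, float(best_ls), noise

def gp_posterior(Xtr, ytr, Xte, mean, ls, noise):
    K = rbf(Xtr, Xtr, ls) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr - mean))
    mu = mean + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 + noise - (v ** 2).sum(axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    z = (best - mu) / sigma                      # minimization
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
# Toy "related tasks": validation error as a function of one meta-parameter.
tasks = []
for shift in (0.2, 0.4, 0.6):
    X = rng.uniform(0, 1, (30, 1))
    y = (X[:, 0] - shift) ** 2 + 0.01 * rng.standard_normal(30)
    tasks.append((X, y))

mean, ls, noise = pretrain_prior(tasks, ls_grid=np.linspace(0.05, 1.0, 20))

# Standard BO with expected improvement on a new task, under the pre-trained prior.
new_task = lambda x: (x - 0.5) ** 2
X_obs = rng.uniform(0, 1, (2, 1))
y_obs = new_task(X_obs[:, 0])
cand = np.linspace(0, 1, 200)[:, None]
for _ in range(10):
    mu, sd = gp_posterior(X_obs, y_obs, cand, mean, ls, noise)
    x_next = cand[np.argmax(expected_improvement(mu, sd, y_obs.min()))]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, new_task(x_next[0]))
print("best meta-parameter found:", X_obs[np.argmin(y_obs), 0])
```

In the paper's setting the related tasks need not share meta-parameter points, which is exactly what the summed per-task likelihood above allows.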
Related papers
- Approximation-Aware Bayesian Optimization [34.56666383247348]
High-dimensional Bayesian optimization (BO) tasks often require 10,000 function evaluations before obtaining meaningful results.
We modify sparse variational Gaussian processes (SVGPs) to better align with the goals of BO.
Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem.
arXiv Detail & Related papers (2024-06-06T17:55:02Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest.
We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
arXiv Detail & Related papers (2023-07-25T09:45:47Z)
- Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation [44.53678257757108]
We propose a new BO method that converges sub-linearly to the objective function's global optimum.
Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process.
We demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
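A minimal, self-contained sketch of the idea summarized above, under assumed details: an EXP3 bandit decides at each step whether to evaluate the point proposed by the surrogate or a uniformly random point, with the clipped improvement over the incumbent as the reward. The toy objective and the stand-in "surrogate suggestion" below are illustrative, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
objective = lambda x: (x - 0.7) ** 2          # toy expensive objective

def surrogate_suggestion(X, y):
    # Stand-in for maximizing a BO acquisition function: perturb the incumbent.
    return float(np.clip(X[np.argmin(y)] + 0.05 * rng.standard_normal(), 0, 1))

def exp3_probs(weights, gamma):
    return (1 - gamma) * weights / weights.sum() + gamma / len(weights)

weights, gamma = np.ones(2), 0.2              # arm 0: surrogate point, arm 1: random point
X = list(rng.uniform(0, 1, 2))
y = [objective(x) for x in X]
for t in range(20):
    probs = exp3_probs(weights, gamma)
    arm = rng.choice(2, p=probs)
    x = surrogate_suggestion(np.array(X), np.array(y)) if arm == 0 else rng.uniform(0, 1)
    fx = objective(x)
    reward = float(np.clip(min(y) - fx, 0.0, 1.0))             # improvement in [0, 1]
    weights[arm] *= np.exp(gamma * (reward / probs[arm]) / 2)  # importance-weighted EXP3 update
    X.append(x)
    y.append(fx)
print("best point found:", X[int(np.argmin(y))])
```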
arXiv Detail & Related papers (2023-06-12T03:35:45Z)
- HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes [7.963551878308098]
HyperBO+ is a pre-training approach for hierarchical Gaussian processes.
We show that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
arXiv Detail & Related papers (2022-12-20T18:47:10Z)
- Prior-mean-assisted Bayesian optimization application on FRIB Front-End tuning [61.78406085010957]
We exploit a neural network model trained over historical data as a prior mean of BO for FRIB Front-End tuning.
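A minimal sketch of how a model trained on historical data can serve as a GP prior mean: fit a zero-mean GP to the residuals y - m(x), so the surrogate's predictions revert to the historical model away from the new observations. The MLP, toy data, and kernel choice below are illustrative assumptions, not the FRIB setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

# Historical data (toy stand-in for archived settings and readings).
X_hist = rng.uniform(-1, 1, (200, 1))
y_hist = np.sin(3 * X_hist[:, 0]) + 0.05 * rng.standard_normal(200)
prior_mean = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                          random_state=0).fit(X_hist, y_hist)

# A few fresh observations from the current task, slightly shifted from history.
X_new = rng.uniform(-1, 1, (5, 1))
y_new = np.sin(3 * X_new[:, 0]) + 0.2 + 0.05 * rng.standard_normal(5)

# Zero-mean GP on residuals relative to the historical model.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), alpha=1e-3)
gp.fit(X_new, y_new - prior_mean.predict(X_new))

X_test = np.linspace(-1, 1, 5)[:, None]
resid_mu, resid_sd = gp.predict(X_test, return_std=True)
posterior_mean = prior_mean.predict(X_test) + resid_mu   # GP mean with NN prior mean
print(np.round(posterior_mean, 3), np.round(resid_sd, 3))
```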
arXiv Detail & Related papers (2022-11-11T18:34:15Z)
- Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
- Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret.
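The random-feature trick mentioned here can be sketched briefly: approximate an RBF kernel with random Fourier features so a posterior function sample becomes a Bayesian linear model over those features, which makes Thompson sampling cheap. The snippet below is an illustrative stand-alone example (unit-variance kernel, toy data), not the EGP-TS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(X, W, b):
    # phi(x) = sqrt(2/M) * cos(W x + b) approximates an RBF kernel (Rahimi & Recht).
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

d, M, lengthscale, noise = 1, 500, 0.2, 1e-2
W = rng.standard_normal((M, d)) / lengthscale
b = rng.uniform(0, 2 * np.pi, M)

# Observations on a toy objective.
X = rng.uniform(0, 1, (15, 1))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.standard_normal(15)

# Bayesian linear regression on the features: posterior over weights theta.
Phi = rff(X, W, b)                                   # (n, M)
A = Phi.T @ Phi / noise + np.eye(M)                  # posterior precision (unit prior)
mean = np.linalg.solve(A, Phi.T @ y) / noise         # posterior mean of theta
L = np.linalg.cholesky(np.linalg.inv(A))             # posterior covariance factor
theta_sample = mean + L @ rng.standard_normal(M)     # one explicit function sample

# Thompson sampling: minimize the sampled function over a candidate grid.
cand = np.linspace(0, 1, 200)[:, None]
f_sample = rff(cand, W, b) @ theta_sample
print("next query:", cand[np.argmin(f_sample), 0])
```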
arXiv Detail & Related papers (2022-05-27T16:43:10Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Accounting for Gaussian Process Imprecision in Bayesian Optimization [0.0]
We study the effect of the Gaussian processes' prior specifications on classical BO's convergence.
We introduce PROBO as a generalization of BO that aims at rendering the method more robust towards prior mean parameter misspecification.
We test our approach against classical BO on a real-world problem from material science and observe PROBO to converge faster.
arXiv Detail & Related papers (2021-11-16T08:45:39Z)
- Using Distance Correlation for Efficient Bayesian Optimization [0.0]
We propose a BO scheme named BDC, which integrates BO with a statistical measure of association between two random variables called Distance Correlation.
BDC balances exploration and exploitation automatically, and requires no manual hyperparameter tuning.
We evaluate BDC on a range of benchmark tests and observe that it performs on par with popular BO methods.
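For reference, the distance correlation statistic itself is easy to compute from double-centered pairwise distance matrices; the sketch below is a plain implementation of the sample statistic (Székely et al., 2007), not the BDC acquisition scheme, and the toy data are an assumption.

```python
import numpy as np

def distance_correlation(x, y):
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    a = np.linalg.norm(x[:, None] - x[None, :], axis=-1)   # pairwise distances
    b = np.linalg.norm(y[:, None] - y[None, :], axis=-1)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()      # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)) if dvar_x * dvar_y > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
print(distance_correlation(x, x ** 2),                     # strong nonlinear dependence
      distance_correlation(x, rng.uniform(-1, 1, 100)))    # near independence
```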
arXiv Detail & Related papers (2021-02-17T19:37:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.