Automatic prior selection for meta Bayesian optimization with a case study on tuning deep neural network optimizers
- URL: http://arxiv.org/abs/2109.08215v1
- Date: Thu, 16 Sep 2021 20:46:26 GMT
- Title: Automatic prior selection for meta Bayesian optimization with a case study on tuning deep neural network optimizers
- Authors: Zi Wang and George E. Dahl and Kevin Swersky and Chansoo Lee and Zelda Mariet and Zack Nado and Justin Gilmer and Jasper Snoek and Zoubin Ghahramani
- Abstract summary: We propose a principled approach to solve such expensive hyperparameter tuning problems efficiently.
Key to the performance of BO is specifying and refining a distribution over functions, which is used to reason about the optima of the underlying function being optimized.
We verify our approach in realistic model training setups by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets.
- Score: 47.013395100497775
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of deep neural networks can be highly sensitive to the choice
of a variety of meta-parameters, such as optimizer parameters and model
hyperparameters. Tuning these well, however, often requires extensive and
costly experimentation. Bayesian optimization (BO) is a principled approach to
solve such expensive hyperparameter tuning problems efficiently. Key to the
performance of BO is specifying and refining a distribution over functions,
which is used to reason about the optima of the underlying function being
optimized. In this work, we consider the scenario where we have data from
similar functions that allows us to specify a tighter distribution a priori.
Specifically, we focus on the common but potentially costly task of tuning
optimizer parameters for training neural networks. Building on the meta BO
method from Wang et al. (2018), we develop practical improvements that (a)
boost its performance by leveraging tuning results on multiple tasks without
requiring observations for the same meta-parameter points across all tasks, and
(b) retain its regret bound for a special case of our method. As a result, we
provide a coherent BO solution for iterative optimization of continuous
optimizer parameters. To verify our approach in realistic model training
setups, we collected a large multi-task hyperparameter tuning dataset by
training tens of thousands of configurations of near-state-of-the-art models on
popular image and text datasets, as well as a protein sequence dataset. Our
results show that on average, our method is able to locate good hyperparameters
at least 3 times more efficiently than the best competing methods.
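The recipe can be pictured with a minimal sketch (not the authors' implementation): pre-train a Gaussian process prior by maximizing the summed marginal likelihood over previous tuning tasks, then run ordinary BO with that pre-trained prior on a new task. The constant mean, single RBF kernel, shared noise level, and grid-based expected-improvement acquisition below are all simplifying assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def rbf(X1, X2, ls, var):
    # Squared-exponential kernel between two sets of points.
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d / ls ** 2)

def neg_mll(theta, tasks):
    # Sum of negative GP log marginal likelihoods over all tuning tasks.
    # Each task contributes its own (X, y) term, so tasks need not share
    # meta-parameter points.
    mean, ls, var, noise = theta[0], np.exp(theta[1]), np.exp(theta[2]), np.exp(theta[3])
    total = 0.0
    for X, y in tasks:
        K = rbf(X, X, ls, var) + noise * np.eye(len(X))
        L = np.linalg.cholesky(K)
        r = y - mean
        a = np.linalg.solve(L.T, np.linalg.solve(L, r))
        total += 0.5 * r @ a + np.log(np.diag(L)).sum()
    return total

def gp_posterior(theta, X, y, Xq):
    # Posterior mean/std at query points Xq under the pre-trained prior.
    mean, ls, var, noise = theta[0], np.exp(theta[1]), np.exp(theta[2]), np.exp(theta[3])
    K = rbf(X, X, ls, var) + noise * np.eye(len(X))
    Ks = rbf(X, Xq, ls, var)
    mu = mean + Ks.T @ np.linalg.solve(K, y - mean)
    cov = rbf(Xq, Xq, ls, var) - Ks.T @ np.linalg.solve(K, Ks)
    return mu, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mu, sigma, best):
    # EI for minimization: improvement is (best - f).
    z = (best - mu) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

# Step 1: pre-train the prior on historical tuning tasks
# (random stand-ins here; really these are (config, metric) records).
rng = np.random.default_rng(0)
tasks = [(rng.uniform(size=(20, 1)), rng.normal(size=20)) for _ in range(5)]
theta = minimize(neg_mll, np.zeros(4), args=(tasks,), method="Nelder-Mead").x

# Step 2: standard BO on the new task with the pre-trained prior.
X_new, y_new = rng.uniform(size=(3, 1)), rng.normal(size=3)
Xq = np.linspace(0, 1, 101)[:, None]          # candidate meta-parameter grid
mu, sigma = gp_posterior(theta, X_new, y_new, Xq)
print("next point to evaluate:", Xq[expected_improvement(mu, sigma, y_new.min()).argmax()])
```

Because the marginal likelihood factorizes over tasks, the historical tasks do not need observations at the same meta-parameter points, mirroring improvement (a) in the abstract.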
Related papers
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- A Unified Gaussian Process for Branching and Nested Hyperparameter Optimization [19.351804144005744]
In deep learning, tuning hyperparameters with conditional dependence is common in practice.
The new GP model accounts for the dependent structure among input variables through a new kernel function.
High prediction accuracy and better optimization efficiency are observed in a series of synthetic simulations and real-data applications of neural networks.
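The kernel idea can be sketched as follows (an illustrative construction, not the paper's kernel): shared hyperparameters are always compared, while branch-specific ones contribute only when both configurations take the same branch.

```python
import numpy as np

def conditional_kernel(a, b, ls_shared=1.0, ls_cond=1.0):
    """Toy kernel for configurations with a branching variable.

    Each config is a dict with a branch (e.g. optimizer type), a shared
    hyperparameter (e.g. learning rate), and a branch-specific one
    (e.g. momentum for SGD, beta2 for Adam). Illustrative only.
    """
    k_shared = np.exp(-0.5 * (a["shared"] - b["shared"]) ** 2 / ls_shared ** 2)
    k = k_shared
    if a["branch"] == b["branch"]:
        # Branch-specific hyperparameters are only comparable when both
        # configurations take the same branch.
        k += k_shared * np.exp(-0.5 * (a["cond"] - b["cond"]) ** 2 / ls_cond ** 2)
    return k

x1 = {"branch": "adam", "shared": 1e-3, "cond": 0.999}
x2 = {"branch": "adam", "shared": 3e-3, "cond": 0.99}
x3 = {"branch": "sgd",  "shared": 1e-3, "cond": 0.9}
print(conditional_kernel(x1, x2), conditional_kernel(x1, x3))
```

This form stays a valid covariance: it is an RBF kernel plus a product of RBF kernels with a branch-indicator (delta) kernel, all of which are positive semi-definite.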
arXiv Detail & Related papers (2024-01-19T21:11:32Z)
- VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
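The core idea can be pictured with a toy step function; the MLP weights below are random placeholders, whereas in the paper the optimizer network is meta-trained over many tasks and its architecture and input features differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "learned optimizer": a two-layer MLP maps per-parameter features
# (gradient, momentum) to an update. Random weights stand in for the
# meta-trained ones.
W1, b1 = rng.normal(0.0, 0.1, (2, 16)), np.zeros(16)
W2, b2 = rng.normal(0.0, 0.1, (16, 1)), np.zeros(1)

def learned_optimizer_step(params, grads, momentum, beta=0.9, scale=1e-3):
    momentum = beta * momentum + (1 - beta) * grads
    feats = np.stack([grads, momentum], axis=-1)   # (n_params, 2)
    h = np.tanh(feats @ W1 + b1)                   # (n_params, 16)
    update = (h @ W2 + b2).squeeze(-1)             # (n_params,)
    return params - scale * update, momentum

# One step on a toy quadratic loss f(p) = ||p||^2 / 2, whose gradient is p.
params, momentum = rng.normal(size=5), np.zeros(5)
params, momentum = learned_optimizer_step(params, params.copy(), momentum)
print(params)
```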
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
- Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, and that its performance can be further improved via its function uncertainty estimates.
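The text-based interface can be pictured as serializing a tuning history into a prompt that a sequence model completes; the format below is invented for illustration and is not OptFormer's actual serialization.

```python
def serialize_trials(trials):
    """Render a tuning history as text for a sequence model to complete.

    The format is invented for illustration; OptFormer's actual
    serialization differs.
    """
    lines = []
    for t in trials:
        hps = ", ".join(f"{k}={v}" for k, v in t["params"].items())
        lines.append(f"trial: {hps} -> accuracy={t['value']:.4f}")
    lines.append("trial:")  # prompt the model to propose the next config
    return "\n".join(lines)

history = [
    {"params": {"lr": 1e-3, "batch_size": 128}, "value": 0.912},
    {"params": {"lr": 3e-4, "batch_size": 256}, "value": 0.934},
]
print(serialize_trials(history))
# A Transformer trained on many such histories would complete the final
# "trial:" line with a suggested configuration.
```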
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyperparameter tuning.
We show that using gradient-based data subsets for hyperparameter tuning achieves significantly faster turnaround times and speedups of 3×-30×.
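A simple stand-in for the idea (not the paper's algorithm): greedily pick a small subset whose mean gradient approximates the full-data mean gradient, then tune hyperparameters on that subset.

```python
import numpy as np

def greedy_gradient_subset(per_example_grads, k):
    """Greedily pick k examples whose mean gradient best matches the
    full-data mean gradient. A simple stand-in for gradient-based subset
    selection; the paper's algorithm is more sophisticated."""
    target = per_example_grads.mean(axis=0)
    chosen = []
    running = np.zeros_like(target)
    for _ in range(k):
        best_i, best_err = None, np.inf
        for i in range(len(per_example_grads)):
            if i in chosen:
                continue
            cand = (running * len(chosen) + per_example_grads[i]) / (len(chosen) + 1)
            err = np.linalg.norm(cand - target)
            if err < best_err:
                best_i, best_err = i, err
        chosen.append(best_i)
        running = (running * (len(chosen) - 1) + per_example_grads[best_i]) / len(chosen)
    return chosen

rng = np.random.default_rng(0)
grads = rng.normal(size=(100, 10))        # toy per-example gradients
subset = greedy_gradient_subset(grads, k=10)
print(subset)  # tune hyperparameters on this subset instead of the full set
```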
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
- BOSH: Bayesian Optimization by Sampling Hierarchically [10.10241176664951]
We propose a novel BO routine pairing a hierarchical Gaussian process with an information-theoretic framework to generate a growing pool of realizations.
We demonstrate that BOSH provides more efficient and higher-precision optimization than standard BO across synthetic benchmarks, simulation optimization, reinforcement learning and hyperparameter tuning tasks.
arXiv Detail & Related papers (2020-07-02T07:35:49Z)
- Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets.
We use a set of tuning rules to add new hyper-parameters and/or to reduce the hyper-parameter search space in order to select a better combination.
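One such rule might look like the following sketch (a hypothetical rule for illustration; the paper defines its own rule set): extend a hyperparameter's range when the best value sits at a boundary, otherwise shrink the range around the best value.

```python
def refine_range(low, high, best, shrink=0.5, grow=10.0):
    """A hypothetical tuning rule (for illustration; the paper defines its
    own rule set): if the best value found sits at a boundary, extend the
    range in that direction; otherwise shrink the range around the best.
    Assumes a positive-valued hyperparameter such as a learning rate."""
    if best <= low * 1.05:               # best at/near the lower boundary
        return low / grow, high
    if best >= high * 0.95:              # best at/near the upper boundary
        return low, high * grow
    half = shrink * (high - low) / 2
    return max(low, best - half), min(high, best + half)

# Learning-rate range refined after a round of trials:
print(refine_range(1e-4, 1e-1, best=1e-4))   # extend downward
print(refine_range(1e-4, 1e-1, best=5e-3))   # shrink around the best value
```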
arXiv Detail & Related papers (2020-06-03T08:53:48Z)
- Weighting Is Worth the Wait: Bayesian Optimization with Importance Sampling [34.67740033646052]
By learning a parameterization of IS that trades off evaluation complexity and quality, we improve upon state-of-the-art Bayesian optimization runtime and final validation error across a variety of datasets and complex neural architectures.
arXiv Detail & Related papers (2020-02-23T15:52:08Z)