Few-Shot Bayesian Optimization with Deep Kernel Surrogates
- URL: http://arxiv.org/abs/2101.07667v1
- Date: Tue, 19 Jan 2021 15:00:39 GMT
- Title: Few-Shot Bayesian Optimization with Deep Kernel Surrogates
- Authors: Martin Wistuba and Josif Grabocka
- Abstract summary: We propose a few-shot learning problem in which we train a shared deep surrogate model to adapt to the response function of a new task.
We propose the use of a deep kernel network for a Gaussian process surrogate that is meta-learned in an end-to-end fashion.
As a result, the novel few-shot optimization of our deep kernel surrogate leads to new state-of-the-art results in HPO.
- Score: 7.208515071018781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter optimization (HPO) is a central pillar in the automation of
machine learning solutions and is mainly performed via Bayesian optimization,
where a parametric surrogate is learned to approximate the black box response
function (e.g. validation error). Unfortunately, evaluating the response
function is computationally intensive. As a remedy, earlier work emphasizes the
need for transfer learning surrogates which learn to optimize hyperparameters
for an algorithm from other tasks. In contrast to previous work, we propose to
rethink HPO as a few-shot learning problem in which we train a shared deep
surrogate model to quickly adapt (with few response evaluations) to the
response function of a new task. We propose the use of a deep kernel network
for a Gaussian process surrogate that is meta-learned in an end-to-end fashion
in order to jointly approximate the response functions of a collection of
training data sets. As a result, the novel few-shot optimization of our deep
kernel surrogate leads to new state-of-the-art results at HPO compared to
several recent methods on diverse metadata sets.
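As a rough illustration of the approach described in the abstract, the sketch below meta-learns a GP surrogate whose RBF kernel operates on features from a small shared network, by minimizing the summed GP negative marginal log-likelihood over a collection of tasks. The toy task data, network sizes and optimizer settings are assumptions for illustration, not the authors' implementation.

```python
import torch

class DeepKernelGP(torch.nn.Module):
    """GP surrogate with an RBF kernel on features from a shared network."""
    def __init__(self, dim, feat=16):
        super().__init__()
        self.phi = torch.nn.Sequential(            # shared deep feature extractor
            torch.nn.Linear(dim, 32), torch.nn.ReLU(),
            torch.nn.Linear(32, feat))
        self.log_ls = torch.nn.Parameter(torch.zeros(()))       # RBF lengthscale
        self.log_noise = torch.nn.Parameter(torch.tensor(-2.0))

    def kernel(self, a, b):
        d2 = torch.cdist(self.phi(a), self.phi(b)) ** 2
        return torch.exp(-0.5 * d2 / torch.exp(self.log_ls) ** 2)

    def nll(self, x, y):
        # negative GP marginal log-likelihood on one task (up to a constant)
        K = self.kernel(x, x) + torch.exp(self.log_noise) * torch.eye(len(x))
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y[:, None], L)
        return 0.5 * (y[:, None] * alpha).sum() + torch.log(torch.diagonal(L)).sum()

# meta-training: one shared surrogate jointly fits the response functions
# of all training tasks; toy random data stands in for real HPO metadata
tasks = [(torch.randn(20, 5), torch.randn(20)) for _ in range(8)]
model = DeepKernelGP(dim=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    opt.zero_grad()
    loss = sum(model.nll(x, y) for x, y in tasks)
    loss.backward()
    opt.step()
# at test time the meta-learned kernel adapts to a new task from few evaluations
```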
Related papers
- Efficient Bayesian Optimization with Deep Kernel Learning and Transformer Pre-trained on Multiple Heterogeneous Datasets [9.510327380529892]
We propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder.
Experiments on both synthetic and real benchmark problems demonstrate the effectiveness of our proposed pre-training and transfer BO strategy.
arXiv Detail & Related papers (2023-08-09T01:56:10Z)
- Deep Ranking Ensembles for Hyperparameter Optimization [9.453554184019108]
We present a novel method that meta-learns neural network surrogates optimized for ranking the configurations' performances while modeling their uncertainty via ensembling.
In a large-scale experimental protocol comprising 12 baselines, 16 HPO search spaces and 86 datasets/tasks, we demonstrate that our method achieves new state-of-the-art results in HPO.
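A hedged sketch of the idea behind this entry: an ensemble of small networks is trained with a pairwise logistic ranking loss, and uncertainty is read off the ensemble spread. All shapes and hyperparameters below are illustrative, not the paper's.

```python
import torch

def ranking_loss(scores, y):
    # logistic pairwise loss: config i should outrank j whenever y[i] < y[j]
    # (lower validation error is better)
    better = y[:, None] < y[None, :]
    margins = scores[:, None] - scores[None, :]
    return torch.nn.functional.softplus(-margins)[better].mean()

x, y = torch.randn(30, 5), torch.randn(30)       # toy configs and val. errors
ensemble = [torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.ReLU(),
                                torch.nn.Linear(32, 1)) for _ in range(5)]
for net in ensemble:
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        ranking_loss(net(x).squeeze(-1), y).backward()
        opt.step()

with torch.no_grad():                            # ensemble spread as uncertainty
    preds = torch.stack([net(x).squeeze(-1) for net in ensemble])
mean, unc = preds.mean(0), preds.std(0)
```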
arXiv Detail & Related papers (2023-03-27T13:52:40Z)
- Multi-objective hyperparameter optimization with performance uncertainty [62.997667081978825]
This paper presents results on multi-objective hyperparameter optimization under uncertainty in the evaluation of machine learning algorithms.
We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise.
Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR.
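The one-dimensional sketch below illustrates such a combination under simplifying assumptions: TPE-style candidates are sampled from the density ratio of "good" versus "bad" observations and then scored by a GP whose diagonal carries per-observation (heterogeneous) noise. The quantile, kernel and noise model are invented for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)                        # observed hyperparameter values
y = (x - 0.3) ** 2 + rng.normal(0, 0.05 + 0.1 * x)   # noisier for larger x
noise_var = (0.05 + 0.1 * x) ** 2                # assumed per-point noise

# TPE split: "good" = best 25% of observations, "bad" = the rest
gamma = np.quantile(y, 0.25)
l, g = gaussian_kde(x[y <= gamma]), gaussian_kde(x[y > gamma])
cands = l.resample(256, seed=1).ravel()          # sample from the "good" density
cands = cands[(cands >= 0) & (cands <= 1)]
score = l(cands) / np.maximum(g(cands), 1e-12)   # TPE density ratio

# GP posterior mean with heterogeneous noise on the diagonal
rbf = lambda a, b, ls=0.1: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
K = rbf(x, x) + np.diag(noise_var)
top = cands[np.argsort(score)[-16:]]             # best TPE candidates
mu = rbf(top, x) @ np.linalg.solve(K, y)         # GP metamodel predictions
next_x = top[np.argmin(mu)]                      # query the most promising one
```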
arXiv Detail & Related papers (2022-09-09T14:58:43Z)
- Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
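A minimal sketch of pre-training in this spirit, under the assumption that the prior is encoded by GP kernel hyperparameters: they are fit by maximum marginal likelihood on data from similar functions and then reused on the new task. The toy functions and RBF kernel are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
tasks = []                                       # data from similar functions
for _ in range(4):
    x = rng.uniform(0, 1, 25)
    tasks.append((x, np.sin(6 * x + rng.normal(0, 0.3)) + rng.normal(0, 0.1, 25)))

def nll(theta):
    # summed GP negative marginal log-likelihood over all pre-training tasks
    ls, sv, nv = np.exp(theta)                   # lengthscale, signal var, noise var
    total = 0.0
    for x, y in tasks:
        K = sv * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ls ** 2)
        K += nv * np.eye(x.size)
        total += 0.5 * (y @ np.linalg.solve(K, y) + np.linalg.slogdet(K)[1])
    return total

theta = minimize(nll, np.zeros(3), method="Nelder-Mead").x
print(dict(zip(["lengthscale", "signal", "noise"], np.exp(theta))))
# BO on a new task starts from this pre-trained prior instead of flat defaults
```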
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
- Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To make function sampling scalable, a random feature-based kernel approximation is leveraged per GP model.
Convergence of the proposed EGP-TS to the global optimum is further established through an analysis based on the notion of Bayesian regret.
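One ingredient named above can be sketched concretely: random Fourier features turn posterior function sampling, and hence Thompson sampling, into cheap Bayesian linear regression. A single RBF GP is shown here, whereas the paper ensembles several GPs (EGP-TS); all sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (15, 2))                   # evaluated configurations
y = np.sin(3 * x[:, 0]) + rng.normal(0, 0.1, 15)

m, ls, noise = 200, 0.3, 1e-2
W = rng.normal(0, 1 / ls, (m, 2))                # RBF spectral frequencies
b = rng.uniform(0, 2 * np.pi, m)
feat = lambda z: np.sqrt(2 / m) * np.cos(z @ W.T + b)

# Bayesian linear regression on the random features approximates the GP posterior
Phi = feat(x)
A = Phi.T @ Phi / noise + np.eye(m)
mean = np.linalg.solve(A, Phi.T @ y) / noise
w = rng.multivariate_normal(mean, np.linalg.inv(A))   # one Thompson sample

cands = rng.uniform(0, 1, (512, 2))
next_x = cands[np.argmin(feat(cands) @ w)]       # minimize the sampled function
```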
arXiv Detail & Related papers (2022-05-27T16:43:10Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
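A toy sketch of the text-based interface idea: an HPO history is serialized into a prompt so that a sequence model can complete the next trial. The format and field names below are invented for illustration and are not OptFormer's actual vocabulary.

```python
trials = [({"lr": 1e-2, "layers": 3}, 0.91), ({"lr": 1e-3, "layers": 5}, 0.94)]
prompt = "task: tune classifier | metric: accuracy\n"
for cfg, score in trials:
    params = " ".join(f"{k}={v}" for k, v in sorted(cfg.items()))
    prompt += f"trial: {params} -> {score:.3f}\n"
prompt += "trial:"                               # a sequence model completes this
print(prompt)
```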
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Efficient Neural Network Analysis with Sum-of-Infeasibilities [64.31536828511021]
Inspired by sum-of-infeasibilities methods in convex optimization, we propose a novel procedure for analyzing verification queries on networks with extensive branching functions.
A canonical case-analysis-based complete search procedure can be extended by replacing the convex procedure executed at each search state with DeepSoI.
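A hedged sketch of the sum-of-infeasibilities notion: fix a phase (active/inactive) for each ReLU, sum how far the current input violates those phases, and drive that sum toward zero; a zero value certifies that the phase assignment is realizable. The tiny network and chosen phases are illustrative, not the paper's DeepSoI procedure.

```python
import torch

W1, b1 = torch.randn(4, 2), torch.randn(4)       # toy layer of a ReLU network
phases = torch.tensor([1., 1., 0., 1.])          # chosen ReLU phases (1 = active)

x = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    pre = W1 @ x + b1                            # pre-activations of the layer
    # infeasibility: active units need pre >= 0, inactive units need pre <= 0
    soi = (torch.relu(-pre) * phases + torch.relu(pre) * (1 - phases)).sum()
    soi.backward()
    opt.step()
print("sum of infeasibilities:", soi.item())     # 0 => the phases are realizable
```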
arXiv Detail & Related papers (2022-03-19T15:05:09Z)
- Automatic tuning of hyper-parameters of reinforcement learning algorithms using Bayesian optimization with behavioral cloning [0.0]
In reinforcement learning (RL), the information content of data gathered by the learning agent depends on the setting of many hyperparameters.
In this work, a novel approach for autonomous hyperparameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to manual tweaking and other optimization-based approaches.
arXiv Detail & Related papers (2021-12-15T13:10:44Z)
- Cost-Efficient Online Hyperparameter Optimization [94.60924644778558]
We propose an online HPO algorithm that reaches human expert-level performance within a single run of the experiment, while incurring only modest computational overhead compared to regular training.
arXiv Detail & Related papers (2021-01-17T04:55:30Z)
- Practical and sample efficient zero-shot HPO [8.41866793161234]
We provide an overview of available approaches and introduce two novel techniques to handle the problem.
The first is based on a surrogate model and adaptively chooses dataset-configuration pairs to query.
The second, for settings where finding, tuning and testing a surrogate model is problematic, is a multi-fidelity technique combining HyperBand with submodular optimization.
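The submodular ingredient of the second technique can be sketched as greedy portfolio selection over a table of offline evaluations; the random error table below stands in for real meta-data, and the portfolio size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.uniform(size=(100, 30))             # 100 configs x 30 meta-datasets

portfolio, best = [], np.full(30, np.inf)
for _ in range(5):                               # portfolio size k = 5 (assumed)
    gains = np.minimum(errors, best).mean(axis=1)    # avg error if config added
    gains[portfolio] = np.inf                    # never re-pick a chosen config
    c = int(np.argmin(gains))
    portfolio.append(c)
    best = np.minimum(best, errors[c])
print("zero-shot portfolio:", portfolio)
# HyperBand can then race this portfolio on the new dataset at growing budgets
```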
arXiv Detail & Related papers (2020-07-27T08:56:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.