Multi-level Training and Bayesian Optimization for Economical
Hyperparameter Optimization
- URL: http://arxiv.org/abs/2007.09953v1
- Date: Mon, 20 Jul 2020 09:03:02 GMT
- Title: Multi-level Training and Bayesian Optimization for Economical
Hyperparameter Optimization
- Authors: Yang Yang, Ke Deng, Michael Zhu
- Abstract summary: In this paper, we develop an effective approach to reducing the total amount of required training time for Hyperparameter Optimization.
We propose a truncated additive Gaussian process model to calibrate approximate performance measurements generated by light training.
Based on the model, a sequential model-based algorithm is developed to generate the performance profile of the configuration space as well as find optimal ones.
- Score: 12.92634461859467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameters play a critical role in the performances of many machine
learning methods. Determining their best settings, known as Hyperparameter
Optimization (HPO), faces difficulties presented by the large number of
hyperparameters as well as the excessive training time. In this paper, we
develop an effective approach to reducing the total amount of required training
time for HPO. In the initialization, the nested Latin hypercube design is used
to select hyperparameter configurations for two types of training, which are,
respectively, heavy training and light training. We propose a truncated
additive Gaussian process model to calibrate approximate performance
measurements generated by light training, using accurate performance
measurements generated by heavy training. Based on the model, a sequential
model-based algorithm is developed to generate the performance profile of the
configuration space as well as find optimal ones. Our proposed approach
demonstrates competitive performance when applied to optimize synthetic
examples, support vector machines, fully connected networks and convolutional
neural networks.
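
The calibration idea in the abstract (cheap, biased measurements from light training corrected by a few accurate measurements from heavy training) can be illustrated with a toy sketch. This is not the authors' truncated additive Gaussian process or their nested Latin hypercube construction. It is a plain two-level additive GP (posterior means only) on a one-dimensional configuration space, where the heavy design is a subset of a stratified light design. The functions `heavy_error` and `light_error`, the length scales, and all other names are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_mean(x_train, y_train, x_test, length_scale, jitter=1e-6):
    """Posterior mean of a zero-mean GP; jitter keeps the solve stable."""
    k_train = rbf_kernel(x_train, x_train, length_scale)
    k_train += jitter * np.eye(len(x_train))
    k_test = rbf_kernel(x_test, x_train, length_scale)
    return k_test @ np.linalg.solve(k_train, y_train)


def latin_hypercube_1d(n, rng):
    """n sorted points on [0, 1), exactly one per stratum of width 1/n."""
    return ((np.arange(n) + rng.random(n)) / n)[:, None]


# Hypothetical stand-ins for the validation error of a configuration x.
def heavy_error(x):
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 0]  # accurate but expensive


def light_error(x):
    return heavy_error(x) + 0.3 * (x[:, 0] - 0.5)  # cheap but biased


rng = np.random.default_rng(0)
x_light = latin_hypercube_1d(20, rng)  # light training: 20 configurations
x_heavy = x_light[::4]                 # heavy training: nested subset of 5

y_light = light_error(x_light)
y_heavy = heavy_error(x_heavy)

x_grid = np.linspace(0.0, 1.0, 101)[:, None]

# Level 1: model the light-training response over the whole space.
light_grid = gp_mean(x_light, y_light, x_grid, length_scale=0.2)
light_at_heavy = gp_mean(x_light, y_light, x_heavy, length_scale=0.2)

# Level 2: model the discrepancy between heavy and light measurements,
# assumed here to vary slowly (hence the longer length scale).
residual = y_heavy - light_at_heavy
correction_grid = gp_mean(x_heavy, residual, x_grid, length_scale=1.0)

# Additive model: calibrated profile = light surface + learned correction.
calibrated = light_grid + correction_grid
best_config = x_grid[np.argmin(calibrated), 0]

truth = heavy_error(x_grid)
print("mean abs error, light only:", np.abs(light_grid - truth).mean())
print("mean abs error, calibrated:", np.abs(calibrated - truth).mean())
print("best configuration found:", best_config)
```

In a sequential model-based loop, the calibrated profile would be refit after each new light or heavy measurement, and an acquisition rule would pick the next configuration; that step is omitted here.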
Related papers
- Optimization Hyper-parameter Laws for Large Language Models [56.322914260197734]
We present Opt-Laws, a framework that captures the relationship between hyper-parameters and training outcomes.
Our validation across diverse model sizes and data scales demonstrates Opt-Laws' ability to accurately predict training loss.
This approach significantly reduces computational costs while enhancing overall model performance.
arXiv Detail & Related papers (2024-09-07T09:37:19Z)
- Model Performance Prediction for Hyperparameter Optimization of Deep
Learning Models Using High Performance Computing and Quantum Annealing [0.0]
We show that integrating model performance prediction with early stopping methods holds great potential to speed up the HPO process of deep learning models.
We propose a novel algorithm called Swift-Hyperband that can use either classical or quantum support vector regression for performance prediction.
arXiv Detail & Related papers (2023-11-29T10:32:40Z)
- Deep Ranking Ensembles for Hyperparameter Optimization [9.453554184019108]
We present a novel method that meta-learns neural network surrogates optimized for ranking the configurations' performances while modeling their uncertainty via ensembling.
In a large-scale experimental protocol comprising 12 baselines, 16 HPO search spaces and 86 datasets/tasks, we demonstrate that our method achieves new state-of-the-art results in HPO.
arXiv Detail & Related papers (2023-03-27T13:52:40Z)
- Optimization-Derived Learning with Essential Convergence Analysis of
Training and Hyper-training [52.39882976848064]
We design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module.
Under the GKM scheme, a Bilevel Meta Optimization (BMO) algorithmic framework is constructed to solve the optimal training and hyper-training variables together.
arXiv Detail & Related papers (2022-06-16T01:50:25Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient
Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper-parameter tuning.
We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3$\times$-30$\times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
- Towards Robust and Automatic Hyper-Parameter Tunning [39.04604349338802]
We introduce a new class of HPO method and explore how the low-rank factorization of intermediate layers of a convolutional network can be used to define an analytical response surface.
We quantify how this surface behaves as a surrogate to model performance and can be solved using a trust-region search algorithm, which we call autoHyper.
arXiv Detail & Related papers (2021-11-28T05:27:34Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency.
Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve model training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)
- Weighting Is Worth the Wait: Bayesian Optimization with Importance
Sampling [34.67740033646052]
By learning a parameterization of importance sampling (IS) that trades off evaluation complexity and quality, we improve upon Bayesian optimization state-of-the-art runtime and final validation error across a variety of datasets and complex neural architectures.
arXiv Detail & Related papers (2020-02-23T15:52:08Z)