Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage
Trees
- URL: http://arxiv.org/abs/2006.11972v1
- Date: Mon, 22 Jun 2020 02:36:12 GMT
- Title: Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage
Trees
- Authors: Ahnjae Shin, Do Yoon Kim, Joo Seong Jeong, Byung-Gon Chun
- Abstract summary: We propose Hippo, a hyper-parameter optimization system that removes redundancy in the training process to reduce the overall amount of computation significantly.
Hippo is applicable to not only single studies, but multi-study scenarios as well, where multiple studies of the same model and search space can be formulated as trees of stages.
- Score: 2.294014185517203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyper-parameter optimization is crucial for pushing the accuracy of a deep
learning model to its limits. A hyper-parameter optimization job, referred to
as a study, involves numerous trials of training a model using different
training knobs, and therefore is very computation-heavy, typically taking hours
and days to finish. We observe that trials issued from hyper-parameter
optimization algorithms often share common hyper-parameter sequence prefixes.
Based on this observation, we propose Hippo, a hyper-parameter optimization
system that removes redundancy in the training process to reduce the overall
amount of computation significantly. Instead of executing each trial
independently as in existing hyper-parameter optimization systems, Hippo breaks
down the hyper-parameter sequences into stages and merges common stages to form
a tree of stages (called a stage-tree), then executes a stage once per tree on
a distributed GPU server environment. Hippo is applicable to not only single
studies, but multi-study scenarios as well, where multiple studies of the same
model and search space can be formulated as trees of stages. Evaluations show
that Hippo's stage-based execution strategy outperforms trial-based methods
such as Ray Tune for several models and hyper-parameter optimization
algorithms, reducing GPU-hours and end-to-end training time significantly.
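For intuition, here is a minimal sketch of the prefix-merging idea described in the abstract: each trial is represented as a sequence of (hyper-parameter configuration, epoch-count) stages, and trials that share a prefix of identical stages are merged so the shared work is executed only once. The class and function names below are purely illustrative and are not Hippo's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class StageNode:
    """One stage: a hyper-parameter setting trained for a span of epochs.

    Children are stages whose trials share this node as a prefix, so the
    training work for this node is executed only once per tree.
    """
    config: tuple            # e.g. (("lr", 0.1), ("momentum", 0.9))
    epochs: int
    children: dict = field(default_factory=dict)

def merge_trials(trials):
    """Merge trials (lists of (config, epochs) stages) into a stage tree.

    Stages that form a common prefix across trials collapse into a single
    node, which is where the computation savings come from.
    """
    root = StageNode(config=(), epochs=0)
    for trial in trials:
        node = root
        for config, epochs in trial:
            key = (config, epochs)
            node = node.children.setdefault(key, StageNode(config, epochs))
    return root

def count_stages(node):
    """Total stages executed once the tree is merged (root excluded)."""
    return sum(1 + count_stages(child) for child in node.children.values())

# Two trials that share their first stage: the shared stage is trained once.
trial_a = [((("lr", 0.1),), 10), ((("lr", 0.01),), 10)]
trial_b = [((("lr", 0.1),), 10), ((("lr", 0.001),), 10)]
tree = merge_trials([trial_a, trial_b])
print(count_stages(tree), "stages instead of", sum(map(len, [trial_a, trial_b])))
# -> 3 stages instead of 4
```

In the actual system each node would also carry checkpoints so that a child stage can resume from its parent's final weights; this sketch only captures the prefix-merging structure.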
Related papers
- Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training [0.0]
We propose a practical method for robustly tuning large models.
CarBS performs local search around the performance-cost frontier.
Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline.
arXiv Detail & Related papers (2023-06-13T18:22:24Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based data subset selection framework for hyper-parameter tuning.
We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3x-30x; a toy sketch of the underlying idea follows this entry.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
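Gradient-based subset selection can be pictured with a small toy example: choose a subset of training examples whose per-example gradients approximately match the full-data gradient, then run the cheap hyper-parameter trials on that subset only. The sketch below is a generic gradient-matching illustration, not the AUTOMATA algorithm itself; the greedy rule, names, and numbers are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary classification with a logistic-regression "model".
n, d = 2000, 20
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)

def per_example_grads(w, X, y):
    """Per-example logistic-loss gradients, shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    return (p - y)[:, None] * X

def greedy_gradient_match(G, k):
    """Greedily pick k examples whose mean gradient best matches the full mean."""
    target = G.mean(axis=0)
    chosen, current = [], np.zeros(G.shape[1])
    for _ in range(k):
        m = len(chosen)
        # Error of each candidate subset mean against the full-data mean gradient.
        errs = np.linalg.norm((current * m + G) / (m + 1) - target, axis=1)
        errs[chosen] = np.inf            # never pick the same example twice
        i = int(np.argmin(errs))
        chosen.append(i)
        current = (current * m + G[i]) / (m + 1)
    return np.array(chosen)

def train(X, y, lr, steps=200):
    """Plain full-batch gradient descent; lr is the hyper-parameter being tuned."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * per_example_grads(w, X, y).mean(axis=0)
    return w

def log_loss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Pick a 5% subset whose gradients (at the initial point) match the full data.
idx = greedy_gradient_match(per_example_grads(np.zeros(d), X, y), k=100)

# Cheap hyper-parameter search: train only on the subset, score on the full set.
candidates = [0.01, 0.1, 0.5, 1.0]
scores = {lr: log_loss(train(X[idx], y[idx], lr), X, y) for lr in candidates}
print("learning rate chosen from subset runs:", min(scores, key=scores.get))
```

In practice such methods re-select the subset as training progresses and validate candidates on held-out data rather than the training set, but the compute saving comes from the same place: each trial touches only a small, gradient-representative fraction of the data.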
- Towards Robust and Automatic Hyper-Parameter Tunning [39.04604349338802]
We introduce a new class of HPO method and explore how the low-rank factorization of intermediate layers of a convolutional network can be used to define an analytical response surface.
We quantify how this surface behaves as a surrogate for model performance and show that it can be optimized with a trust-region search algorithm, which we call autoHyper.
arXiv Detail & Related papers (2021-11-28T05:27:34Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
A deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyper-parameter optimiser.
It requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient.
arXiv Detail & Related papers (2021-10-20T09:57:57Z)
- HYPPO: A Surrogate-Based Multi-Level Parallelism Tool for Hyperparameter Optimization [0.2844198651668139]
HYPPO uses adaptive surrogate models and accounts for uncertainty in model predictions to find accurate and reliable models that make robust predictions.
We demonstrate various software features on time-series prediction and image classification problems as well as a scientific application in computed tomography image reconstruction.
arXiv Detail & Related papers (2021-10-04T20:14:22Z)
- HyP-ABC: A Novel Automated Hyper-Parameter Tuning Algorithm Using Evolutionary Optimization [1.6114012813668934]
We propose HyP-ABC, an automatic hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach.
Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned.
arXiv Detail & Related papers (2021-09-11T16:45:39Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study shows that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to seven optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)