Improving Multi-fidelity Optimization with a Recurring Learning Rate for
Hyperparameter Tuning
- URL: http://arxiv.org/abs/2209.12499v1
- Date: Mon, 26 Sep 2022 08:16:31 GMT
- Title: Improving Multi-fidelity Optimization with a Recurring Learning Rate for
Hyperparameter Tuning
- Authors: HyunJae Lee, Gihyeon Lee, Junhwan Kim, Sungjun Cho, Dohyun Kim,
Donggeun Yoo
- Abstract summary: We propose Multi-fidelity Optimization with a Recurring Learning rate (MORL).
MORL incorporates CNNs' optimization process into multi-fidelity optimization.
It alleviates the slow-starter problem and achieves a more precise low-fidelity approximation.
- Score: 7.591442522626255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the evolution of Convolutional Neural Networks (CNNs), their
performance is surprisingly dependent on the choice of hyperparameters.
However, it remains challenging to efficiently explore a large hyperparameter
search space due to the long training times of modern CNNs. Multi-fidelity
optimization enables the exploration of more hyperparameter configurations
within a given budget by terminating unpromising configurations early. However, it
often results in selecting a sub-optimal configuration, because training with a
high-performing configuration typically converges slowly in the early phase. In
this paper, we propose Multi-fidelity Optimization with a Recurring Learning
rate (MORL), which incorporates CNNs' optimization process into multi-fidelity
optimization. MORL alleviates the slow-starter problem and achieves a more
precise low-fidelity approximation. Our comprehensive experiments on general
image classification, transfer learning, and semi-supervised learning
demonstrate the effectiveness of MORL over other multi-fidelity optimization
methods such as the Successive Halving Algorithm (SHA) and Hyperband. Furthermore,
it achieves significant performance improvements over a hand-tuned hyperparameter
configuration within a practical budget.
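To make the mechanism concrete, below is a minimal sketch of how a recurring learning rate could be paired with successive halving. It assumes a cosine-style schedule that restarts at every rung and a user-supplied `train_eval(cfg, epochs)` routine that trains with that schedule and returns a validation score; the schedule shape, rung sizes, and reduction factor are illustrative assumptions, not the paper's actual implementation.

```python
import math

def recurring_lr(base_lr, step, cycle_len):
    """Cosine-decayed learning rate that restarts every `cycle_len` steps, so a
    model evaluated at the end of any cycle has already trained through a full
    decay (a hypothesized remedy for the slow-starter effect)."""
    t = step % cycle_len
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / cycle_len))

def successive_halving(configs, train_eval, min_epochs=2, eta=3, max_rungs=4):
    """Generic successive halving (SHA): train every surviving configuration for
    the current rung's budget, keep the top 1/eta by validation score, then
    repeat with eta times the budget."""
    survivors = list(configs)
    budget = min_epochs
    for _ in range(max_rungs):
        scored = [(train_eval(cfg, epochs=budget), cfg) for cfg in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # higher score is better
        survivors = [cfg for _, cfg in scored[: max(1, len(scored) // eta)]]
        budget *= eta
        if len(survivors) == 1:
            break
    return survivors[0]
```

The intended effect is that every configuration is scored at the end of a fully decayed cycle, so slow starters are not penalized merely for being evaluated mid-schedule, which is the failure mode the abstract attributes to plain SHA and Hyperband.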
Related papers
- FADAS: Towards Federated Adaptive Asynchronous Optimization [56.09666452175333]
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.
This paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees.
We rigorously establish the convergence rate of the proposed algorithms, and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
arXiv Detail & Related papers (2024-07-25T20:02:57Z)
- Memory-Efficient Optimization with Factorized Hamiltonian Descent [11.01832755213396]
We introduce H-Fac, a novel adaptive optimizer that incorporates a memory-efficient factorization approach to address the memory overhead of adaptive optimization.
By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level.
We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization dynamics and convergence guarantees.
arXiv Detail & Related papers (2024-06-14T12:05:17Z)
- On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width [5.217870815854702]
We identify a specific parameterization for second-order optimization that promotes feature learning in a stable manner.
Inspired by a maximal update parameterization, we consider a one-step update of the gradient.
Our approach covers two major second-order optimization algorithms, K-FAC and Shampoo.
arXiv Detail & Related papers (2023-12-19T15:12:39Z)
- Federated Multi-Level Optimization over Decentralized Networks [55.776919718214224]
We study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors.
We propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale.
Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications.
arXiv Detail & Related papers (2023-10-10T00:21:10Z)
- Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for Hyperparameter Recommendation [83.85021205445662]
To speed up the tuning of machine learning models, we conduct a thorough analysis of the multi-task multi-fidelity Bayesian optimization framework, which leads to its best instantiation, amortized auto-tuning (AT2).
arXiv Detail & Related papers (2021-06-17T00:01:18Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters; a minimal generic sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians [5.33024001730262]
Self-Tuning Networks (STNs) have recently gained traction due to their ability to amortize the optimization of the inner objective.
We propose the Delta-STN, an improved hypernetwork architecture which stabilizes training.
arXiv Detail & Related papers (2020-10-26T12:12:23Z)
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
- Weighting Is Worth the Wait: Bayesian Optimization with Importance Sampling [34.67740033646052]
By learning a parameterization of importance sampling (IS) that trades off evaluation complexity and quality, we improve upon the state-of-the-art runtime and final validation error of Bayesian optimization across a variety of datasets and complex neural architectures.
arXiv Detail & Related papers (2020-02-23T15:52:08Z)
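The zeroth-order hyper-gradient sketch referenced in the HOZOG entry above: a generic finite-difference estimator that perturbs the hyperparameters along random directions, averages the probes, and feeds the estimate to an ordinary gradient step. The probe scale `mu`, the sample count, and the `validation_loss` callback are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def zeroth_order_hypergrad(validation_loss, lam, mu=1e-2, num_samples=4, rng=None):
    """Estimate a hyper-gradient for hyperparameters `lam` without differentiating
    through training, by averaging finite-difference probes in random directions.
    `validation_loss(lam)` is assumed to train a model under `lam` and return the
    resulting validation loss."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.asarray(lam, dtype=float)
    base = validation_loss(lam)
    grad = np.zeros_like(lam)
    for _ in range(num_samples):
        u = rng.standard_normal(lam.shape)  # random probe direction
        grad += (validation_loss(lam + mu * u) - base) / mu * u
    return grad / num_samples

# Usage sketch: a plain gradient step on the hyperparameters.
# lam = lam - 0.1 * zeroth_order_hypergrad(validation_loss, lam)
```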
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.