Gradient-based Hyperparameter Optimization Over Long Horizons
- URL: http://arxiv.org/abs/2007.07869v2
- Date: Thu, 30 Sep 2021 15:51:36 GMT
- Title: Gradient-based Hyperparameter Optimization Over Long Horizons
- Authors: Paul Micaelli and Amos Storkey
- Abstract summary: Forward-mode differentiation with sharing (FDS) is a simple and efficient algorithm that tackles memory scaling issues with forward-mode differentiation, and gradient degradation issues by sharing hyperparameters that are contiguous in time.
We demonstrate its efficiency empirically by differentiating through $\sim 10^4$ gradient steps of unrolled optimization.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-based hyperparameter optimization has earned widespread popularity
in the context of few-shot meta-learning, but remains broadly impractical for
tasks with long horizons (many gradient steps), due to memory scaling and
gradient degradation issues. A common workaround is to learn hyperparameters
online, but this introduces greediness which comes with a significant
performance drop. We propose forward-mode differentiation with sharing (FDS), a
simple and efficient algorithm which tackles memory scaling issues with
forward-mode differentiation, and gradient degradation issues by sharing
hyperparameters that are contiguous in time. We provide theoretical guarantees
about the noise reduction properties of our algorithm, and demonstrate its
efficiency empirically by differentiating through $\sim 10^4$ gradient steps of
unrolled optimization. We consider large hyperparameter search ranges on
CIFAR-10 where we significantly outperform greedy gradient-based alternatives,
while achieving $\times 20$ speedups compared to the state-of-the-art black-box
methods. Code is available at: \url{https://github.com/polo5/FDS}
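The abstract describes two ingredients: forward-mode differentiation, whose memory cost does not grow with the length of the unrolled horizon, and the sharing of one hyperparameter value across a contiguous block of steps. The sketch below illustrates both on a toy problem. It is not the authors' implementation (see the linked repository for that); the toy objective, the block structure, the log-learning-rate parameterization, and the reuse of the training loss as the outer objective are assumptions made here for brevity.

```python
# A minimal sketch, NOT the authors' implementation: forward-mode
# hypergradients through unrolled SGD with one learning rate shared per
# contiguous block of steps. Toy objective and block sizes are assumptions.
import jax
import jax.numpy as jnp

def train_loss(w, x, y):
    # Toy quadratic loss standing in for the real training objective.
    return jnp.mean((x @ w - y) ** 2)

def unroll(log_lrs, w0, x, y, steps_per_block):
    # log_lrs[i] is shared by every SGD step inside block i -- the
    # "hyperparameters that are contiguous in time" from the abstract.
    w = w0
    for i in range(log_lrs.shape[0]):
        lr = jnp.exp(log_lrs[i])
        for _ in range(steps_per_block):
            w = w - lr * jax.grad(train_loss)(w, x, y)
    # Outer objective at the end of the horizon; a held-out validation
    # loss would be used here in practice (illustrative simplification).
    return train_loss(w, x, y)

x = jax.random.normal(jax.random.PRNGKey(0), (32, 5))
y = jax.random.normal(jax.random.PRNGKey(1), (32,))
w0 = jnp.zeros(5)
log_lrs = jnp.full((4,), -2.0)   # 4 blocks, each with one shared log learning rate

# Forward-mode hypergradient: one Jacobian-vector product per shared
# hyperparameter, so memory does not grow with the horizon length (unlike
# reverse-mode backpropagation through the unrolled graph).
outer = lambda h: unroll(h, w0, x, y, steps_per_block=25)
basis = jnp.eye(log_lrs.shape[0])
hypergrad = jnp.stack([jax.jvp(outer, (log_lrs,), (basis[i],))[1]
                       for i in range(log_lrs.shape[0])])
print(hypergrad)   # d(final loss) / d(each shared log learning rate)
```

In this sketch, sharing keeps the number of hyperparameter directions (and hence JVPs) small even for long horizons; in the paper, sharing is also motivated by the noise-reduction guarantees mentioned in the abstract.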
Related papers
- Polyak Meets Parameter-free Clipped Gradient Descent [29.764853985834403]
We study parameter-free methods for clipped gradient descent.
Specifically, we propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning (a toy sketch of the underlying ingredients follows this entry).
We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods.
arXiv Detail & Related papers (2024-05-23T19:29:38Z)
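As a rough illustration of the two ingredients named above, the snippet below runs clipped gradient descent with the classical Polyak stepsize, which requires the optimal value $f^*$. This is not the paper's Inexact Polyak Stepsize; the objective, clipping threshold, and iteration count are assumptions.

```python
# Toy illustration only: clipped gradient descent with the classical Polyak
# stepsize. NOT the paper's Inexact Polyak Stepsize; f_star and the clipping
# threshold c are assumptions of this example.
import jax
import jax.numpy as jnp

def f(x):
    return 0.5 * jnp.sum(x ** 2)   # toy convex objective with known f* = 0

f_star = 0.0   # optimal value, assumed known here (the classical rule needs it)
c = 1.0        # clipping threshold (assumption)
x = 5.0 * jnp.ones(10)

for _ in range(100):
    val, g = jax.value_and_grad(f)(x)
    gnorm = jnp.linalg.norm(g)
    # Step size: Polyak rule, capped so the step length never exceeds c.
    eta = jnp.minimum((val - f_star) / (gnorm ** 2 + 1e-12), c / (gnorm + 1e-12))
    x = x - eta * g

print(f(x))   # should be close to f* = 0
```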
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optimum.
arXiv Detail & Related papers (2023-02-06T23:39:13Z)
- Online Hyperparameter Meta-Learning with Hypergradient Distillation [59.973770725729636]
Gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization.
We propose a novel hyperparameter optimization (HO) method that can overcome these limitations by approximating the second-order term with knowledge distillation.
arXiv Detail & Related papers (2021-10-06T05:14:53Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian (a minimal sketch of this forward-mode computation is given after this list).
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update the hyperparameters (a toy zeroth-order estimator is sketched after this entry).
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
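The snippet below is a toy illustration of an averaged two-point zeroth-order gradient estimator, the generic device behind "zeroth-order hyper-gradients". The black-box objective `F`, the smoothing radius `mu`, and the number of probe directions are assumptions; this is not HOZOG's exact update rule.

```python
# Toy illustration only: an averaged two-point zeroth-order gradient
# estimator. The black-box F, smoothing radius mu, and number of probe
# directions are assumptions, not HOZOG's exact algorithm.
import jax
import jax.numpy as jnp

def F(lam):
    # Stand-in for "train with hyperparameters lam, return validation loss".
    return jnp.sum((lam - jnp.array([0.3, -1.2])) ** 2)

def zo_hypergrad(lam, key, mu=1e-2, n_dirs=8):
    # Average finite-difference estimates along random Gaussian directions.
    u = jax.random.normal(key, (n_dirs, lam.shape[0]))
    f0 = F(lam)
    est = jnp.stack([(F(lam + mu * u[j]) - f0) / mu * u[j] for j in range(n_dirs)])
    return est.mean(axis=0)

lam = jnp.zeros(2)
for t in range(200):
    lam = lam - 0.05 * zo_hypergrad(lam, jax.random.PRNGKey(t))
print(lam)   # should drift toward [0.3, -1.2] without ever calling jax.grad(F)
```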
- Efficient hyperparameter optimization by way of PAC-Bayes bound minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
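The two implicit-differentiation entries above both concern differentiating a proximal solver with respect to its regularization hyperparameter. The sketch below (referenced from the non-smooth convex learning entry) is a minimal, assumed setup: unrolled ISTA for the Lasso is differentiated with `jax.jacfwd` to obtain the Jacobian of the iterate with respect to the regularization strength `lam`; the data, step size rule, and iteration count are illustrative.

```python
# Illustrative only: forward-mode differentiation of unrolled proximal
# gradient descent (ISTA) for the Lasso with respect to the regularization
# strength lam. Toy data, step size, and iteration count are assumptions.
import jax
import jax.numpy as jnp

def ista(lam, X, y, n_iter=100):
    # Proximal gradient descent on 0.5 * ||y - X b||^2 + lam * ||b||_1.
    step = 1.0 / jnp.linalg.norm(X, ord=2) ** 2   # 1 / Lipschitz constant
    b = jnp.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - step * X.T @ (X @ b - y)                              # gradient step
        b = jnp.sign(z) * jnp.maximum(jnp.abs(z) - step * lam, 0.0)   # prox (soft-threshold)
    return b

X = jax.random.normal(jax.random.PRNGKey(0), (50, 10))
y = jax.random.normal(jax.random.PRNGKey(1), (50,))

# Forward-mode Jacobian d(beta_hat) / d(lam) of the unrolled solver.
dbeta_dlam = jax.jacfwd(ista)(0.1, X, y)
print(dbeta_dlam)   # shape (10,): sensitivity of each coefficient to lam
```

With a finite number of iterations this is the Jacobian of the unrolled solver rather than of the exact Lasso solution; the convergence of the former to the latter is what the cited paper studies.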