Gradient-based Hyperparameter Optimization Over Long Horizons
- URL: http://arxiv.org/abs/2007.07869v2
- Date: Thu, 30 Sep 2021 15:51:36 GMT
- Title: Gradient-based Hyperparameter Optimization Over Long Horizons
- Authors: Paul Micaelli and Amos Storkey
- Abstract summary: Forward-mode differentiation with sharing (FDS) is a simple and efficient algorithm that tackles memory scaling issues with forward-mode differentiation, and gradient degradation issues by sharing hyperparameters that are contiguous in time.
We demonstrate its efficiency empirically by differentiating through $\sim 10^4$ gradient steps of unrolled optimization.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-based hyperparameter optimization has earned widespread popularity
in the context of few-shot meta-learning, but remains broadly impractical for
tasks with long horizons (many gradient steps), due to memory scaling and
gradient degradation issues. A common workaround is to learn hyperparameters
online, but this introduces greediness which comes with a significant
performance drop. We propose forward-mode differentiation with sharing (FDS), a
simple and efficient algorithm which tackles memory scaling issues with
forward-mode differentiation, and gradient degradation issues by sharing
hyperparameters that are contiguous in time. We provide theoretical guarantees
about the noise reduction properties of our algorithm, and demonstrate its
efficiency empirically by differentiating through $\sim 10^4$ gradient steps of
unrolled optimization. We consider large hyperparameter search ranges on
CIFAR-10 where we significantly outperform greedy gradient-based alternatives,
while achieving $\times 20$ speedups compared to the state-of-the-art black-box
methods. Code is available at: \url{https://github.com/polo5/FDS}
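The abstract describes two ingredients: forward-mode differentiation, whose memory cost does not grow with the length of the unrolled horizon, and the sharing of one hyperparameter value across a contiguous block of steps. The sketch below illustrates both on a toy problem. It is not the authors' implementation (see the linked repository for that); the toy objective, the block structure, the log-learning-rate parameterization, and the reuse of the training loss as the outer objective are assumptions made here for brevity.

```python
# A minimal sketch, NOT the authors' implementation: forward-mode
# hypergradients through unrolled SGD with one learning rate shared per
# contiguous block of steps. Toy objective and block sizes are assumptions.
import jax
import jax.numpy as jnp

def train_loss(w, x, y):
    # Toy quadratic loss standing in for the real training objective.
    return jnp.mean((x @ w - y) ** 2)

def unroll(log_lrs, w0, x, y, steps_per_block):
    # log_lrs[i] is shared by every SGD step inside block i -- the
    # "hyperparameters that are contiguous in time" from the abstract.
    w = w0
    for i in range(log_lrs.shape[0]):
        lr = jnp.exp(log_lrs[i])
        for _ in range(steps_per_block):
            w = w - lr * jax.grad(train_loss)(w, x, y)
    # Outer objective at the end of the horizon; a held-out validation
    # loss would be used here in practice (illustrative simplification).
    return train_loss(w, x, y)

x = jax.random.normal(jax.random.PRNGKey(0), (32, 5))
y = jax.random.normal(jax.random.PRNGKey(1), (32,))
w0 = jnp.zeros(5)
log_lrs = jnp.full((4,), -2.0)   # 4 blocks, each with one shared log learning rate

# Forward-mode hypergradient: one Jacobian-vector product per shared
# hyperparameter, so memory does not grow with the horizon length (unlike
# reverse-mode backpropagation through the unrolled graph).
outer = lambda h: unroll(h, w0, x, y, steps_per_block=25)
basis = jnp.eye(log_lrs.shape[0])
hypergrad = jnp.stack([jax.jvp(outer, (log_lrs,), (basis[i],))[1]
                       for i in range(log_lrs.shape[0])])
print(hypergrad)   # d(final loss) / d(each shared log learning rate)
```

In this sketch, sharing keeps the number of hyperparameter directions (and hence JVPs) small even for long horizons; in the paper, sharing is also motivated by the noise-reduction guarantees mentioned in the abstract.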
Related papers
- Polyak Meets Parameter-free Clipped Gradient Descent [29.764853985834403]
We study parameter-free methods for clipped gradient descent.
Specifically, we propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning (a toy sketch of the underlying ingredients follows this entry).
We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods.
arXiv Detail & Related papers (2024-05-23T19:29:38Z)
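As a rough illustration of the two ingredients named above, the snippet below runs clipped gradient descent with the classical Polyak stepsize, which requires the optimal value $f^*$. This is not the paper's Inexact Polyak Stepsize; the objective, clipping threshold, and iteration count are assumptions.

```python
# Toy illustration only: clipped gradient descent with the classical Polyak
# stepsize. NOT the paper's Inexact Polyak Stepsize; f_star and the clipping
# threshold c are assumptions of this example.
import jax
import jax.numpy as jnp

def f(x):
    return 0.5 * jnp.sum(x ** 2)   # toy convex objective with known f* = 0

f_star = 0.0   # optimal value, assumed known here (the classical rule needs it)
c = 1.0        # clipping threshold (assumption)
x = 5.0 * jnp.ones(10)

for _ in range(100):
    val, g = jax.value_and_grad(f)(x)
    gnorm = jnp.linalg.norm(g)
    # Step size: Polyak rule, capped so the step length never exceeds c.
    eta = jnp.minimum((val - f_star) / (gnorm ** 2 + 1e-12), c / (gnorm + 1e-12))
    x = x - eta * g

print(f(x))   # should be close to f* = 0
```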
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optimum.
arXiv Detail & Related papers (2023-02-06T23:39:13Z)
- Online Hyperparameter Meta-Learning with Hypergradient Distillation [59.973770725729636]
Gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization.
We propose a novel hyperparameter optimization (HO) method that can overcome these limitations by approximating the second-order term with knowledge distillation.
arXiv Detail & Related papers (2021-10-06T05:14:53Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian (a minimal sketch of this forward-mode computation is given after this list).
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update the hyperparameters (a toy zeroth-order estimator is sketched after this entry).
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
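The snippet below is a toy illustration of an averaged two-point zeroth-order gradient estimator, the generic device behind "zeroth-order hyper-gradients". The black-box objective `F`, the smoothing radius `mu`, and the number of probe directions are assumptions; this is not HOZOG's exact update rule.

```python
# Toy illustration only: an averaged two-point zeroth-order gradient
# estimator. The black-box F, smoothing radius mu, and number of probe
# directions are assumptions, not HOZOG's exact algorithm.
import jax
import jax.numpy as jnp

def F(lam):
    # Stand-in for "train with hyperparameters lam, return validation loss".
    return jnp.sum((lam - jnp.array([0.3, -1.2])) ** 2)

def zo_hypergrad(lam, key, mu=1e-2, n_dirs=8):
    # Average finite-difference estimates along random Gaussian directions.
    u = jax.random.normal(key, (n_dirs, lam.shape[0]))
    f0 = F(lam)
    est = jnp.stack([(F(lam + mu * u[j]) - f0) / mu * u[j] for j in range(n_dirs)])
    return est.mean(axis=0)

lam = jnp.zeros(2)
for t in range(200):
    lam = lam - 0.05 * zo_hypergrad(lam, jax.random.PRNGKey(t))
print(lam)   # should drift toward [0.3, -1.2] without ever calling jax.grad(F)
```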
- Efficient hyperparameter optimization by way of PAC-Bayes bound minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
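The two implicit-differentiation entries above both concern differentiating a proximal solver with respect to its regularization hyperparameter. The sketch below (referenced from the non-smooth convex learning entry) is a minimal, assumed setup: unrolled ISTA for the Lasso is differentiated with `jax.jacfwd` to obtain the Jacobian of the iterate with respect to the regularization strength `lam`; the data, step size rule, and iteration count are illustrative.

```python
# Illustrative only: forward-mode differentiation of unrolled proximal
# gradient descent (ISTA) for the Lasso with respect to the regularization
# strength lam. Toy data, step size, and iteration count are assumptions.
import jax
import jax.numpy as jnp

def ista(lam, X, y, n_iter=100):
    # Proximal gradient descent on 0.5 * ||y - X b||^2 + lam * ||b||_1.
    step = 1.0 / jnp.linalg.norm(X, ord=2) ** 2   # 1 / Lipschitz constant
    b = jnp.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - step * X.T @ (X @ b - y)                              # gradient step
        b = jnp.sign(z) * jnp.maximum(jnp.abs(z) - step * lam, 0.0)   # prox (soft-threshold)
    return b

X = jax.random.normal(jax.random.PRNGKey(0), (50, 10))
y = jax.random.normal(jax.random.PRNGKey(1), (50,))

# Forward-mode Jacobian d(beta_hat) / d(lam) of the unrolled solver.
dbeta_dlam = jax.jacfwd(ista)(0.1, X, y)
print(dbeta_dlam)   # shape (10,): sensitivity of each coefficient to lam
```

With a finite number of iterations this is the Jacobian of the unrolled solver rather than of the exact Lasso solution; the convergence of the former to the latter is what the cited paper studies.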