Online Hyperparameter Meta-Learning with Hypergradient Distillation
- URL: http://arxiv.org/abs/2110.02508v1
- Date: Wed, 6 Oct 2021 05:14:53 GMT
- Title: Online Hyperparameter Meta-Learning with Hypergradient Distillation
- Authors: Hae Beom Lee, Hayeon Lee, Jaewoong Shin, Eunho Yang, Timothy
Hospedales, Sung Ju Hwang
- Abstract summary: Many gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization and can therefore be treated as hyperparameters.
We propose a novel HO method that overcomes the limitations of existing gradient-based HO approaches by approximating the second-order term with knowledge distillation.
- Score: 59.973770725729636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many gradient-based meta-learning methods assume a set of parameters that do
not participate in inner-optimization, which can be considered as
hyperparameters. Although such hyperparameters can be optimized using the
existing gradient-based hyperparameter optimization (HO) methods, they suffer
from the following issues: unrolled differentiation methods do not scale well
to high-dimensional hyperparameters or long horizons; Implicit Function
Theorem (IFT) based methods are restrictive for online optimization; and
short-horizon approximations suffer from short-horizon bias. In this work, we
propose a novel HO method that overcomes these limitations by approximating the
second-order term with knowledge distillation. Specifically, we parameterize a
single Jacobian-vector product (JVP) for each HO step and minimize the distance
from the true second-order term. Our method allows online optimization and is
also scalable in both the hyperparameter dimension and the horizon length. We
demonstrate the effectiveness of our method on two different meta-learning
methods and three benchmark datasets.
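To make the distillation idea concrete, the sketch below works through a toy version of the setup: per-parameter regularization strengths act as the hyperparameters, a single differentiable inner step supplies a "true" second-order term as the distillation target, and a small linear map stands in for the parameterized JVP. It is a minimal illustration under these assumptions, not the authors' exact algorithm; names such as jvp_net and all constants are invented for the example.

    # Minimal sketch of hypergradient distillation on a toy problem (PyTorch).
    # Assumptions: a one-step differentiable inner update provides the "true"
    # second-order term used as the distillation target, and a linear map
    # (jvp_net) stands in for the parameterized Jacobian-vector product.
    import torch

    torch.manual_seed(0)
    d = 8
    theta = torch.randn(d, requires_grad=True)   # model parameters
    lam = torch.zeros(d, requires_grad=True)     # hyperparameters (per-weight decay)
    inner_lr, outer_lr, distill_lr = 0.1, 0.05, 0.01

    x_train, x_val = torch.randn(d), torch.randn(d)
    train_loss = lambda th, lm: 0.5 * ((th - x_train) ** 2).sum() + 0.5 * (lm.exp() * th ** 2).sum()
    val_loss = lambda th: 0.5 * ((th - x_val) ** 2).sum()

    # Surrogate standing in for the second-order term v^T (d theta_new / d lam).
    jvp_net = torch.nn.Linear(d, d, bias=False)
    opt_jvp = torch.optim.SGD(jvp_net.parameters(), lr=distill_lr)

    for step in range(200):
        # One inner step, kept differentiable w.r.t. lam (short "teacher" unroll).
        g_train = torch.autograd.grad(train_loss(theta, lam), theta, create_graph=True)[0]
        theta_new = theta - inner_lr * g_train

        # Validation gradient v at the updated parameters.
        v = torch.autograd.grad(val_loss(theta_new), theta_new, retain_graph=True)[0]

        # "True" second-order term v^T (d theta_new / d lam), via backprop through the unroll.
        true_term = torch.autograd.grad(theta_new, lam, grad_outputs=v)[0]

        # Distillation: fit the surrogate to the true second-order term.
        pred_term = jvp_net(v.detach())
        distill_loss = ((pred_term - true_term.detach()) ** 2).mean()
        opt_jvp.zero_grad()
        distill_loss.backward()
        opt_jvp.step()

        # Hyperparameter update with the distilled term (the direct term is zero
        # here because lam does not appear in the validation loss).
        with torch.no_grad():
            lam -= outer_lr * jvp_net(v)
            theta.copy_(theta_new.detach())

In an online setting the same surrogate is reused and refined at every HO step, so the hypergradient no longer requires unrolling or solving an IFT linear system at each update.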
Related papers
- Learning Algorithm Hyperparameters for Fast Parametric Convex Optimization [2.0403774954994858]
We introduce a machine-learning framework to learn the hyperparameter sequences of first-order methods.
We show how to learn the hyperparameters for several popular algorithms.
We showcase the effectiveness of our method with many examples, including ones from control, signal processing, and machine learning.
arXiv Detail & Related papers (2024-11-24T04:58:36Z)
- Parameter-free Clipped Gradient Descent Meets Polyak [29.764853985834403]
Gradient descent and its variants are the de facto standard algorithms for training machine learning models.
We propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning; a brief background note on the classical Polyak step size appears after this list.
We numerically validate the convergence results on a synthetic function and demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-05-23T19:29:38Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent.
Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optimum.
arXiv Detail & Related papers (2023-02-06T23:39:13Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computational cost and leads to models that generalize better on the test set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters; a generic sketch of this kind of zeroth-order estimator appears after this list.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Efficient hyperparameter optimization by way of PAC-Bayes bound minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
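Background note on the Polyak step size (referenced from the "Parameter-free Clipped Gradient Descent Meets Polyak" entry above). This is standard textbook material rather than a summary of that paper: for a convex objective $f$ with known optimal value $f^\star$, the classical Polyak step size and update are

    \gamma_t = \frac{f(x_t) - f^\star}{\|\nabla f(x_t)\|^2}, \qquad x_{t+1} = x_t - \gamma_t \nabla f(x_t).

In practice $f^\star$ is rarely known; the Inexact Polyak Stepsize summarized above is described as converging without any hyperparameter tuning.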
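Sketch of a zeroth-order hypergradient estimator (referenced from the "Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm" entry above). This is a generic two-point, random-direction estimator offered for intuition only, not HOZOG's exact procedure; the black-box train_and_validate stand-in, the Gaussian directions, and all constants are assumptions for the example.

    # Generic zeroth-order (gradient-free) hypergradient estimate: treat
    # "train with hyperparameters lam, then measure validation loss" as a
    # black box and average two-point finite-difference estimates over
    # random Gaussian directions.
    import numpy as np

    rng = np.random.default_rng(0)

    def train_and_validate(lam):
        # Toy stand-in for "run the training algorithm with hyperparameters
        # lam and return the validation loss".
        target = np.linspace(-1.0, 1.0, lam.size)
        return float(np.sum((lam - target) ** 2))

    def zeroth_order_hypergrad(lam, mu=1e-2, num_dirs=8):
        # Average of (f(lam + mu*u) - f(lam)) / mu * u over random directions u.
        f0 = train_and_validate(lam)
        grad = np.zeros_like(lam)
        for _ in range(num_dirs):
            u = rng.standard_normal(lam.shape)
            grad += (train_and_validate(lam + mu * u) - f0) / mu * u
        return grad / num_dirs

    lam = np.zeros(5)
    for _ in range(200):
        lam -= 0.1 * zeroth_order_hypergrad(lam)   # hyperparameter descent step

Because the estimator only queries validation losses, it never differentiates through training, which is what makes such zeroth-order schemes attractive for large-scale hyperparameters.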