Meta-Learning with Adjoint Methods
- URL: http://arxiv.org/abs/2110.08432v1
- Date: Sat, 16 Oct 2021 01:18:50 GMT
- Title: Meta-Learning with Adjoint Methods
- Authors: Shibo Li, Zheng Wang, Akil Narayan, Robert Kirby, Shandian Zhe
- Abstract summary: Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks.
Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t the initialization of a long training trajectory for the sampled tasks.
We propose Adjoint MAML (A-MAML) to address this problem.
We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
- Score: 16.753336086160598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model Agnostic Meta-Learning (MAML) is widely used to find a good
initialization for a family of tasks. Despite its success, a critical challenge
in MAML is to calculate the gradient w.r.t the initialization of a long
training trajectory for the sampled tasks, because the computation graph can
rapidly explode and the computation becomes prohibitively expensive. To address this
problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the
inner optimization as the evolution of an Ordinary Differential Equation (ODE).
To efficiently compute the gradient of the validation loss w.r.t the
initialization, we use the adjoint method to construct a companion, backward
ODE. To obtain the gradient w.r.t the initialization, we only need to run the
standard ODE solver twice -- one is forward in time that evolves a long
trajectory of gradient flow for the sampled task; the other is backward and
solves the adjoint ODE. We need not create or expand any intermediate
computational graphs, adopt aggressive approximations, or impose proximal
regularizers in the training loss. Our approach is cheap, accurate, and
adaptable to different trajectory lengths. We demonstrate the advantage of our
approach in both synthetic and real-world meta-learning tasks.
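The two-solve structure described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: the inner gradient flow and its companion adjoint ODE are both integrated with explicit Euler on an assumed quadratic training loss (the losses, constants, and step sizes are all illustrative), and the resulting gradient w.r.t. the initialization is checked against finite differences of the unrolled trajectory.

```python
import numpy as np

# Toy instance: training loss L_tr(w) = 0.5*(w-c)^T A (w-c), validation
# loss l(w) = 0.5*||w - v||^2. A, c, v, T, and the step count are all
# illustrative choices, not taken from the paper.
rng = np.random.default_rng(0)
n = 3
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)        # SPD Hessian of the training loss
c = rng.normal(size=n)
v = rng.normal(size=n)

grad_Ltr = lambda w: A @ (w - c)   # gradient of the training loss
hess_Ltr = lambda w: A             # Hessian (constant for a quadratic)

T, steps = 1.0, 1000
dt = T / steps

def adjoint_grad(theta):
    """Gradient of l(w(T)) w.r.t. the initialization theta."""
    # Forward ODE: dw/dt = -grad L_tr(w), w(0) = theta (explicit Euler).
    traj = [theta]
    for _ in range(steps):
        traj.append(traj[-1] - dt * grad_Ltr(traj[-1]))
    # Backward adjoint ODE: da/dt = H(w(t)) a, with a(T) = grad l(w(T)).
    a = traj[-1] - v
    for k in range(steps - 1, -1, -1):
        a = a - dt * hess_Ltr(traj[k]) @ a   # Euler step backward in time
    return a                                  # a(0) = dl/dtheta

def unrolled_val_loss(theta):
    # Reference: the same inner loop, treated as one long computation.
    w = theta
    for _ in range(steps):
        w = w - dt * grad_Ltr(w)
    return 0.5 * np.sum((w - v) ** 2)

theta0 = rng.normal(size=n)
g_adj = adjoint_grad(theta0)

# Central finite differences of the unrolled pipeline as a sanity check.
eps = 1e-5
g_fd = np.array([
    (unrolled_val_loss(theta0 + eps * np.eye(n)[i])
     - unrolled_val_loss(theta0 - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])
print(np.max(np.abs(g_adj - g_fd)))
```

The sketch stores the forward trajectory for clarity; the point of the adjoint formulation is that no autodiff graph over the unrolled inner loop is ever built, so the method stays cheap and adapts to different trajectory lengths.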
Related papers
- Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching [35.77769905072651]
We propose an iterative algorithm to approximate the MAP estimator efficiently to solve a variety of linear inverse problems.
Our algorithm is mathematically justified by the observation that the MAP objective can be approximated by a sum of $N$ "local MAP" objectives.
We validate our approach for various linear inverse problems, such as super-resolution, deblurring, inpainting, and compressed sensing.
arXiv Detail & Related papers (2024-05-29T06:56:12Z)
- ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
- Meta-Value Learning: a General Framework for Learning with Learning Awareness [1.4323566945483497]
We propose to judge joint policies by their long-term prospects as measured by the meta-value.
We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates.
arXiv Detail & Related papers (2023-07-17T21:40:57Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
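Forward-mode differentiation of such a gradient-flow ODE propagates the sensitivity $S(t) = \partial w(t)/\partial \theta$ alongside the state, instead of sweeping backward. A minimal sketch on an assumed quadratic loss (the matrices and constants below are illustrative, not COMLN's actual setup):

```python
import numpy as np

# Gradient flow dw/dt = -grad L(w) on a toy quadratic loss
# L(w) = 0.5*(w-c)^T A (w-c); A, c, and v are illustrative assumptions.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
c = np.array([1.0, -1.0])

dt, steps = 0.01, 500
w = np.zeros(2)          # initialization theta
S = np.eye(2)            # forward sensitivity S(t) = dw(t)/dtheta
for _ in range(steps):
    H = A                          # Hessian of the quadratic loss
    w = w - dt * (A @ (w - c))     # Euler step of the gradient flow
    S = S - dt * (H @ S)           # sensitivity ODE: dS/dt = -H(w) S
# The gradient of any outer loss l(w(T)) w.r.t. theta is S(T)^T grad l(w(T)).
v = np.array([0.5, 0.5])
grad_outer = S.T @ (w - v)         # for l(w) = 0.5*||w - v||^2
print(grad_outer)
```

Because both ODEs run forward together, memory does not grow with the trajectory length, which is one way the continuous-time view pays off.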
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum performs well on vision tasks and consistently achieves state-of-the-art results on other tasks, including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
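The "solve the adaptation analytically" idea can be illustrated generically with kernel ridge regression, using a fixed RBF kernel as a stand-in for the meta-model's NTK. Everything below (the kernel choice, data, and regularizer `lam`) is an illustrative assumption, not the paper's construction:

```python
import numpy as np

# Analytic few-shot "adaptation": instead of an iterative inner loop,
# solve a kernel ridge regression in closed form. The RBF kernel here
# is a hypothetical stand-in for the meta-model's NTK.
def rbf_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X_support = rng.normal(size=(5, 2))     # few-shot support set
y_support = np.sin(X_support[:, 0])     # toy regression targets
lam = 1e-2                              # ridge regularizer

# Closed-form adaptation: alpha = (K + lam I)^{-1} y, no inner-loop SGD.
K = rbf_kernel(X_support, X_support)
alpha = np.linalg.solve(K + lam * np.eye(len(y_support)), y_support)

def adapted_predict(X_query):
    # Prediction of the analytically adapted model on query points.
    return rbf_kernel(X_query, X_support) @ alpha

print(adapted_predict(X_support))  # close to y_support for small lam
```

Replacing the inner loop with a linear solve is what removes the sub-optimal iterative adaptation that MAML-style methods rely on.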
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Physarum Powered Differentiable Linear Programming Layers and Applications [48.77235931652611]
We propose an efficient and differentiable solver for general linear programming problems.
We show the use of our solver in a video segmentation task and meta-learning for few-shot learning.
arXiv Detail & Related papers (2020-04-30T01:50:37Z)
- A Sample Complexity Separation between Non-Convex and Convex Meta-Learning [42.51788412283446]
One popular trend in meta-learning is to learn from many tasks a common method that can be used to solve a new task with few samples.
This paper shows that it is important to look inside the optimization black box, specifically at the subspaces of a linear network.
Analyses of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
arXiv Detail & Related papers (2020-02-25T20:55:09Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the OptimisticOA algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.