Meta-Learning with Adjoint Methods
- URL: http://arxiv.org/abs/2110.08432v1
- Date: Sat, 16 Oct 2021 01:18:50 GMT
- Title: Meta-Learning with Adjoint Methods
- Authors: Shibo Li, Zheng Wang, Akil Narayan, Robert Kirby, Shandian Zhe
- Abstract summary: Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks.
Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t the initialization of a long training trajectory for the sampled tasks.
We propose Adjoint MAML (A-MAML) to address this problem.
We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
- Score: 16.753336086160598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model Agnostic Meta-Learning (MAML) is widely used to find a good
initialization for a family of tasks. Despite its success, a critical challenge
in MAML is to calculate the gradient w.r.t the initialization of a long
training trajectory for the sampled tasks, because the computation graph can
rapidly explode and the computational cost is very expensive. To address this
problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the
inner optimization as the evolution of an Ordinary Differential Equation (ODE).
To efficiently compute the gradient of the validation loss w.r.t the
initialization, we use the adjoint method to construct a companion, backward
ODE. To obtain the gradient w.r.t the initialization, we only need to run the
standard ODE solver twice -- one is forward in time that evolves a long
trajectory of gradient flow for the sampled task; the other is backward and
solves the adjoint ODE. We need not create or expand any intermediate
computational graphs, adopt aggressive approximations, or impose proximal
regularizers in the training loss. Our approach is cheap, accurate, and
adaptable to different trajectory lengths. We demonstrate the advantage of our
approach in both synthetic and real-world meta-learning tasks.
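The two-ODE-solve procedure can be illustrated with an off-the-shelf solver whose reverse pass already implements the adjoint method. Below is a minimal sketch on a hypothetical least-squares task (the task, losses, and horizon T are assumptions made for the example, not the paper's setup): jax.experimental.ode.odeint integrates the inner gradient flow forward, and differentiating through it solves the companion adjoint ODE backward, yielding the gradient of the validation loss w.r.t. the initialization without storing an inner computation graph.
```python
# Minimal sketch of the adjoint-based meta-gradient idea (not the authors' code).
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

def train_loss(theta, x, y):
    # inner (task) training loss on a toy linear-regression problem
    return jnp.mean((x @ theta - y) ** 2)

def val_loss(theta, x, y):
    # outer (validation) loss
    return jnp.mean((x @ theta - y) ** 2)

def adapt(theta0, x_tr, y_tr, T=5.0):
    # Forward ODE: inner gradient descent viewed as the gradient flow
    # d theta / dt = -grad train_loss(theta), integrated from 0 to T.
    def flow(theta, t):
        return -jax.grad(train_loss)(theta, x_tr, y_tr)
    trajectory = odeint(flow, theta0, jnp.array([0.0, T]))
    return trajectory[-1]  # adapted parameters theta(T)

def meta_objective(theta0, task):
    x_tr, y_tr, x_val, y_val = task
    return val_loss(adapt(theta0, x_tr, y_tr), x_val, y_val)

# Backward pass: odeint's reverse-mode rule solves the companion adjoint ODE,
# so this gradient needs only the two ODE solves and no stored inner graph.
meta_grad = jax.grad(meta_objective)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (20, 3))
y = x @ jnp.array([1.0, -2.0, 0.5])
task = (x[:10], y[:10], x[10:], y[10:])
print(meta_grad(jnp.zeros(3), task))  # d val_loss / d theta0 via the adjoint method
```
Swapping the toy losses for a task's actual training and validation losses keeps the same two-solve structure; the trajectory length T can be changed freely because no intermediate computation graph is retained.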
Related papers
- Fast Adaptation with Kernel and Gradient based Meta Leaning [4.763682200721131]
We propose two algorithms to improve both the inner and outer loops of Model Agnostic Meta-Learning (MAML)
Our first algorithm redefines the optimization problem in the function space to update the model using closed-form solutions.
In the outer loop, the second algorithm adjusts the learning of the meta-learner by assigning weights to the losses from each task of the inner loop.
arXiv Detail & Related papers (2024-11-01T07:05:03Z)
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks.
Approximate MU is a practical method for large-scale models.
We propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction.
arXiv Detail & Related papers (2024-09-29T15:17:33Z)
- Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching [35.77769905072651]
We propose an iterative algorithm to approximate the MAP estimator efficiently to solve a variety of linear inverse problems.
Our algorithm is mathematically justified by the observation that the MAP objective can be approximated by a sum of $N$ "local MAP" objectives.
We validate our approach for various linear inverse problems, such as super-resolution, deblurring, inpainting, and compressed sensing.
arXiv Detail & Related papers (2024-05-29T06:56:12Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous; a forward-sensitivity sketch of this idea appears after this list.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum performs well on vision tasks and achieves state-of-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK)
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Physarum Powered Differentiable Linear Programming Layers and Applications [48.77235931652611]
We propose an efficient and differentiable solver for general linear programming problems.
We show the use of our solver in a video segmentation task and meta-learning for few-shot learning.
arXiv Detail & Related papers (2020-04-30T01:50:37Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
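As a contrast to the adjoint (reverse-mode) approach sketched above, the Continuous-Time Meta-Learning (COMLN) entry in this list differentiates the adaptation ODE in forward mode. The sketch below, under the same hypothetical least-squares setup (again an assumption for illustration, not the COMLN authors' code), augments the ODE state with the sensitivity matrix J(t) = d theta(t) / d theta0 and integrates it forward alongside the parameters, so the meta-gradient is obtained from a single forward solve.
```python
# Minimal sketch of forward-mode sensitivity for continuous-time adaptation
# (a toy illustration, not the COMLN implementation).
import jax
import jax.numpy as jnp
from jax.experimental.ode import odeint

def train_loss(theta, x, y):
    # inner (task) training loss on a toy linear-regression problem
    return jnp.mean((x @ theta - y) ** 2)

def val_loss(theta, x, y):
    # outer (validation) loss
    return jnp.mean((x @ theta - y) ** 2)

def aug_flow(state, t, x_tr, y_tr):
    # Gradient flow on theta plus the forward sensitivity J(t) = d theta(t)/d theta0,
    # which follows the linearized dynamics dJ/dt = -H(theta) J.
    theta, J = state
    grad_fn = lambda th: jax.grad(train_loss)(th, x_tr, y_tr)
    dtheta = -grad_fn(theta)
    hvp = lambda v: jax.jvp(grad_fn, (theta,), (v,))[1]   # Hessian-vector product
    dJ = -jax.vmap(hvp, in_axes=1, out_axes=1)(J)         # apply to each column of J
    return dtheta, dJ

def meta_grad_forward(theta0, task, T=5.0):
    x_tr, y_tr, x_val, y_val = task
    J0 = jnp.eye(theta0.shape[0])
    thetas, Js = odeint(aug_flow, (theta0, J0), jnp.array([0.0, T]), x_tr, y_tr)
    theta_T, J_T = thetas[-1], Js[-1]
    # Chain rule: d val_loss / d theta0 = J(T)^T * d val_loss / d theta(T)
    return J_T.T @ jax.grad(val_loss)(theta_T, x_val, y_val)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (20, 3))
y = x @ jnp.array([1.0, -2.0, 0.5])
task = (x[:10], y[:10], x[10:], y[10:])
print(meta_grad_forward(jnp.zeros(3), task))  # gradient w.r.t. the initialization
```
Forward sensitivities carry one column of J per parameter, which suits the low-dimensional adapted parameters of few-shot settings, whereas the adjoint approach above needs only a single backward solve regardless of parameter dimension.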
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.