Convergence of Meta-Learning with Task-Specific Adaptation over Partial
Parameters
- URL: http://arxiv.org/abs/2006.09486v3
- Date: Thu, 22 Oct 2020 18:01:25 GMT
- Title: Convergence of Meta-Learning with Task-Specific Adaptation over Partial
Parameters
- Authors: Kaiyi Ji, Jason D. Lee, Yingbin Liang, H. Vincent Poor
- Abstract summary: Although model-agnostic meta-learning (MAML) is a very successful algorithm in meta-learning practice, it can have high computational complexity.
Our paper shows that the inner-loop loss geometry can significantly affect the overall convergence performance of ANIL.
- Score: 152.03852111442114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although model-agnostic meta-learning (MAML) is a very successful algorithm
in meta-learning practice, it can have high computational cost because it
updates all model parameters over both the inner loop of task-specific
adaptation and the outer loop of meta-initialization training. A more efficient
algorithm ANIL (which refers to almost no inner loop) was proposed recently by
Raghu et al. 2019, which adapts only a small subset of parameters in the inner
loop and thus has substantially less computational cost than MAML as
demonstrated by extensive experiments. However, the theoretical convergence of
ANIL has not been studied yet. In this paper, we characterize the convergence
rate and the computational complexity for ANIL under two representative
inner-loop loss geometries, i.e., strong convexity and nonconvexity. Our
results show that such a geometric property can significantly affect the
overall convergence performance of ANIL. For example, ANIL achieves a faster
convergence rate for a strongly-convex inner-loop loss as the number $N$ of
inner-loop gradient descent steps increases, but a slower convergence rate for
a nonconvex inner-loop loss as $N$ increases. Moreover, our complexity analysis
provides a theoretical quantification on the improved efficiency of ANIL over
MAML. The experiments on standard few-shot meta-learning benchmarks validate
our theoretical findings.
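To make the MAML/ANIL contrast concrete, the following is a minimal PyTorch-style sketch of the ANIL inner loop, assuming an illustrative model split into a feature backbone and a small task head; the architecture, sizes, and hyperparameters are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative split: the backbone is meta-trained only in the outer loop,
# while the head is the small parameter subset adapted per task (ANIL).
backbone = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
head = nn.Linear(64, 5)

def anil_inner_loop(x_support, y_support, n_steps=5, inner_lr=0.1):
    """Run N inner-loop gradient steps on the head parameters only."""
    # Each task starts from the meta-initialized head weights.
    fast_weights = [p.clone() for p in head.parameters()]  # [weight, bias]
    features = backbone(x_support)  # the backbone is not adapted in the inner loop
    for _ in range(n_steps):
        logits = F.linear(features, fast_weights[0], fast_weights[1])
        loss = F.cross_entropy(logits, y_support)
        # create_graph=True keeps the graph so the outer loop can differentiate
        # through all N inner steps (second-order meta-gradient).
        grads = torch.autograd.grad(loss, fast_weights, create_graph=True)
        # Gradient descent on the head only; MAML would also update the backbone here.
        fast_weights = [w - inner_lr * g for w, g in zip(fast_weights, grads)]
    return fast_weights
```

In the outer loop, the query-set loss evaluated with the adapted head weights is differentiated back through these $N$ steps to update the meta-initialization of both the backbone and the head; adapting only the head inside the inner loop is what gives ANIL its lower per-iteration cost relative to MAML.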
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization with cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow for inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex
Penalties [8.294148737585543]
This paper investigates sparse-penalized quantile regression in the presence of non-convex and non-smooth penalties, such as concave and clipped penalty functions.
We introduce a novel single-loop ADMM algorithm with an increasing penalty parameter, named SIAD, specifically for sparse quantile regression.
arXiv Detail & Related papers (2023-09-04T21:48:51Z) - Rényi Divergence Deep Mutual Learning [3.682680183777648]
This paper revisits Deep Mutual Learning (DML) as a simple yet effective computing paradigm.
We propose using the Rényi divergence instead of the KL divergence, as it is more flexible and tunable (a toy sketch of the Rényi divergence appears after this list).
Our empirical results demonstrate the advantage of combining DML and the Rényi divergence, leading to further improvements in model generalization.
arXiv Detail & Related papers (2022-09-13T04:58:35Z) - Finite-Sum Coupled Compositional Stochastic Optimization: Theory and
Applications [43.48388050033774]
This paper provides a comprehensive analysis of a simple algorithm for both non-convex and convex objectives.
Our analysis also exhibits new insights for improving the practical implementation by sampling batches of equal size for the outer and inner levels.
arXiv Detail & Related papers (2022-02-24T22:39:35Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory (a toy sketch of such an analytic solve appears after this list).
arXiv Detail & Related papers (2021-02-07T20:53:23Z) - Geom-SPIDER-EM: Faster Variance Reduced Stochastic Expectation
Maximization for Nonconvex Finite-Sum Optimization [21.81837334970773]
We propose an extension of the Stochastic Path-Integrated Differential EstimatoR (SPIDER) technique to the Expectation Maximization (EM) algorithm.
We show that it supports the same state-of-the-art complexity bounds as SPIDER-EM, and we provide a convergence rate for the proposed method.
arXiv Detail & Related papers (2020-11-24T21:20:53Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stochastic bilevel optimizer, named stocBiO, which features a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning [63.64636047748605]
We develop a new theoretical framework to provide convergence guarantee for the general multi-step MAML algorithm.
In particular, our results suggest that the inner-stage stepsize needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence (a toy illustration of this scaling appears after this list).
arXiv Detail & Related papers (2020-02-18T19:17:54Z)
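The stepsize condition in the multi-step MAML entry above amounts to scaling the inner-stage stepsize like a constant over $N$; a tiny illustration follows (the constant c is a placeholder, not the paper's exact constant, which depends on smoothness parameters).

```python
def inner_stepsize(n_inner_steps: int, c: float = 0.5) -> float:
    """Choose the inner-stage stepsize inversely proportional to the number N
    of inner-stage steps, keeping the total inner-loop movement bounded."""
    return c / n_inner_steps

for n in (1, 2, 5, 10, 20):
    print(f"N={n:2d}  inner stepsize = {inner_stepsize(n):.3f}")
```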
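The closed-form adaptation mentioned in the NTK meta-learning entry above can be pictured as a kernel ridge regression solve; the following toy sketch uses a random-feature Gram matrix as a stand-in for the meta-model's NTK, and none of the names or numbers come from the paper.

```python
import torch

def analytic_adaptation(K_support, y_support, K_query_support, reg=1e-3):
    """Closed-form task 'inner loop': solve (K + reg*I) a = y on the support set,
    then predict on the query set via K_query_support @ a."""
    n = K_support.shape[0]
    a = torch.linalg.solve(K_support + reg * torch.eye(n), y_support)
    return K_query_support @ a

# Toy Gram matrices from random features (placeholders for an NTK).
feats_s, feats_q = torch.randn(10, 16), torch.randn(3, 16)
K_ss = feats_s @ feats_s.T          # support-support kernel
K_qs = feats_q @ feats_s.T          # query-support kernel
y_s = torch.randn(10, 5)            # support targets (e.g., one-hot labels)
print(analytic_adaptation(K_ss, y_s, K_qs).shape)  # torch.Size([3, 5])
```

The point is only that a single linear solve replaces the iterative gradient-descent adaptation of MAML-style inner loops.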
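To make the Rényi-divergence substitution in the Deep Mutual Learning entry above concrete, here is a minimal sketch; the order alpha and the random toy distributions are placeholders, not the paper's settings.

```python
import torch

def renyi_divergence(p, q, alpha=0.5, eps=1e-12):
    """Rényi divergence D_alpha(p || q) = log(sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha - 1).
    It recovers the KL divergence in the limit alpha -> 1."""
    assert alpha > 0 and alpha != 1
    p, q = p.clamp_min(eps), q.clamp_min(eps)
    return torch.log((p.pow(alpha) * q.pow(1 - alpha)).sum(dim=-1)) / (alpha - 1)

# Toy peer predictions from two networks in a mutual-learning setup.
p = torch.softmax(torch.randn(4, 10), dim=-1)
q = torch.softmax(torch.randn(4, 10), dim=-1)
print(renyi_divergence(p, q, alpha=0.5))  # one nonnegative value per example
```

The extra order parameter alpha is the knob that makes this divergence more flexible than a fixed KL term in a mutual-learning loss.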