Provable Generalization of Overparameterized Meta-learning Trained with
SGD
- URL: http://arxiv.org/abs/2206.09136v1
- Date: Sat, 18 Jun 2022 07:22:57 GMT
- Title: Provable Generalization of Overparameterized Meta-learning Trained with
SGD
- Authors: Yu Huang and Yingbin Liang and Longbo Huang
- Abstract summary: We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which capture how the SGD dynamics affect generalization.
Our theoretical findings are further validated by experiments.
- Score: 62.892930625034374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the superior empirical success of deep meta-learning, theoretical
understanding of overparameterized meta-learning is still limited. This paper
studies the generalization of a widely used meta-learning approach,
Model-Agnostic Meta-Learning (MAML), which aims to find a good initialization
for fast adaptation to new tasks. Under a mixed linear regression model, we
analyze the generalization properties of MAML trained with SGD in the
overparameterized regime. We provide both upper and lower bounds for the excess
risk of MAML, which captures how SGD dynamics affect these generalization
bounds. With such sharp characterizations, we further explore how various
learning parameters impact the generalization capability of overparameterized
MAML, including explicitly identifying typical data and task distributions that
can achieve diminishing generalization error with overparameterization, and
characterizing the impact of the adaptation learning rate on both the excess risk and
the early stopping time. Our theoretical findings are further validated by
experiments.
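To make the analyzed setting concrete, the following is a minimal, hypothetical sketch (not the authors' code) of one-step MAML on a mixed linear regression model: each task draws its own regressor, the inner loop takes a single gradient step from the shared initialization with adaptation learning rate alpha, and the outer loop updates that initialization by SGD on the query loss. The task distribution, sample sizes, noise level, and learning rates below are illustrative assumptions.

```python
# Hypothetical sketch: one-step MAML on mixed linear regression, trained with SGD.
import numpy as np

rng = np.random.default_rng(0)
d = 50                       # parameter dimension (overparameterized when d >> k_support)
alpha, eta = 0.1, 0.01       # adaptation (inner) and meta (outer) learning rates
k_support, k_query = 10, 10  # per-task sample sizes
noise = 0.1                  # label noise level
n_steps = 2000               # number of SGD steps

w0 = np.zeros(d)             # meta-initialization being learned

def sample_task():
    # Each task has its own ground-truth regressor (illustrative task distribution).
    return rng.normal(size=d) / np.sqrt(d)

def sample_data(w_task, k):
    # Gaussian features with noisy linear labels.
    X = rng.normal(size=(k, d))
    y = X @ w_task + noise * rng.normal(size=k)
    return X, y

for _ in range(n_steps):
    w_task = sample_task()
    Xs, ys = sample_data(w_task, k_support)   # support set: inner adaptation
    Xq, yq = sample_data(w_task, k_query)     # query set: outer update

    # Inner loop: one gradient step on the support squared loss from the shared init.
    w_adapted = w0 - alpha * Xs.T @ (Xs @ w0 - ys) / k_support

    # Outer loop: SGD on the query loss, differentiating through the inner step.
    grad_q = Xq.T @ (Xq @ w_adapted - yq) / k_query
    jac = np.eye(d) - alpha * (Xs.T @ Xs) / k_support  # d(w_adapted)/d(w0)
    w0 = w0 - eta * jac @ grad_q
```

Because the inner update is linear, its Jacobian with respect to the initialization is available in closed form, which is what makes sharp excess-risk characterizations tractable in this linear setting.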
Related papers
- Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior:
From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - A Generalized Weighted Optimization Method for Computational Learning
and Inversion [15.535124460414588]
We analyze a generalized weighted least-squares optimization method for computational learning and inversion with noisy data.
We characterize the impact of the weighting scheme on the generalization error of the learning method.
We demonstrate that appropriate weighting from prior knowledge can improve the generalization capability of the learned model (a minimal weighted least-squares sketch appears after this list).
arXiv Detail & Related papers (2022-01-23T10:35:34Z) - Generalization Bounds For Meta-Learning: An Information-Theoretic
Analysis [8.028776552383365]
We propose a generic understanding of both the conventional learning-to-learn framework and the modern model-agnostic meta-learning algorithms.
We provide a data-dependent generalization bound for a variant of MAML, which is non-vacuous for deep few-shot learning.
arXiv Detail & Related papers (2021-09-29T17:45:54Z) - MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays.
We provide a new perspective on the working mechanism of MAML and discover that MAML is analogous to a meta-learner using a supervised contrastive objective function.
We reveal that vanilla MAML has an undesirable interference term originating from random initialization and cross-task interaction, and propose a simple but effective technique, the zeroing trick, to alleviate such interference.
arXiv Detail & Related papers (2021-06-29T12:52:26Z) - On the Generalization of Stochastic Gradient Descent with Momentum [58.900860437254885]
We first show that there exists a convex loss function for which algorithmic stability fails to establish generalization guarantees.
For smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, and show that it admits an upper-bound on the generalization error.
For the special case of strongly convex loss functions, we find a range of momentum such that multiple epochs of standard SGDM, as a special form of SGDEM, also generalizes.
arXiv Detail & Related papers (2021-02-26T18:58:29Z) - On Fast Adversarial Robustness Adaptation in Model-Agnostic
Meta-Learning [100.14809391594109]
Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning.
Despite the generalization power of the meta-model, it remains elusive how adversarial robustness can be maintained by MAML in few-shot learning.
We propose a general but easily-optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally-light fine-tuning.
arXiv Detail & Related papers (2021-02-20T22:03:04Z) - B-SMALL: A Bayesian Neural Network approach to Sparse Model-Agnostic
Meta-Learning [2.9189409618561966]
We propose a Bayesian neural network based MAML algorithm, which we refer to as the B-SMALL algorithm.
We demonstrate the performance of B-SMALL using classification and regression tasks, and highlight that training a sparsifying BNN using MAML indeed improves the parameter footprint of the model.
arXiv Detail & Related papers (2021-01-01T09:19:48Z)
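As a companion to the weighted-optimization entry above, here is a small, self-contained illustration (not that paper's method or code) of how a weighting scheme chosen from prior knowledge of the noise can change a least-squares solution; the data model, the heteroscedastic noise pattern, and the inverse-variance weighting rule are assumptions made for the example.

```python
# Illustrative only: ordinary vs. weighted least squares on heteroscedastic noisy data.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
noise_scale = np.where(np.arange(n) < n // 2, 0.05, 1.0)  # first half clean, second half noisy
b = A @ x_true + noise_scale * rng.normal(size=n)

# Ordinary least squares: min_x ||A x - b||^2
x_ols, *_ = np.linalg.lstsq(A, b, rcond=None)

# Weighted least squares: min_x ||W^{1/2} (A x - b)||^2, with weights taken from
# prior knowledge of the per-sample noise levels (inverse-variance weighting).
w = 1.0 / noise_scale**2
x_wls = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * b))

print("OLS parameter error:", np.linalg.norm(x_ols - x_true))
print("WLS parameter error:", np.linalg.norm(x_wls - x_true))
```

Down-weighting the noisier samples typically recovers the underlying model more accurately, which mirrors the qualitative message that an appropriately chosen weighting scheme can improve generalization.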