Provable Generalization of Overparameterized Meta-learning Trained with SGD
- URL: http://arxiv.org/abs/2206.09136v1
- Date: Sat, 18 Jun 2022 07:22:57 GMT
- Title: Provable Generalization of Overparameterized Meta-learning Trained with SGD
- Authors: Yu Huang and Yingbin Liang and Longbo Huang
- Abstract summary: We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which capture how the SGD dynamics affect generalization.
Our theoretical findings are further validated by experiments.
- Score: 62.892930625034374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the superior empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited. This paper studies the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML), which aims to find a good initialization for fast adaptation to new tasks. Under a mixed linear regression model, we analyze the generalization properties of MAML trained with SGD in the overparameterized regime. We provide both upper and lower bounds for the excess risk of MAML, which capture how the SGD dynamics affect these generalization bounds. With such sharp characterizations, we further explore how various learning parameters impact the generalization capability of overparameterized MAML, including explicitly identifying typical data and task distributions that can achieve diminishing generalization error with overparameterization, and characterizing the impact of the adaptation learning rate on both the excess risk and the early stopping time. Our theoretical findings are further validated by experiments.
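For concreteness, below is a minimal sketch of the kind of procedure being analyzed: MAML with a single SGD adaptation step on a mixed linear regression model, with the meta-initialization trained by outer-loop SGD over streaming tasks. The dimensions, learning rates, and noise level are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not the paper's exact setup) of one-step MAML on mixed
# linear regression, trained with outer-loop SGD. All constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_inner, n_outer = 200, 10, 10      # overparameterized: d >> samples per task
alpha, beta, noise = 0.1, 0.05, 0.1    # adaptation lr, outer lr, label noise

def sample_task():
    """Mixed linear regression: each task draws its own ground-truth regressor."""
    w_star = rng.standard_normal(d) / np.sqrt(d)
    def data(n):
        X = rng.standard_normal((n, d))
        y = X @ w_star + noise * rng.standard_normal(n)
        return X, y
    return data

w = np.zeros(d)                         # meta-initialization learned by MAML
for step in range(2000):
    data = sample_task()
    X_in, y_in = data(n_inner)          # support set for inner adaptation
    X_out, y_out = data(n_outer)        # query set for the outer update

    # One inner SGD step on the squared loss, starting from the shared w.
    grad_in = X_in.T @ (X_in @ w - y_in) / n_inner
    w_adapt = w - alpha * grad_in

    # Outer gradient of the query loss w.r.t. w (chain rule through the inner step).
    resid = X_out @ w_adapt - y_out
    jac = np.eye(d) - alpha * X_in.T @ X_in / n_inner
    grad_out = jac @ (X_out.T @ resid / n_outer)
    w -= beta * grad_out
```

In this sketch the adaptation learning rate alpha enters both the inner update and the outer gradient (through the Jacobian term), which is the knob whose effect on the excess risk and early stopping time the paper characterizes.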
Related papers
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- A Generalized Weighted Optimization Method for Computational Learning and Inversion [15.535124460414588]
We analyze a generalized weighted least-squares optimization method for computational learning and inversion with noisy data.
We characterize the impact of the weighting scheme on the generalization error of the learning method.
We demonstrate that appropriate weighting from prior knowledge can improve the generalization capability of the learned model.
arXiv Detail & Related papers (2022-01-23T10:35:34Z)
- Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis [8.028776552383365]
We provide a generic, information-theoretic understanding of both the conventional learning-to-learn framework and modern model-agnostic meta-learning algorithms.
We provide a data-dependent generalization bound for a variant of MAML, which is non-vacuous for deep few-shot learning.
arXiv Detail & Related papers (2021-09-29T17:45:54Z)
- MAML is a Noisy Contrastive Learner [72.04430033118426]
Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays.
We provide a new perspective on the working mechanism of MAML and show that MAML is analogous to a meta-learner using a noisy supervised contrastive objective function.
We propose a simple but effective technique, the zeroing trick, to alleviate the interference caused by this noise.
arXiv Detail & Related papers (2021-06-29T12:52:26Z)
- On the Generalization of Stochastic Gradient Descent with Momentum [58.900860437254885]
We first show that there exists a convex loss function for which algorithmic stability fails to establish generalization guarantees.
For smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, and show that it admits an upper-bound on the generalization error.
For the special case of strongly convex loss functions, we find a range of momentum values for which multiple epochs of standard SGDM, as a special form of SGDEM, also generalize.
arXiv Detail & Related papers (2021-02-26T18:58:29Z)
- On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning [100.14809391594109]
Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning.
Despite the generalization power of the meta-model, it remains elusive how adversarial robustness can be maintained by MAML in few-shot learning.
We propose a general but easily-optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally-light fine-tuning.
arXiv Detail & Related papers (2021-02-20T22:03:04Z)
- B-SMALL: A Bayesian Neural Network approach to Sparse Model-Agnostic Meta-Learning [2.9189409618561966]
We propose a Bayesian neural network based MAML algorithm, which we refer to as the B-SMALL algorithm.
We demonstrate the performance of B-SMALL using classification and regression tasks, and highlight that training a sparsifying BNN using MAML indeed improves the parameter footprint of the model.
arXiv Detail & Related papers (2021-01-01T09:19:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.