Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning
- URL: http://arxiv.org/abs/2304.04312v1
- Date: Sun, 9 Apr 2023 20:36:13 GMT
- Title: Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning
- Authors: Peizhong Ju, Yingbin Liang, Ness B. Shroff
- Abstract summary: This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
- Score: 70.52689048213398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-learning has arisen as a successful method for improving training
performance by training over many similar tasks, especially with deep neural
networks (DNNs). However, the theoretical understanding of when and why
overparameterized models such as DNNs can generalize well in meta-learning is
still limited. As an initial step towards addressing this challenge, this paper
studies the generalization performance of overfitted meta-learning under a
linear regression model with Gaussian features. In contrast to a few recent
studies along the same line, our framework allows the number of model
parameters to be arbitrarily larger than the number of features in the ground
truth signal, and hence naturally captures the overparameterized regime in
practical deep meta-learning. We show that the overfitted min $\ell_2$-norm
solution of model-agnostic meta-learning (MAML) can be beneficial, which is
similar to the recent remarkable findings on the ``benign overfitting'' and
``double descent'' phenomena in classical (single-task) linear regression.
However, due to aspects unique to meta-learning, such as the task-specific
gradient-descent inner training and the diversity/fluctuation of the
ground-truth signals across training tasks, we find new and interesting
properties that do not exist in single-task linear regression. We first provide a high-probability
upper bound (under reasonable tightness) on the generalization error, where
certain terms decrease when the number of features increases. Our analysis
suggests that benign overfitting is more significant and easier to observe when
the noise and the diversity/fluctuation of the ground truth of each training
task are large. Under this circumstance, we show that the overfitted min
$\ell_2$-norm solution can achieve an even lower generalization error than the
underparameterized solution.
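To make the setting concrete, below is a minimal numpy sketch (not the authors' code) of one-step linear MAML with Gaussian features, under assumed notation: each task's ground truth is a common signal plus a per-task fluctuation, the inner loop is a single gradient-descent step on a support set, and the overfitted min $\ell_2$-norm solution is obtained by applying the pseudoinverse to the stacked query-set objective. All dimensions, step sizes, and noise levels below are hypothetical choices for illustration only.

```python
# Illustrative sketch of overparameterized linear MAML (assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

p = 400                 # number of model parameters (p > total query samples: overparameterized)
s = 20                  # true signal dimension, zero-padded up to p
T = 10                  # number of training tasks
n_sup, n_qry = 15, 15   # support / query samples per task
alpha = 0.05            # inner-loop step size
sigma = 0.5             # label-noise std
nu = 0.5                # std of the per-task fluctuation (ground-truth diversity)

w_star = np.zeros(p)
w_star[:s] = rng.normal(size=s) / np.sqrt(s)   # common component of the ground truth

def sample_task():
    """Draw one task: its ground-truth vector and (support, query) data with Gaussian features."""
    w_t = w_star + np.concatenate([rng.normal(scale=nu, size=s), np.zeros(p - s)])
    X_s = rng.normal(size=(n_sup, p))
    y_s = X_s @ w_t + rng.normal(scale=sigma, size=n_sup)
    X_q = rng.normal(size=(n_qry, p))
    y_q = X_q @ w_t + rng.normal(scale=sigma, size=n_qry)
    return X_s, y_s, X_q, y_q

def inner_step(w, X_s, y_s):
    """One gradient-descent step on the task's support loss (MAML inner loop)."""
    grad = X_s.T @ (X_s @ w - y_s) / len(y_s)
    return w - alpha * grad

# Stack the linear system whose exact solution interpolates the one-step MAML
# objective on every training task's query set.
A_rows, b_rows = [], []
for _ in range(T):
    X_s, y_s, X_q, y_q = sample_task()
    M = np.eye(p) - (alpha / n_sup) * (X_s.T @ X_s)   # adapted parameter = M @ w + c
    c = (alpha / n_sup) * (X_s.T @ y_s)
    A_rows.append(X_q @ M)
    b_rows.append(y_q - X_q @ c)
A = np.vstack(A_rows)            # shape (T * n_qry, p); fewer rows than columns here
b = np.concatenate(b_rows)

w_hat = np.linalg.pinv(A) @ b    # overfitted min l2-norm interpolating solution

# Meta-test: adapt on a fresh support set, measure squared error on fresh query samples.
errs = []
for _ in range(200):
    X_s, y_s, X_q, y_q = sample_task()
    w_adapt = inner_step(w_hat, X_s, y_s)
    errs.append(np.mean((X_q @ w_adapt - y_q) ** 2))
print(f"p = {p}, meta-test MSE of the min-l2-norm MAML solution: {np.mean(errs):.3f}")
```

Sweeping p while keeping the true signal dimension s, the noise level, and the task fluctuation fixed gives a rough empirical picture of how the overfitted solution's test error behaves in the overparameterized regime that the paper analyzes.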
Related papers
- Transformers are Minimax Optimal Nonparametric In-Context Learners [36.291980654891496]
In-context learning of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples.
We develop approximation and generalization error bounds for a transformer composed of a deep neural network and one linear attention layer.
We show that sufficiently trained transformers can achieve -- and even improve upon -- the minimax optimal estimation risk in context.
arXiv Detail & Related papers (2024-08-22T08:02:10Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities for analyzing closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
- Generating meta-learning tasks to evolve parametric loss for classification learning [1.1355370218310157]
In existing meta-learning approaches, learning tasks for training meta-models are usually collected from public datasets.
We propose a meta-learning approach based on randomly generated meta-learning tasks to obtain a parametric loss for classification learning based on big data.
arXiv Detail & Related papers (2021-11-20T13:07:55Z)
- A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning [37.01683478234978]
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field.
One of the most important riddles is the good empirical generalization of overparameterized models.
arXiv Detail & Related papers (2021-09-06T10:48:40Z)
- On the Treatment of Optimization Problems with L1 Penalty Terms via Multiobjective Continuation [0.0]
We present a novel algorithm that allows us to gain detailed insight into the effects of sparsity in linear and nonlinear optimization.
Our method can be seen as a generalization of well-known homotopy methods for linear regression problems to the nonlinear case.
arXiv Detail & Related papers (2020-12-14T13:00:50Z)
- Generalization Error of Generalized Linear Models in High Dimensions [25.635225717360466]
We provide a framework to characterize neural networks with arbitrary non-linearities.
We analyze the effect of regularized logistic regression on learning.
Our model also captures mismatches between the training and test distributions as special cases.
arXiv Detail & Related papers (2020-05-01T02:17:47Z)