Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning
- URL: http://arxiv.org/abs/2304.04312v1
- Date: Sun, 9 Apr 2023 20:36:13 GMT
- Title: Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning
- Authors: Peizhong Ju, Yingbin Liang, Ness B. Shroff
- Abstract summary: This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
- Score: 70.52689048213398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-learning has arisen as a successful method for improving training
performance by training over many similar tasks, especially with deep neural
networks (DNNs). However, the theoretical understanding of when and why
overparameterized models such as DNNs can generalize well in meta-learning is
still limited. As an initial step towards addressing this challenge, this paper
studies the generalization performance of overfitted meta-learning under a
linear regression model with Gaussian features. In contrast to a few recent
studies along the same line, our framework allows the number of model
parameters to be arbitrarily larger than the number of features in the ground
truth signal, and hence naturally captures the overparameterized regime in
practical deep meta-learning. We show that the overfitted min $\ell_2$-norm
solution of model-agnostic meta-learning (MAML) can be beneficial, which is
similar to the recent remarkable findings on the ``benign overfitting'' and
``double descent'' phenomena in classical (single-task) linear regression.
However, due to features unique to meta-learning, such as the task-specific
gradient-descent inner training and the diversity/fluctuation of the
ground-truth signals across training tasks, we find new and interesting
properties that do not exist in single-task linear regression. We first provide
a high-probability upper bound (with reasonable tightness) on the
generalization error, where certain terms decrease when the number of features
increases. Our analysis
suggests that benign overfitting is more significant and easier to observe when
the noise and the diversity/fluctuation of the ground truth of each training
task are large. Under this circumstance, we show that the overfitted min
$\ell_2$-norm solution can achieve an even lower generalization error than the
underparameterized solution.
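To make the setting concrete, the sketch below illustrates the overfitted min
$\ell_2$-norm MAML solution for linear regression with Gaussian features. It is
a minimal numerical illustration, not the paper's exact model or notation: the
sample sizes, step size, noise level, and task-diversity level are assumptions
chosen only for demonstration. After one inner gradient step per task, the meta
objective becomes a linear regression in the meta parameters, whose
minimum-norm interpolating solution can be taken via a pseudoinverse and then
evaluated on fresh tasks.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact setup) of the
# overfitted min l2-norm MAML solution under linear regression with Gaussian features.
import numpy as np

rng = np.random.default_rng(0)

s = 10                # number of ground-truth features
p = 400               # number of model parameters (p >> s: overparameterized)
T = 20                # number of training tasks
n_in, n_out = 15, 15  # per-task inner (support) / outer (query) sample sizes
alpha = 0.05          # inner-loop step size
noise = 0.5           # label-noise standard deviation
nu = 1.0              # task diversity: std of each task's deviation from the common signal

w_common = rng.normal(size=s)

def sample_task():
    """One task: Gaussian features, ground truth = common signal + task fluctuation."""
    w_task = w_common + nu * rng.normal(size=s)
    def draw(n):
        X = rng.normal(size=(n, p))
        y = X[:, :s] @ w_task + noise * rng.normal(size=n)
        return X, y
    return draw

# After one inner gradient step,
#   w_t(w) = (I - (alpha/n_in) X_in^T X_in) w + (alpha/n_in) X_in^T y_in,
# the outer loss ||X_out w_t(w) - y_out||^2 is a linear regression in w.
designs, targets = [], []
for _ in range(T):
    draw = sample_task()
    X_in, y_in = draw(n_in)
    X_out, y_out = draw(n_out)
    A = np.eye(p) - (alpha / n_in) * X_in.T @ X_in
    b = (alpha / n_in) * X_in.T @ y_in
    designs.append(X_out @ A)
    targets.append(y_out - X_out @ b)

Phi = np.vstack(designs)          # (T * n_out, p): fewer rows than columns here
z = np.concatenate(targets)

# Overfitted (interpolating) minimum l2-norm meta solution.
w_meta = np.linalg.pinv(Phi) @ z

def meta_test_error(w, n_trials=200):
    """Adapt to fresh tasks with one inner step, then average the query MSE."""
    errs = []
    for _ in range(n_trials):
        draw = sample_task()
        X_in, y_in = draw(n_in)
        X_te, y_te = draw(200)
        w_adapt = w - (alpha / n_in) * X_in.T @ (X_in @ w - y_in)
        errs.append(np.mean((X_te @ w_adapt - y_te) ** 2))
    return float(np.mean(errs))

print("min-norm MAML meta-test MSE:", meta_test_error(w_meta))
```

Varying $p$ in this sketch from below to above the total number of outer
samples gives an informal way to probe, in the spirit of the paper's
comparison, whether the overparameterized interpolating solution can match or
beat an underparameterized least-squares fit.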
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- Transformers are Minimax Optimal Nonparametric In-Context Learners [36.291980654891496]
In-context learning of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples.
We develop approximation and generalization error bounds for a transformer composed of a deep neural network and one linear attention layer.
We show that sufficiently trained transformers can achieve -- and even improve upon -- the minimax optimal estimation risk in context.
arXiv Detail & Related papers (2024-08-22T08:02:10Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable via our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple Logits Retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities for analyzing closed-form dynamics.
The unhinged loss also allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning [37.01683478234978]
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field.
One of the most important riddles is the good empirical generalization of overparameterized models.
arXiv Detail & Related papers (2021-09-06T10:48:40Z)
- On the Treatment of Optimization Problems with L1 Penalty Terms via Multiobjective Continuation [0.0]
We present a novel algorithm that allows us to gain detailed insight into the effects of sparsity in linear and nonlinear optimization.
Our method can be seen as a generalization of well-known homotopy methods for linear regression problems to the nonlinear case; a minimal sketch of the linear-regression homotopy idea appears after this list.
arXiv Detail & Related papers (2020-12-14T13:00:50Z)
- Generalization Error of Generalized Linear Models in High Dimensions [25.635225717360466]
We provide a framework to characterize neural networks with arbitrary non-linearities.
We analyze the effect of regularization in logistic regression as an example.
Our model also captures mismatch between training and test distributions as a special case.
arXiv Detail & Related papers (2020-05-01T02:17:47Z)
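For the multiobjective continuation entry above, the following is a minimal
sketch of the classical homotopy/continuation idea for $\ell_1$-penalized
linear regression, which that paper states it generalizes to the nonlinear
case. The ISTA solver and the penalty schedule are illustrative assumptions,
not the paper's algorithm.

```python
# Hedged sketch: continuation over a decreasing l1-penalty schedule for
# 0.5*||Ax - b||^2 + lam*||x||_1, warm-starting each solve (plain ISTA) from
# the previous solution. Not the multiobjective continuation method of the paper.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, x0, n_iter=500):
    """Proximal gradient (ISTA) for l1-penalized least squares."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    x = x0.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x

def l1_continuation(A, b, lams):
    """Trace approximate solutions along a decreasing l1-penalty schedule."""
    x = np.zeros(A.shape[1])
    path = []
    for lam in sorted(lams, reverse=True):
        x = ista(A, b, lam, x0=x)      # warm start from the previous penalty level
        path.append((lam, x.copy()))
    return path

# Example usage on random data: sparsity decreases as the penalty shrinks.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100))
b = A[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=50)
for lam, x in l1_continuation(A, b, lams=[10.0, 3.0, 1.0, 0.3, 0.1]):
    print(f"lam={lam:5.2f}  nonzeros={int(np.sum(np.abs(x) > 1e-8))}")
```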
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.