Towards Sample-efficient Overparameterized Meta-learning
- URL: http://arxiv.org/abs/2201.06142v1
- Date: Sun, 16 Jan 2022 21:57:17 GMT
- Title: Towards Sample-efficient Overparameterized Meta-learning
- Authors: Yue Sun and Adhyyan Narang and Halil Ibrahim Gulluk and Samet Oymak and Maryam Fazel
- Abstract summary: An overarching goal in machine learning is to build a generalizable model with few samples.
This paper aims to demystify overparameterization for meta-learning.
We show that learning the optimal representation coincides with the problem of designing a task-aware regularization.
- Score: 37.676063120293044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An overarching goal in machine learning is to build a generalizable model
with few samples. To this end, overparameterization has been the subject of
immense interest to explain the generalization ability of deep nets even when
the size of the dataset is smaller than that of the model. While the prior
literature focuses on the classical supervised setting, this paper aims to
demystify overparameterization for meta-learning. Here we have a sequence of
linear-regression tasks and we ask: (1) Given earlier tasks, what is the
optimal linear representation of features for a new downstream task? and (2)
How many samples do we need to build this representation? This work shows that
surprisingly, overparameterization arises as a natural answer to these
fundamental meta-learning questions. Specifically, for (1), we first show that
learning the optimal representation coincides with the problem of designing a
task-aware regularization to promote inductive bias. We leverage this inductive
bias to explain how the downstream task actually benefits from
overparameterization, in contrast to prior works on few-shot learning. For (2),
we develop a theory to explain how feature covariance can implicitly help
reduce the sample complexity well below the degrees of freedom and lead to
small estimation error. We then integrate these findings to obtain an overall
performance guarantee for our meta-learning algorithm. Numerical experiments on
real and synthetic data verify our insights on overparameterized meta-learning.
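Below is a minimal, hypothetical NumPy sketch of the two-stage idea described in the abstract, not the paper's actual estimator or guarantees: earlier linear-regression tasks are used to estimate a task-aware weighting matrix, and that matrix then acts as a regularizer (a weighted ridge penalty) when fitting a new downstream task that has fewer samples than features. The helper name `estimate_task_covariance`, the noise level, and the regularization strength are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tasks, n_train, n_new = 50, 200, 40, 15  # overparameterized downstream task: d > n_new

# Ground-truth task vectors share low-dimensional structure (a task covariance).
U = rng.standard_normal((d, 5))
task_cov_true = U @ U.T / 5
betas = rng.multivariate_normal(np.zeros(d), task_cov_true, size=n_tasks)

def estimate_task_covariance(betas_hat):
    """Hypothetical plug-in estimate of the task covariance from per-task estimates."""
    return betas_hat.T @ betas_hat / len(betas_hat)

# Stage 1: crude per-task least-squares estimates from the earlier tasks.
betas_hat = []
for beta in betas:
    X = rng.standard_normal((n_train, d))
    y = X @ beta + 0.1 * rng.standard_normal(n_train)
    betas_hat.append(np.linalg.pinv(X) @ y)  # minimum-norm least squares
Sigma_hat = estimate_task_covariance(np.array(betas_hat))

# Stage 2: task-aware regularization for a new few-shot task.
beta_new = rng.multivariate_normal(np.zeros(d), task_cov_true)
X_new = rng.standard_normal((n_new, d))
y_new = X_new @ beta_new + 0.1 * rng.standard_normal(n_new)

lam = 1e-2  # assumed regularization strength
# Weighted ridge: penalize directions that the estimated task covariance deems unlikely.
P = np.linalg.pinv(Sigma_hat + 1e-6 * np.eye(d))
beta_task_aware = np.linalg.solve(X_new.T @ X_new + lam * P, X_new.T @ y_new)
beta_plain_ridge = np.linalg.solve(X_new.T @ X_new + lam * np.eye(d), X_new.T @ y_new)

print("task-aware error:", np.linalg.norm(beta_task_aware - beta_new))
print("plain-ridge error:", np.linalg.norm(beta_plain_ridge - beta_new))
```

On this toy data the task-aware penalty typically recovers the new task vector more accurately than an isotropic ridge of the same strength, which illustrates the abstract's point that the learned representation plays the role of a task-aware regularization promoting the right inductive bias.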
Related papers
- Transformers are Minimax Optimal Nonparametric In-Context Learners [36.291980654891496]
In-context learning of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples.
We develop approximation and generalization error bounds for a transformer composed of a deep neural network and one linear attention layer.
We show that sufficiently trained transformers can achieve -- and even improve upon -- the minimax optimal estimation risk in context.
arXiv Detail & Related papers (2024-08-22T08:02:10Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters that are largely agnostic of the context of the downstream task to learn, or of the important knowledge to maintain.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data results in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Generalization on the Unseen, Logic Reasoning and Degree Curriculum [25.7378861650474]
This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting.
We study how different network architectures trained by (S)GD perform under GOTU and find a bias towards the min-degree interpolator: an interpolator of the training data that has minimal Fourier mass on the higher-degree basis elements.
arXiv Detail & Related papers (2023-01-30T17:44:05Z)
- From Canonical Correlation Analysis to Self-supervised Graph Neural Networks [99.44881722969046]
We introduce a conceptually simple yet effective model for self-supervised representation learning with graph data.
We optimize an innovative feature-level objective inspired by classical Canonical Correlation Analysis.
Our method performs competitively on seven public graph datasets.
arXiv Detail & Related papers (2021-06-23T15:55:47Z)
- Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time [34.03150701567508]
Adding auxiliary losses to the main objective function is a general way of encoding biases that can help networks learn better representations.
In this work we take inspiration from transductive learning and note that, after receiving an input, we can fine-tune our networks on any unsupervised loss (a generic sketch of this test-time adaptation step appears after this list).
We formulate meta-tailoring, a nested optimization similar to that in meta-learning, and train our models to perform well on the task objective after adapting them using an unsupervised loss.
arXiv Detail & Related papers (2020-09-22T15:26:24Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
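As a companion to the Tailoring entry above, here is a minimal, hypothetical PyTorch sketch of the test-time step its summary describes: after receiving an input, a copy of the network is briefly fine-tuned on an unsupervised loss before predicting. The consistency-under-noise loss, step count, and learning rate are assumptions for illustration; the paper's meta-tailoring additionally trains the base model through this inner adaptation in a nested, meta-learning-style loop, which is omitted here.

```python
import copy
import torch
import torch.nn as nn

def tailor_and_predict(model, x, steps=5, lr=1e-2):
    """Test-time 'tailoring': adapt a copy of the model on an unsupervised
    loss computed from the single input x, then predict with the adapted copy."""
    tailored = copy.deepcopy(model)
    opt = torch.optim.SGD(tailored.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Assumed unsupervised loss: prediction consistency under small input noise.
        noisy = x + 0.01 * torch.randn_like(x)
        loss = ((tailored(x) - tailored(noisy)) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return tailored(x)

# Usage with a toy regression network.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(1, 10)
y_hat = tailor_and_predict(model, x)
print(y_hat.shape)  # torch.Size([1, 1])
```

The point of the design is that the adaptation happens per input at prediction time, so the unsupervised objective acts as an encoded inductive bias rather than as an auxiliary loss added during training.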
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.