A Sample Complexity Separation between Non-Convex and Convex
Meta-Learning
- URL: http://arxiv.org/abs/2002.11172v1
- Date: Tue, 25 Feb 2020 20:55:09 GMT
- Title: A Sample Complexity Separation between Non-Convex and Convex
Meta-Learning
- Authors: Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora
- Abstract summary: One popular trend in meta-learning is to learn from many training tasks a common initialization for a gradient-based method that can be used to solve a new task with few samples.
This paper shows that convex-case analysis can be insufficient, and that it is important to look inside the optimization black-box, specifically at the training dynamics on a two-layer linear network.
Analyses of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
- Score: 42.51788412283446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One popular trend in meta-learning is to learn from many training tasks a
common initialization for a gradient-based method that can be used to solve a
new task with few samples. The theory of meta-learning is still in its early
stages, with several recent learning-theoretic analyses of methods such as
Reptile [Nichol et al., 2018] being for convex models. This work shows that
convex-case analysis might be insufficient to understand the success of
meta-learning, and that even for non-convex models it is important to look
inside the optimization black-box, specifically at properties of the
optimization trajectory. We construct a simple meta-learning instance that
captures the problem of one-dimensional subspace learning. For the convex
formulation of linear regression on this instance, we show that the new task
sample complexity of any initialization-based meta-learning algorithm is
$\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex
formulation of a two-layer linear network on the same instance, we show that
both Reptile and multi-task representation learning can have new task sample
complexity of $\mathcal{O}(1)$, demonstrating a separation from convex
meta-learning. Crucially, analyses of the training dynamics of these methods
reveal that they can meta-learn the correct subspace onto which the data should
be projected.
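The construction lends itself to a short simulation. Below is a minimal numpy sketch of the flavor of this instance, assuming tasks of the form y = alpha_t * <w_star, x> with a task-specific scalar alpha_t: Reptile is run on the two-layer linear parameterization f(x) = a * <w, x>, and alignment of the learned w with the hidden direction w_star is what enables O(1)-sample adaptation. The task distribution, step sizes, and iteration counts here are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch of the rank-1 subspace instance (illustrative assumptions:
# task scalars alpha in {-1, +1}, these step sizes, no noise).
import numpy as np

rng = np.random.default_rng(0)
d = 50
w_star = np.zeros(d)
w_star[0] = 1.0                                 # hidden 1-D subspace

def sample_task():
    alpha = rng.choice([-1.0, 1.0])             # task-specific scalar
    def data(n):
        X = rng.normal(size=(n, d))
        return X, alpha * (X @ w_star)
    return data

# Two-layer linear network f(x) = a * <w, x>, meta-trained with Reptile.
w = rng.normal(size=d) / np.sqrt(d)
a = 0.1
inner_lr, meta_lr, inner_steps = 0.05, 0.5, 10
for _ in range(500):
    data = sample_task()
    wi, ai = w.copy(), a
    for _ in range(inner_steps):                # inner adaptation by SGD
        X, y = data(32)
        r = ai * (X @ wi) - y                   # residuals
        gw = ai * (X.T @ r) / len(y)
        ga = (X @ wi) @ r / len(y)
        wi -= inner_lr * gw
        ai -= inner_lr * ga
    w += meta_lr * (wi - w)                     # Reptile outer update
    a += meta_lr * (ai - a)

print("|<w, w_star>| / |w| =", abs(w @ w_star) / np.linalg.norm(w))
```

Once w is aligned with w_star, solving a new task only requires estimating the scalar a, which a single noiseless sample determines; the convex formulation has no such shared structure for an initialization to encode, which is the source of the Omega(d) lower bound.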
Related papers
- Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
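As a concrete illustration of that setting (an assumed sketch, not the paper's estimator): in the overparameterized regime n < d with Gaussian features, gradient descent started from a meta-learned initialization theta0 converges to the interpolator nearest theta0, and this overfitted solution is the object whose risk such analyses study.

```python
# A minimal sketch of overfitted linear regression from a meta-learned
# initialization (assumed toy setup: dimensions, noise level, and the
# quality of theta0 are illustrative).
import numpy as np

rng = np.random.default_rng(1)
d, n, noise = 100, 20, 0.1
theta_true = rng.normal(size=d)
theta0 = theta_true + 0.3 * rng.normal(size=d)   # imperfect meta-learned init

X = rng.normal(size=(n, d))                      # Gaussian features
y = X @ theta_true + noise * rng.normal(size=n)

# Interpolator closest to theta0: theta0 + X^+ (y - X theta0)
theta_hat = theta0 + np.linalg.pinv(X) @ (y - X @ theta0)
print("train residual:", np.linalg.norm(X @ theta_hat - y))   # ~0: overfitted
print("excess risk   :", np.linalg.norm(theta_hat - theta_true) ** 2 / d)
```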
- Learning Tensor Representations for Meta-Learning [8.185750946886001]
We introduce a tensor-based model of shared representation for meta-learning from a diverse set of tasks.
Substituting the estimated tensor from the first step allows us to estimate the task-specific parameters with very few samples of the new task.
arXiv Detail & Related papers (2022-01-18T23:01:35Z)
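The two-step recipe in the entry above can be sketched as follows, with a matrix (rank-r subspace) standing in for the paper's tensor model; this simplification, the moment-based subspace estimate, and all sizes are assumptions for illustration, not the authors' estimator.

```python
# Step 1: estimate a shared representation from many small tasks.
# Step 2: fit a new task with very few samples inside that representation.
import numpy as np

rng = np.random.default_rng(2)
d, r, tasks, n_per_task, n_new = 40, 3, 200, 10, 5
B = np.linalg.qr(rng.normal(size=(d, r)))[0]     # true shared subspace

def task_data(n):
    coef = rng.normal(size=r)                    # task-specific parameters
    X = rng.normal(size=(n, d))
    return X, X @ (B @ coef)

# Step 1: aggregate per-task moment estimates and take the top-r eigenspace.
M = np.zeros((d, d))
for _ in range(tasks):
    X, y = task_data(n_per_task)
    v = X.T @ y / n_per_task                     # estimate of B @ coef
    M += np.outer(v, v)
B_hat = np.linalg.svd(M)[0][:, :r]

# Step 2: substitute B_hat and solve the new task from n_new samples.
Xn, yn = task_data(n_new)
coef_hat = np.linalg.lstsq(Xn @ B_hat, yn, rcond=None)[0]
theta_hat = B_hat @ coef_hat
print("subspace alignment:", np.linalg.norm(B_hat.T @ B))
```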
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
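To make the pairwise setup concrete, here is a hedged sketch of OGD on a pairwise objective: an AUC-style hinge loss on pairs, where each arriving example is paired with the previous one (a common memory-light choice). The loss, pairing scheme, and step-size schedule are illustrative assumptions, not the paper's algorithms.

```python
# Online gradient descent for a pairwise hinge loss: rank positive
# examples above negative ones.
import numpy as np

rng = np.random.default_rng(3)
d, T, lr = 10, 2000, 0.05
w_true = rng.normal(size=d)
w = np.zeros(d)
prev = None
for t in range(T):
    x = rng.normal(size=d)
    label = np.sign(w_true @ x)
    if prev is not None and prev[1] != label:    # form a pos/neg pair
        pos, neg = (x, prev[0]) if label > 0 else (prev[0], x)
        margin = w @ (pos - neg)                 # want pos ranked above neg
        if margin < 1.0:                         # hinge is active: take a step
            w += lr / np.sqrt(t + 1) * (pos - neg)
    prev = (x, label)

print("cosine(w, w_true):",
      w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))
```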
- Meta-Learning with Adjoint Methods [16.753336086160598]
Model-Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks.
Despite its success, a critical challenge in MAML is calculating the gradient w.r.t. the initialization through a long training trajectory for the sampled tasks.
We propose Adjoint MAML (A-MAML) to address this problem.
We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.
arXiv Detail & Related papers (2021-10-16T01:18:50Z)
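The challenge A-MAML targets can be seen on a toy quadratic inner loss, where the Jacobian of k inner gradient steps has the closed form (I - lr*H)^k: the meta-gradient must be chained through the whole trajectory, which in general means storing and backpropagating k steps. This sketch shows the baseline MAML computation (an assumed toy, not the adjoint method itself).

```python
# Meta-gradient of MAML on a quadratic inner loss, where the k-step
# trajectory Jacobian is (I - lr*H)^k.
import numpy as np

rng = np.random.default_rng(4)
d, n, k, lr = 8, 30, 25, 0.05
theta0 = rng.normal(size=d)                        # meta-initialization
X, y = rng.normal(size=(n, d)), rng.normal(size=n)     # inner (support) data
Xv, yv = rng.normal(size=(n, d)), rng.normal(size=n)   # outer (query) data

H = X.T @ X / n                                    # inner-loss Hessian
theta = theta0.copy()
for _ in range(k):                                 # inner GD trajectory
    theta -= lr * (H @ theta - X.T @ y / n)

g_val = Xv.T @ (Xv @ theta - yv) / n               # outer gradient at adapted params
J = np.linalg.matrix_power(np.eye(d) - lr * H, k)  # d(theta_k)/d(theta0), symmetric here
meta_grad = J @ g_val                              # gradient w.r.t. the initialization
print("meta-gradient norm:", np.linalg.norm(meta_grad))
```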
- A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning [14.720411598827365]
A common practice in meta-learning is splitting the data from each task into train and validation sets during meta-training.
We argue that the train-validation split encourages the learned representation to be low-rank without compromising on expressivity.
Since sample efficiency benefits from low-rankness, the splitting strategy will require very few samples to solve unseen test tasks.
arXiv Detail & Related papers (2021-06-29T17:59:33Z)
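A minimal sketch of the train-validation split meta-objective described above: each task's head is fit on the train split and the representation is scored on the held-out validation split, which penalizes unnecessarily wide (high-rank) representations. The linear model, split sizes, and the specific comparison are illustrative assumptions.

```python
# Train-validation split meta-objective for a linear representation B.
import numpy as np

rng = np.random.default_rng(5)
d, r, n_tr, n_val = 20, 2, 8, 8
B_star = np.linalg.qr(rng.normal(size=(d, r)))[0]  # true low-rank representation

def task():
    coef = rng.normal(size=r)
    X = rng.normal(size=(n_tr + n_val, d))
    y = X @ (B_star @ coef)
    return (X[:n_tr], y[:n_tr]), (X[n_tr:], y[n_tr:])

def meta_objective(B, n_tasks=100):
    """Average val loss after fitting each task's head on its train split."""
    total = 0.0
    for _ in range(n_tasks):
        (Xt, yt), (Xv, yv) = task()
        head = np.linalg.lstsq(Xt @ B, yt, rcond=None)[0]   # inner fit: train split
        total += np.mean((Xv @ B @ head - yv) ** 2)         # scored on val split
    return total / n_tasks

B_good = B_star + 0.05 * rng.normal(size=(d, r))   # near-correct, low-rank
B_wide = rng.normal(size=(d, 2 * r))               # higher-rank alternative
print("val objective, good low-rank B:", meta_objective(B_good))
print("val objective, random wide B  :", meta_objective(B_wide))
```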
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z)
- Incremental Meta-Learning via Indirect Discriminant Alignment [118.61152684795178]
We develop a notion of incremental learning during the meta-training phase of meta-learning.
Our approach performs favorably at test time as compared to training a model with the full meta-training set.
arXiv Detail & Related papers (2020-02-11T01:39:12Z)