Sample Efficient Subspace-based Representations for Nonlinear
Meta-Learning
- URL: http://arxiv.org/abs/2102.07206v1
- Date: Sun, 14 Feb 2021 17:40:04 GMT
- Title: Sample Efficient Subspace-based Representations for Nonlinear
Meta-Learning
- Authors: Halil Ibrahim Gulluk, Yue Sun, Samet Oymak, Maryam Fazel
- Abstract summary: This work explores a more general class of nonlinear tasks with applications ranging from binary classification to neural nets.
We prove that subspace-based representations can be learned in a sample-efficient manner and provably benefit future tasks in terms of sample complexity.
- Score: 28.2312127482203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing good representations is critical for learning complex tasks in a
sample efficient manner. In the context of meta-learning, representations can
be constructed from common patterns of previously seen tasks so that a future
task can be learned quickly. While recent works show the benefit of
subspace-based representations, such results are limited to linear-regression
tasks. This work explores a more general class of nonlinear tasks with
applications ranging from binary classification and generalized linear models to
neural nets. We prove that subspace-based representations can be learned in a
sample-efficient manner and provably benefit future tasks in terms of sample
complexity. Numerical results verify the theoretical predictions in
classification and neural-network regression tasks.
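As an illustrative sketch of the subspace-based idea (not the paper's exact construction — the dimensions, the sign link function, and the moment estimator below are assumptions chosen for demonstration), one can pool a simple cross-moment estimate from many binary-classification tasks and recover the shared parameter subspace from its top eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, T, n = 50, 3, 200, 40  # ambient dim, subspace dim, #tasks, samples/task

# Shared r-dimensional subspace containing every task's parameter vector
U_true, _ = np.linalg.qr(rng.standard_normal((d, r)))

# Moment estimator: average outer products of per-task cross-moments E[y x]
M = np.zeros((d, d))
for _ in range(T):
    w = U_true @ rng.standard_normal(r)   # task parameter lies in the subspace
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w)                    # binary labels via a nonlinear link
    h = X.T @ y / n                       # for Gaussian X, E[h] is parallel to w
    M += np.outer(h, h) / T

# Top-r eigenvectors of M estimate the shared subspace
U_hat = np.linalg.eigh(M)[1][:, -r:]

# Smallest principal-angle cosine between subspaces: 1.0 means exact recovery
align = float(np.linalg.svd(U_hat.T @ U_true, compute_uv=False).min())
print(f"subspace alignment: {align:.3f}")
```

Once such a subspace estimate is in hand, a future task can be fit in r dimensions instead of d, which is the source of the sample-complexity benefit the abstract refers to.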
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks [5.544128024203989]
A key property of neural networks is how they learn to represent and manipulate input information in order to solve a task.
We introduce a new pseudo-kernel based tool for analyzing and predicting learned representations.
arXiv Detail & Related papers (2023-07-14T18:39:04Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Probing Representation Forgetting in Supervised and Unsupervised Continual Learning [14.462797749666992]
Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model.
We show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning.
arXiv Detail & Related papers (2022-03-24T23:06:08Z)
- Trace norm regularization for multi-task learning with scarce data [20.085733305266572]
This work provides the first estimation error bound for the trace norm regularized estimator when the number of samples per task is small.
The advantages of trace norm regularization for learning data-scarce tasks extend to meta-learning and are confirmed empirically on synthetic datasets.
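A trace-norm (nuclear-norm) regularized estimator of this kind can be sketched with proximal gradient descent, whose prox step is singular-value soft-thresholding. The dimensions, step size, and regularization weight below are illustrative assumptions, not the paper's tuned values:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, T, n = 30, 2, 15, 10  # data-scarce regime: n < d samples per task

# Rank-r matrix whose columns are the T task parameter vectors
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
W_true = U @ rng.standard_normal((r, T))
Xs = [rng.standard_normal((n, d)) for _ in range(T)]
ys = [Xs[t] @ W_true[:, t] + 0.05 * rng.standard_normal(n) for t in range(T)]

def prox_nuclear(W, tau):
    """Prox of tau * ||W||_*: soft-threshold the singular values."""
    Uw, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (Uw * np.maximum(s - tau, 0.0)) @ Vt

# Proximal gradient on sum of per-task squared losses + lam * ||W||_*
W = np.zeros((d, T))
step, lam = 0.01, 1.0
for _ in range(1500):
    G = np.column_stack([Xs[t].T @ (Xs[t] @ W[:, t] - ys[t]) for t in range(T)])
    W = prox_nuclear(W - step * G, step * lam)

# Baseline: independent per-task (minimum-norm) least squares
W_ls = np.column_stack(
    [np.linalg.lstsq(Xs[t], ys[t], rcond=None)[0] for t in range(T)])

err_tn = np.linalg.norm(W - W_true) / np.linalg.norm(W_true)
err_ls = np.linalg.norm(W_ls - W_true) / np.linalg.norm(W_true)
print(f"trace-norm error: {err_tn:.3f}  per-task LS error: {err_ls:.3f}")
```

With n < d, per-task least squares cannot recover the component of each task vector outside its own data's row space, while the nuclear-norm penalty couples the tasks and exploits the shared low-rank structure.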
arXiv Detail & Related papers (2022-02-14T14:18:31Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
This separation result underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Learning Purified Feature Representations from Task-irrelevant Labels [18.967445416679624]
We propose a novel learning framework called PurifiedLearning to exploit task-irrelevant features extracted from task-irrelevant labels.
Our work is built on solid theoretical analysis and extensive experiments, which demonstrate the effectiveness of PurifiedLearning.
arXiv Detail & Related papers (2021-02-22T12:50:49Z)
- Latent Representation Prediction Networks [0.0]
We find this principle of learning representations unsatisfying.
We propose a new way of jointly learning this representation along with the prediction function.
Our approach is shown to be more sample-efficient than standard reinforcement learning methods.
arXiv Detail & Related papers (2020-09-20T14:26:03Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels (a.k.a. counterfactual or contrastive examples), which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.