Learning where to learn: Gradient sparsity in meta and continual learning
- URL: http://arxiv.org/abs/2110.14402v1
- Date: Wed, 27 Oct 2021 12:54:36 GMT
- Title: Learning where to learn: Gradient sparsity in meta and continual learning
- Authors: Johannes von Oswald, Dominic Zhao, Seijin Kobayashi, Simon Schug,
Massimo Caccia, Nicolas Zucchet, João Sacramento
- Abstract summary: We show that meta-learning can be improved by letting the learning algorithm decide which weights to change.
We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis.
Our results shed light on an ongoing debate on whether meta-learning can discover adaptable features and suggest that learning by sparse gradient descent is a powerful inductive bias for meta-learning systems.
- Score: 4.845285139609619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding neural network weights that generalize well from small datasets is
difficult. A promising approach is to learn a weight initialization such that a
small number of weight changes results in low generalization error. We show
that this form of meta-learning can be improved by letting the learning
algorithm decide which weights to change, i.e., by learning where to learn. We
find that patterned sparsity emerges from this process, with the pattern of
sparsity varying on a problem-by-problem basis. This selective sparsity results
in better generalization and less interference in a range of few-shot and
continual learning problems. Moreover, we find that sparse learning also
emerges in a more expressive model where learning rates are meta-learned. Our
results shed light on an ongoing debate on whether meta-learning can discover
adaptable features and suggest that learning by sparse gradient descent is a
powerful inductive bias for meta-learning systems.
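The abstract describes the mechanism only at a high level. Below is a minimal JAX sketch of the general idea, not the authors' exact algorithm: alongside the weight initialization, a per-parameter score is meta-learned, and its sigmoid gates the inner-loop gradient step so that the outer loop decides which weights are allowed to adapt. The toy linear model, the soft sigmoid gate, the parameter names, and the step sizes are all assumptions made here for illustration.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # toy linear model: y = x @ w + b (illustrative stand-in for a network)
    return x @ params["w"] + params["b"]

def inner_loss(params, batch):
    x, y = batch
    return jnp.mean((predict(params, x) - y) ** 2)

def adapt(theta, scores, support, alpha=0.1):
    # one "sparse" inner-loop step: each parameter's update is gated by a
    # meta-learned sigmoid mask (a soft stand-in for a binary mask)
    grads = jax.grad(inner_loss)(theta, support)
    return {k: theta[k] - alpha * jax.nn.sigmoid(scores[k]) * grads[k]
            for k in theta}

def outer_loss(meta_params, task):
    theta, scores = meta_params
    support, query = task
    adapted = adapt(theta, scores, support)
    return inner_loss(adapted, query)

@jax.jit
def meta_step(meta_params, task, beta=0.01):
    # meta-update: differentiate the query loss w.r.t. both the initialization
    # and the mask scores (nested jax.grad gives the second-order terms)
    loss, grads = jax.value_and_grad(outer_loss)(meta_params, task)
    new_params = jax.tree_util.tree_map(lambda p, g: p - beta * g, meta_params, grads)
    return new_params, loss

# hypothetical usage on a single toy regression task
k0, k1, k2 = jax.random.split(jax.random.PRNGKey(0), 3)
theta = {"w": jax.random.normal(k0, (5,)), "b": jnp.zeros(())}
scores = jax.tree_util.tree_map(jnp.zeros_like, theta)   # sigmoid(0)=0.5: gates start half-open
x_s = jax.random.normal(k1, (10, 5)); y_s = x_s.sum(-1)  # support set
x_q = jax.random.normal(k2, (10, 5)); y_q = x_q.sum(-1)  # query set
(theta, scores), loss = meta_step((theta, scores), ((x_s, y_s), (x_q, y_q)))
```

Replacing the sigmoid gate with an unconstrained per-parameter step size gives the more expressive meta-learned learning-rate variant mentioned in the abstract; consult the paper for the exact sparsification scheme it uses.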
Related papers
- Ticketed Learning-Unlearning Schemes [57.89421552780526]
We propose a new ticketed model for learning-unlearning.
We provide space-efficient ticketed learning-unlearning schemes for a broad family of concept classes.
arXiv Detail & Related papers (2023-06-27T18:54:40Z)
- Towards Scalable Adaptive Learning with Graph Neural Networks and Reinforcement Learning [0.0]
We introduce a flexible and scalable approach to the problem of learning path personalization.
Our model is a sequential recommender system based on a graph neural network.
Our results demonstrate that it can learn to make good recommendations in the small-data regime.
arXiv Detail & Related papers (2023-05-10T18:16:04Z)
- Continual Learning by Modeling Intra-Class Variation [33.30614232534283]
It has been observed that neural networks perform poorly when the data or tasks are presented sequentially.
Unlike humans, neural networks suffer greatly from catastrophic forgetting, making it impossible to perform life-long learning.
We examine memory-based continual learning and identify that large variation in the representation space is crucial for avoiding catastrophic forgetting.
arXiv Detail & Related papers (2022-10-11T12:17:43Z)
- Learning an Explicit Hyperparameter Prediction Function Conditioned on Tasks [62.63852372239708]
Meta-learning aims to learn the learning methodology for machine learning from observed tasks, so as to generalize to new query tasks.
We interpret this learning methodology as learning an explicit hyperparameter prediction function shared by all training tasks.
Such setting guarantees that the meta-learned learning methodology is able to flexibly fit diverse query tasks.
arXiv Detail & Related papers (2021-07-06T04:05:08Z)
- Variable-Shot Adaptation for Online Meta-Learning [123.47725004094472]
We study the problem of learning new tasks from a small, fixed number of examples, by meta-learning across static data from a set of previous tasks.
We find that meta-learning solves the full task set with fewer overall labels and achieves greater cumulative performance than standard supervised methods.
These results suggest that meta-learning is an important ingredient for building learning systems that continuously learn and improve over a sequence of problems.
arXiv Detail & Related papers (2020-12-14T18:05:24Z)
- Deep Reinforcement Learning for Adaptive Learning Systems [4.8685842576962095]
We formulate the problem of how to find an individualized learning plan based on the learner's latent traits.
We apply a model-free deep reinforcement learning algorithm that can effectively find the optimal learning policy.
We also develop a transition model estimator that emulates the learner's learning process using neural networks.
arXiv Detail & Related papers (2020-04-17T18:04:03Z)
- Meta-Meta Classification for One-Shot Learning [11.27833234287093]
We present a new approach, called meta-meta classification, to learning in small-data settings.
In this approach, one uses a large set of learning problems to design an ensemble of learners, where each learner has high bias and low variance.
We evaluate the approach on a one-shot, one-class-versus-all classification task and show that it is able to outperform traditional meta-learning as well as ensembling approaches.
arXiv Detail & Related papers (2020-04-17T07:05:03Z)
- Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and a maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
arXiv Detail & Related papers (2020-03-04T04:43:16Z)
- Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z)
- Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning.
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
arXiv Detail & Related papers (2020-02-03T06:13:01Z)
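The reduction described in the last entry above can be pictured with a toy sketch (an illustration under strong simplifying assumptions, not the paper's construction: the noisy linear-regression task family, the hand-made support-set summary, and the linear map are all choices made here). Each meta-training sample pairs a summary of a task's small support set (the "feature") with that task's target model (the "label"), and an ordinary supervised regressor maps one to the other.

```python
import jax
import jax.numpy as jnp

def make_task(key, n=10, d=3):
    # hypothetical task family: noisy linear regression with a task-specific w_true
    kw, kx, kn = jax.random.split(key, 3)
    w_true = jax.random.normal(kw, (d,))
    x = jax.random.normal(kx, (n, d))
    y = x @ w_true + 0.1 * jax.random.normal(kn, (n,))
    return x, y, w_true

def summarise(x, y):
    # crude permutation-invariant summary of a support set (the "feature")
    return jnp.concatenate([x.mean(0), (x.T @ y) / len(y),
                            jnp.array([y.mean(), y.std()])])

keys = jax.random.split(jax.random.PRNGKey(0), 500)
tasks = [make_task(k) for k in keys]

# (feature, label) pairs: support-set summary -> that task's target model
feats = jnp.stack([summarise(x, y) for x, y, _ in tasks])
labels = jnp.stack([w for _, _, w in tasks])

# ordinary supervised learning: fit a linear map from summaries to model weights
A = jnp.linalg.lstsq(feats, labels)[0]

# meta-test: a new support set is mapped directly to predicted weights,
# with no inner-loop optimisation
x_new, y_new, w_new = make_task(jax.random.PRNGKey(1))
w_pred = summarise(x_new, y_new) @ A
```

In practice the summary would be a learned set encoder and the map a richer model; the sketch only shows the shape of the reduction: datasets in, models out, trained exactly like supervised learning.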
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.