Continual Few-Shot Learning Using HyperTransformers
- URL: http://arxiv.org/abs/2301.04584v2
- Date: Thu, 12 Jan 2023 19:44:13 GMT
- Title: Continual Few-Shot Learning Using HyperTransformers
- Authors: Max Vladymyrov, Andrey Zhmoginov, Mark Sandler
- Abstract summary: We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes.
We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set.
This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks.
- Score: 14.412066456583917
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach differs from most continual learning algorithms, which typically rely on replay buffers, weight regularization, or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method, equipped with a prototypical loss, is capable of learning and retaining knowledge about past tasks in a variety of scenarios, including learning from mini-batches as well as task-incremental and class-incremental learning.
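To make the recursive weight re-use concrete, here is a minimal PyTorch sketch of the loop. The ToyHT hypernetwork, the flat weight vector, and the prototype computation are illustrative stand-ins (the paper generates full CNN weights with a Transformer); only the overall pattern of feeding previously generated weights back into the hypernetwork and training against a prototypical loss follows the abstract.

```python
# Hypothetical sketch of the Continual HyperTransformer loop. ToyHT and the
# flat weight vector are stand-ins, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, EMB = 16, 8
WDIM = DIM * EMB  # the "generated CNN" is reduced to one linear embedding layer

class ToyHT(nn.Module):
    """Stand-in hypernetwork: pools support (x, y) pairs, updates previous weights."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(DIM + 1, 64)
        self.mix = nn.Linear(64 + WDIM, WDIM)

    def forward(self, sx, sy, prev_w):
        pooled = self.enc(torch.cat([sx, sy.float().unsqueeze(1)], 1)).mean(0)
        return prev_w + self.mix(torch.cat([pooled, prev_w]))  # recursive update

def proto_loss(w, sx, sy, qx, qy, n_cls):
    emb = lambda x: x @ w.view(DIM, EMB)            # generated weights as embedder
    protos = torch.stack([emb(sx)[sy == c].mean(0) for c in range(n_cls)])
    return F.cross_entropy(-torch.cdist(emb(qx), protos), qy)

ht = ToyHT()
opt = torch.optim.Adam(ht.parameters(), lr=1e-3)
w = torch.zeros(WDIM)                               # "empty" initial task weights
episodes = []                                       # few-shot tasks arriving in sequence
for _ in range(3):
    sx, qx = torch.randn(10, DIM), torch.randn(10, DIM)
    sy = qy = torch.arange(10) % 2
    episodes.append((sx, sy, qx, qy))
    w = ht(sx, sy, w)                               # previous weights fed back in
# train the HT so the final weights still solve every task seen so far
loss = sum(proto_loss(w, *ep, 2) for ep in episodes)
opt.zero_grad(); loss.backward(); opt.step()
```

In the paper, the loss is evaluated on all tasks seen so far, which is what trains the HT to preserve earlier tasks; the sketch mirrors this by summing the prototypical loss over every stored episode.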
Related papers
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures across a wide range of applications and data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z) - Generalization to New Sequential Decision Making Tasks with In-Context Learning [23.36106067650874]
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning.
In this paper, we show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks.
We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks.
arXiv Detail & Related papers (2023-12-06T15:19:28Z) - CLR: Channel-wise Lightweight Reprogramming for Continual Learning [63.94773340278971]
Continual learning aims to emulate the human ability to continually accumulate knowledge over sequential tasks.
The main challenge is to maintain performance on previously learned tasks after learning new tasks.
We propose a Channel-wise Lightweight Reprogramming approach that helps convolutional neural networks overcome catastrophic forgetting.
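As a rough illustration of the channel-wise reprogramming idea, here is a hedged PyTorch sketch: a frozen, shared conv layer is followed by a cheap per-task depthwise layer, so each task trains only a handful of channel-wise parameters and earlier tasks cannot be overwritten. The module names and exact placement of the reprogramming layer are assumptions, not the paper's precise design.

```python
# Illustrative channel-wise reprogramming on a frozen backbone: each task
# trains only a lightweight per-channel (depthwise) layer inserted after a
# frozen conv block, so task parameters are disjoint and nothing is forgotten.
import torch
import torch.nn as nn

class ReprogrammedBlock(nn.Module):
    def __init__(self, frozen_conv: nn.Conv2d, n_tasks: int):
        super().__init__()
        self.frozen_conv = frozen_conv
        for p in self.frozen_conv.parameters():
            p.requires_grad = False               # shared backbone never changes
        c = frozen_conv.out_channels
        # one lightweight depthwise 3x3 "reprogramming" conv per task
        self.reprogram = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=1, groups=c) for _ in range(n_tasks)
        )

    def forward(self, x, task_id: int):
        return self.reprogram[task_id](self.frozen_conv(x))

block = ReprogrammedBlock(nn.Conv2d(3, 8, 3, padding=1), n_tasks=5)
y = block(torch.randn(1, 3, 32, 32), task_id=2)   # task-conditional forward
print(y.shape)  # torch.Size([1, 8, 32, 32])
```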
arXiv Detail & Related papers (2023-07-21T06:56:21Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
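The following is a generic contrastive-style surrogate of that insight, not the loss derived in the paper: representations of points that share the same label across a majority of tasks are pulled together, and the rest are pushed apart. The function name and the exact form of the objective are illustrative.

```python
# Hedged illustration of "align points that typically share a label across
# tasks": a generic contrastive surrogate, not the paper's derived loss.
import torch
import torch.nn.functional as F

def pseudo_contrastive(z, labels):
    """z: (n, d) representations; labels: (n, T) per-task labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T                                   # cosine similarities
    # points "typically" share a label if they agree on a majority of tasks
    agree = (labels.unsqueeze(0) == labels.unsqueeze(1)).float().mean(-1)
    pos = (agree > 0.5).float()
    eye = torch.eye(len(z))
    pull = (pos * (1 - sim)).sum() / pos.sum().clamp(min=1)
    push = ((1 - pos - eye).clamp(min=0) * sim.clamp(min=0)).sum() / len(z) ** 2
    return pull + push

z = torch.randn(8, 16, requires_grad=True)
labels = torch.randint(0, 2, (8, 4))                # 8 points, 4 binary tasks
pseudo_contrastive(z, labels).backward()
```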
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Incremental Task Learning with Incremental Rank Updates [20.725181015069435]
We propose a new incremental task learning framework based on low-rank factorization.
We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting.
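A minimal sketch of the low-rank incremental idea, assuming each layer's weight is a sum of per-task rank-k factor pairs, with earlier factors frozen when a new task arrives; the exact factorization in the paper may differ.

```python
# Low-rank incremental task learning sketch: each new task freezes the old
# factors and appends a fresh rank-k pair (an assumed parameterization).
import torch
import torch.nn as nn

class LowRankIncrementalLinear(nn.Module):
    def __init__(self, d_in, d_out, rank=2):
        super().__init__()
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.U, self.V = nn.ParameterList(), nn.ParameterList()

    def add_task(self):
        for p in self.parameters():
            p.requires_grad = False                # freeze factors of past tasks
        self.U.append(nn.Parameter(torch.randn(self.d_out, self.rank) * 0.01))
        self.V.append(nn.Parameter(torch.randn(self.rank, self.d_in) * 0.01))

    def forward(self, x, task_id):
        # weights for task t use all factors learned up to and including t
        W = sum(U @ V for U, V in zip(self.U[: task_id + 1], self.V[: task_id + 1]))
        return x @ W.T

layer = LowRankIncrementalLinear(8, 4)
layer.add_task(); layer.add_task()                  # two sequential tasks
out = layer(torch.randn(5, 8), task_id=1)           # task 1 uses both rank blocks
```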
arXiv Detail & Related papers (2022-07-19T05:21:14Z) - Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
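A hedged sketch of the layer-selection mechanism as described above: a shared base layer plus a task-specific residual whose use is controlled by a learned gate, with a sparsity penalty so that only a few layers end up task-specific. The sigmoid relaxation and the penalty form are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of adaptively choosing which layers become task-specific: a frozen
# shared layer plus a gated task-specific residual, trained with a sparsity
# penalty on the gate (relaxation and penalty are illustrative choices).
import torch
import torch.nn as nn

class TAPSLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)          # shared across tasks, frozen
        for p in self.base.parameters():
            p.requires_grad = False
        self.delta = nn.Parameter(torch.zeros(d_out, d_in))  # task-specific residual
        self.gate_logit = nn.Parameter(torch.tensor(-2.0))   # layer adaptation gate

    def forward(self, x):
        gate = torch.sigmoid(self.gate_logit)
        return self.base(x) + gate * (x @ self.delta.T)

    def sparsity_penalty(self):
        return torch.sigmoid(self.gate_logit)       # encourages few adapted layers

layer = TAPSLinear(16, 8)
x = torch.randn(4, 16)
loss = layer(x).pow(2).mean() + 0.1 * layer.sparsity_penalty()
loss.backward()                                     # trains delta + gate only
```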
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting.
We propose a novel approach to the task-incremental learning problem, in which a model is trained on new tasks that arrive incrementally.
Our approach can be used in both the zero-shot and non-zero-shot task-incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z) - Ternary Feature Masks: zero-forgetting for task-incremental learning [68.34518408920661]
We propose a continual learning approach for the task-aware regime that incurs no forgetting at all.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
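As a simplified illustration of why feature masking gives zero forgetting, the sketch below freezes shared features and gives each task its own fixed channel mask and head, so earlier tasks' outputs can never change. It uses binary masks and omits the paper's network growth and ternary mask values for brevity.

```python
# Simplified task-conditional feature masking: frozen shared features plus a
# fixed per-task channel mask and head, so earlier tasks can never degrade.
# Masks would be learned per task in practice; random here for brevity.
import torch
import torch.nn as nn

torch.manual_seed(0)
features = nn.Linear(16, 32)                        # shared feature extractor
for p in features.parameters():
    p.requires_grad = False                         # frozen -> nothing to forget
masks = {t: (torch.rand(32) < 0.5).float() for t in range(3)}  # one mask per task
heads = {t: nn.Linear(32, 5) for t in range(3)}     # task-specific classifiers

def predict(x, task_id):
    return heads[task_id](features(x) * masks[task_id])

print(predict(torch.randn(2, 16), task_id=1).shape)  # torch.Size([2, 5])
```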
arXiv Detail & Related papers (2020-01-23T18:08:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.