Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks
- URL: http://arxiv.org/abs/2401.15275v1
- Date: Sat, 27 Jan 2024 03:03:30 GMT
- Title: Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks
- Authors: Yuliang Cai and Mohammad Rostami
- Abstract summary: Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
- Score: 27.59758964060561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer neural networks are increasingly replacing prior architectures in
a wide range of applications in different data modalities. The increasing size
and computational demands of fine-tuning large pre-trained transformer neural
networks pose significant challenges for the widespread adoption of these
models for applications that demand on-edge computing. To tackle this
challenge, continual learning (CL) emerges as a solution by facilitating the
transfer of knowledge across tasks that arrive sequentially for an autonomously
learning agent. However, current CL methods mainly focus on learning tasks that
are exclusively vision-based or language-based. We propose a transformer-based
CL framework focusing on learning tasks that involve both vision and language,
known as Vision-and-Language (VaL) tasks. Due to the success of transformers in
other modalities, our architecture has the potential to be used in multimodal
learning settings. In our framework, we benefit from introducing extra
parameters to a base transformer to specialize the network for each task. As a
result, we enable dynamic model expansion to learn several tasks in a sequence.
We also use knowledge distillation to benefit from relevant past experiences to
learn the current task more efficiently. Our proposed method, Task Attentive
Multimodal Continual Learning (TAM-CL), allows for the exchange of information
between tasks while mitigating the problem of catastrophic forgetting. Notably,
our approach is scalable, incurring minimal memory and time overhead. TAM-CL
achieves state-of-the-art (SOTA) performance on challenging multimodal tasks.
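A minimal sketch of the dynamic-expansion idea described in the abstract: a shared transformer block is specialized per task by small adapter modules added as new tasks arrive. The class names, adapter design, and bottleneck size are illustrative assumptions, not the authors' TAM-CL implementation.

```python
import torch
import torch.nn as nn

class TaskAdapter(nn.Module):
    """Small residual bottleneck: the 'extra parameters' introduced per task."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class DynamicBlock(nn.Module):
    """Shared attention layer plus one adapter per task seen so far."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.adapters = nn.ModuleDict()              # task_id -> TaskAdapter

    def add_task(self, task_id: str, dim: int):
        self.adapters[task_id] = TaskAdapter(dim)    # dynamic model expansion

    def forward(self, x, task_id: str):
        h, _ = self.attn(x, x, x)
        h = self.norm(x + h)
        return self.adapters[task_id](h)             # route through the task's adapter
```

When a new task arrives, `add_task` registers a fresh adapter while the shared attention weights can stay frozen, which is what keeps the per-task memory and time overhead small.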
Related papers
- How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes [6.652837942112205]
Large language models (LLMs) have recently shown the extraordinary ability to perform unseen tasks based on few-shot examples provided as text, known as in-context learning (ICL).
We propose several effective curriculum learning strategies that allow ICL models to achieve higher data efficiency and more stable convergence.
Our experiments reveal that ICL models can effectively learn difficult tasks by training on progressively harder tasks while mixing in prior tasks, denoted as mixed curriculum in this work.
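As a rough illustration of that mixed-curriculum schedule (progressively harder task stages with earlier stages kept in the mix), the sketch below emits one task per training step; the stage contents, step count, and mixing ratio are invented placeholders, not the paper's configuration.

```python
import random

def mixed_curriculum(stages, steps_per_stage=1000, prior_ratio=0.3):
    """Yield one task per step, unlocking harder stages while revisiting earlier ones."""
    for i, current in enumerate(stages):
        prior = [task for stage in stages[:i] for task in stage]
        for _ in range(steps_per_stage):
            if prior and random.random() < prior_ratio:
                yield random.choice(prior)    # mix in a previously seen, easier task
            else:
                yield random.choice(current)  # mostly train on the current stage

# Example with three difficulty levels of function classes.
stages = [["linear"], ["sparse_linear", "relu_net"], ["decision_tree"]]
for task in mixed_curriculum(stages, steps_per_stage=3):
    print(task)
```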
arXiv Detail & Related papers (2024-04-04T16:15:23Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
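A hedged sketch of the knowledge-distillation component mentioned here, assuming a frozen copy of the model from earlier tasks acts as the teacher; the loss weighting and temperature below are illustrative defaults, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-target KL divergence between the frozen teacher and the current model."""
    t = temperature
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

def train_step(model, teacher, inputs, labels, optimizer, lam: float = 0.5):
    """One update: supervised loss on the current task plus distillation toward
    the teacher's outputs, which preserves behaviour learned on earlier tasks."""
    logits = model(inputs)
    with torch.no_grad():
        teacher_logits = teacher(inputs)          # teacher is a frozen snapshot
    loss = F.cross_entropy(logits, labels) + lam * distillation_loss(logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Before a new task starts, the current model can be snapshotted (e.g. with `copy.deepcopy(model).eval()`) to serve as the teacher for that task.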
arXiv Detail & Related papers (2023-03-25T10:16:53Z)
- Continual Learning via Learning a Continual Memory in Vision Transformer [7.116223171323158]
We study task-incremental continual learning (TCL) using Vision Transformers (ViTs).
Our goal is to improve the overall streaming-task performance without catastrophic forgetting by learning task synergies.
We present a Hierarchical task-synergy Exploration-Exploitation (HEE) sampling-based neural architecture search (NAS) method for effectively learning task synergies.
arXiv Detail & Related papers (2023-03-14T21:52:27Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
We propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
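The weighted-sampling half of that idea can be pictured as drawing training tasks with non-uniform probabilities, for instance up-weighting longer or harder sequential sub-tasks; the task names and weights below are invented for illustration, not taken from MultiRavens.

```python
import random

def weighted_task_sampler(task_weights, num_steps):
    """Draw one task per step with probability proportional to its weight."""
    tasks, weights = zip(*task_weights.items())
    for _ in range(num_steps):
        yield random.choices(tasks, weights=weights, k=1)[0]

# Example: up-weight the hardest task combination.
for task in weighted_task_sampler({"placing": 1.0, "chaining": 1.5, "sorting_stacking": 3.0}, 5):
    print(task)
```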
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
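A hedged sketch of what a task-specific feature-map transformation can look like: a shared convolutional layer stays fixed while each task adds only a per-channel scale and shift. The classes below are a simplified illustration, not the paper's exact transformation.

```python
import torch
import torch.nn as nn

class TaskFeatureTransform(nn.Module):
    """Per-task, per-channel affine transform applied to shared feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.shift = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, feats):
        return feats * self.scale + self.shift

class SharedBackboneWithTaskTransforms(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)  # shared across tasks
        self.transforms = nn.ModuleDict()                             # task_id -> transform

    def add_task(self, task_id: str, channels: int = 64):
        self.transforms[task_id] = TaskFeatureTransform(channels)     # few new parameters

    def forward(self, x, task_id: str):
        return self.transforms[task_id](torch.relu(self.conv(x)))
```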
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- Multi-Task Learning with Deep Neural Networks: A Survey [0.0]
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model.
We give an overview of multi-task learning methods for deep neural networks, with the aim of summarizing both the well-established and most recent directions within the field.
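The shared-model setting the survey covers is often realized as hard parameter sharing: one trunk feeds several task-specific heads and the per-task losses are summed. The sketch below is a generic textbook example, not a method taken from the survey.

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Shared trunk with one lightweight head per task."""
    def __init__(self, in_dim: int, hidden: int, task_out_dims: dict):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({name: nn.Linear(hidden, out_dim)
                                    for name, out_dim in task_out_dims.items()})

    def forward(self, x):
        z = self.trunk(x)                       # features shared by all tasks
        return {name: head(z) for name, head in self.heads.items()}

model = HardSharingMTL(in_dim=16, hidden=32, task_out_dims={"classify": 10, "regress": 1})
outputs = model(torch.randn(4, 16))             # one forward pass serves every task
```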
arXiv Detail & Related papers (2020-09-10T19:31:04Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade the single-task performance in a multi-task setup (task interference).
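One way to picture both requirements at once is a convolution split into a frozen, shared filter bank plus a small task-specific modulator that recombines those filters: new tasks add a modulator (incremental) without touching the filters other tasks rely on (no interference). The sketch below is an assumption-laden illustration of that idea, not the paper's exact reparameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReparamConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # Shared filter bank, kept frozen once the first task is trained.
        self.bank = nn.Parameter(torch.randn(out_ch, in_ch, k, k), requires_grad=False)
        self.modulators = nn.ParameterDict()   # task_id -> filter-mixing matrix

    def add_task(self, task_id: str):
        n_filters = self.bank.shape[0]
        # Start from the identity so a new task initially reuses the shared filters as-is.
        self.modulators[task_id] = nn.Parameter(torch.eye(n_filters))

    def forward(self, x, task_id: str):
        # Recombine the shared filters with the task's modulator, then convolve.
        weight = torch.einsum("of,fckl->ockl", self.modulators[task_id], self.bank)
        return F.conv2d(x, weight, padding=self.bank.shape[-1] // 2)
```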
arXiv Detail & Related papers (2020-07-24T14:44:46Z)