Related papers: Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

URL: http://arxiv.org/abs/2303.14423v1
Date: Sat, 25 Mar 2023 10:16:53 GMT
Title: Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Authors: Yuliang Cai, Jesse Thomason, Mohammad Rostami
Abstract summary: Continual learning (CL) can serve as a remedy through enabling knowledge-transfer across sequentially arriving tasks. We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks. Our approach is scalable learning to a large number of tasks because it requires little memory and time overhead.
Score: 18.345183818638475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The size and the computational load of fine-tuning large-scale pre-trained neural network are becoming two major obstacles in adopting machine learning in many applications. Continual learning (CL) can serve as a remedy through enabling knowledge-transfer across sequentially arriving tasks which relaxes the need to fine-tune all network weights from scratch. However, existing CL algorithms primarily consider learning unimodal vision-only or language-only tasks. We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks based on increasing the number of the learnable parameters dynamically and using knowledge distillation. The new additional parameters are used to specialize the network for each task. Our approach enables sharing information between the tasks while addressing the challenge of catastrophic forgetting. Our approach is scalable learning to a large number of tasks because it requires little memory and time overhead. Our model reaches state-of-the-art performance on challenging vision-and-language tasks.

Related papers

Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning [23.22385310060951]
We introduce a three-stage Progressive Homeostatic and Plastic audio-visual prompt (PHP) method.<n>In the shallow phase, we design the task-shared modality aggregating adapter to foster cross-task and cross-modal audio-visual representation learning.<n>In the middle phase, we propose the task-specific modality-shared dynamic generating adapter, which constructs prompts that are tailored to individual tasks.<n>In the deep phase, we introduce the task-specific modality-independent prompts to further refine the understand ability.
arXiv Detail & Related papers (2025-07-29T08:42:36Z)
Is Visual in-Context Learning for Compositional Medical Tasks within Reach? [68.56630652862293]
In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks.<n>We introduce a novel method for training in-context learners using a synthetic compositional task generation engine.
arXiv Detail & Related papers (2025-07-01T15:32:23Z)
Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model [6.42114585934114]
Large Language Models (LLMs) possess capabilities that can process diverse language-related tasks. Continual Learning in Large Language Models (LLMs) arises which aims to continually adapt the LLMs to new tasks. This paper proposes Analytic Subspace Routing(ASR) to address these challenges.
arXiv Detail & Related papers (2025-03-17T13:40:46Z)
TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning [54.033346088090674]
We introduce TWIST & SCOUT, a framework that equips pre-trained MLLMs with visual grounding ability. To fine-tune the model effectively, we generate a high-quality synthetic dataset we call SCOUT. This dataset provides rich supervision signals, describing a step-by-step multimodal reasoning process.
arXiv Detail & Related papers (2024-10-14T13:35:47Z)
Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent. We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning. We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z)
Few-shot Multimodal Multitask Multilingual Learning [0.0]
We propose few-shot learning for a multimodal multitask multilingual (FM3) setting by adapting pre-trained vision and language models. FM3 learns the most prominent tasks in the vision and language domains along with their intersections.
arXiv Detail & Related papers (2023-02-19T03:48:46Z)
Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph. Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks. Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory. We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose Unified multimodal pre-training for both Vision-Language understanding and generation. The proposed UniVL is capable of handling both understanding tasks and generative tasks. Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. We propose a new suite of benchmark aimed at compositional tasks, MultiRavens, which allows defining custom task combinations. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously. New tasks can be continuously instantiated from previously learned tasks. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z)
Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in literature. First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning) Second, eliminating adverse interactions amongst tasks, which has been shown to significantly degrade the single-task performance in a multi-task setup (task interference)
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
NeurAll: Towards a Unified Visual Perception Model for Automated Driving [8.49826472556323]
We propose a joint multi-task network design for learning several tasks simultaneously. The main bottleneck in automated driving systems is the limited processing power available on deployment hardware.
arXiv Detail & Related papers (2019-02-10T12:45:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.