Continual Learning via Learning a Continual Memory in Vision Transformer
- URL: http://arxiv.org/abs/2303.08250v4
- Date: Tue, 08 Oct 2024 16:29:41 GMT
- Title: Continual Learning via Learning a Continual Memory in Vision Transformer
- Authors: Chinmay Savadikar, Michelle Dai, Tianfu Wu
- Abstract summary: We study task-incremental continual learning (TCL) using Vision Transformers (ViTs).
Our goal is to improve the overall streaming-task performance without catastrophic forgetting by learning task synergies.
We present a Hierarchical task-synergy Exploration-Exploitation (HEE) sampling-based neural architecture search (NAS) method for effectively learning task synergies.
- Score: 7.116223171323158
- Abstract: This paper studies task-incremental continual learning (TCL) using Vision Transformers (ViTs). Our goal is to improve overall streaming-task performance without catastrophic forgetting by learning task synergies (e.g., a new task learns to automatically reuse or adapt modules from similar previous tasks, to introduce new modules when needed, or to skip some modules when it is an easier task). One grand challenge is how to tame ViTs on streams of diverse tasks, balancing their plasticity and stability in a task-aware way while overcoming catastrophic forgetting. To address the challenge, we propose a simple yet effective approach that identifies a lightweight yet expressive "sweet spot" in the ViT block as the task-synergy memory in TCL. We present a Hierarchical task-synergy Exploration-Exploitation (HEE) sampling-based neural architecture search (NAS) method that learns task synergies by structurally updating the identified memory component with respect to four basic operations (reuse, adapt, new, and skip) across streaming tasks. The proposed method is thus dubbed CHEEM (Continual Hierarchical-Exploration-Exploitation Memory). In experiments, we test CHEEM on the challenging Visual Domain Decathlon (VDD) benchmark and the 5-Dataset benchmark, where it obtains consistently better performance than the prior art with a sensible CHEEM learned continually.
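To make the structural update concrete, here is a minimal sketch, assuming a residual-adapter memory slot and a greedy scorer; the class and function names are illustrative, and the exhaustive candidate loop stands in for the paper's hierarchical exploration-exploitation (HEE) NAS sampling, so this is not the released CHEEM implementation.

```python
# Hedged sketch of the four structural operations (reuse / adapt / new / skip)
# applied to a per-block "task-synergy memory" slot at each new task.
import copy
import torch
import torch.nn as nn

class TaskSynergyMemory(nn.Module):
    """A lightweight residual adapter standing in for the memory slot in a ViT block."""
    def __init__(self, dim: int, hidden: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))  # residual bottleneck adapter

def update_memory(modules: list, score, dim: int = 64):
    """Choose one of {reuse, adapt, new, skip} for an incoming task.

    `score(module_or_None)` is a caller-supplied evaluation on the new task's
    data; the paper instead samples candidates with HEE exploration-exploitation.
    """
    candidates = [(None, "skip")]                                 # skip: identity, easy task
    candidates += [(m, "reuse") for m in modules]                 # reuse: share a frozen module
    candidates += [(copy.deepcopy(m), "adapt") for m in modules]  # adapt: finetune a copy
    candidates.append((TaskSynergyMemory(dim), "new"))            # new: add a fresh module
    best_module, op = max(candidates, key=lambda c: score(c[0]))
    if op in ("adapt", "new"):
        modules.append(best_module)  # the memory grows only when needed
    return best_module, op
```

Because reuse shares frozen modules and adapt copies before finetuning, previously learned modules are never overwritten, which is how growing-memory designs of this kind avoid catastrophic forgetting by construction.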
Related papers
- PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer [76.39111896665585]
Incremental Learning (IL) aims to train deep models continually on sequentially arriving tasks.
Recent large pre-trained models (PTMs) have achieved outstanding performance in practical IL via prompt techniques, without access to old samples (a minimal prompt sketch follows this entry).
arXiv Detail & Related papers (2024-07-04T10:37:58Z)
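As a rough illustration of the prompt technique mentioned above (not PECTP itself, which additionally constrains prompts across tasks), a frozen backbone can be steered per task by prepending learnable prompt tokens; all names and sizes below are assumptions.

```python
# Hedged sketch of prompt-based incremental learning with a frozen ViT:
# each task gets its own learnable prompt tokens prepended to the patch
# tokens, so earlier tasks' parameters are never modified.
import torch
import torch.nn as nn

class TaskPrompts(nn.Module):
    def __init__(self, dim: int = 768, prompts_per_task: int = 8):
        super().__init__()
        self.pool = nn.ParameterList()  # one prompt tensor per task
        self.dim, self.n = dim, prompts_per_task

    def add_task(self) -> int:
        self.pool.append(nn.Parameter(torch.randn(self.n, self.dim) * 0.02))
        return len(self.pool) - 1  # new task id

    def forward(self, tokens: torch.Tensor, task_id: int) -> torch.Tensor:
        # tokens: (batch, seq, dim); prepend this task's prompts to the sequence
        p = self.pool[task_id].unsqueeze(0).expand(tokens.size(0), -1, -1)
        return torch.cat([p, tokens], dim=1)
```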
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning [64.55001982176226]
LIBERO is a novel benchmark for lifelong learning in robot manipulation.
We focus on how to efficiently transfer declarative knowledge, procedural knowledge, or the mixture of both.
We develop an extendible procedural generation pipeline that can in principle generate infinitely many tasks.
arXiv Detail & Related papers (2023-06-05T23:32:26Z)
- Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
arXiv Detail & Related papers (2023-03-25T10:16:53Z)
- Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning [60.501201259732625]
We introduce task-adaptive saliency for EFCIL and propose a new framework, which we call Task-Adaptive Saliency Supervision (TASS).
Our experiments demonstrate that our method can better preserve saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet, and ImageNet-Subset EFCIL benchmarks.
arXiv Detail & Related papers (2022-12-16T02:43:52Z)
- A Unified Meta-Learning Framework for Dynamic Transfer Learning [42.34180707803632]
We propose a generic meta-learning framework L2E for modeling the knowledge transferability on dynamic tasks.
L2E enjoys the following properties: (1) effective knowledge transferability across dynamic tasks; (2) fast adaptation to the new target task; (3) mitigation of catastrophic forgetting on historical target tasks; and (4) flexibility in incorporating any existing static transfer learning algorithms.
arXiv Detail & Related papers (2022-07-05T02:56:38Z)
- Rethinking Task-Incremental Learning Baselines [5.771817160915079]
We present a simple yet effective adjustment network (SAN) for task-incremental learning that achieves near state-of-the-art performance.
We investigate this approach on both 3D point cloud object (ModelNet40) and 2D image (CIFAR10, CIFAR100, MiniImageNet, MNIST, PermutedMNIST, notMNIST, SVHN, and FashionMNIST) recognition tasks.
arXiv Detail & Related papers (2022-05-23T14:52:38Z)
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks, while task-aware gates are introduced to automatically select sub-models for specific tasks (see the sketch after this entry).
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
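A minimal sketch of the task-aware gating idea described in this entry, assuming hard per-task gates over shared blocks; ROSETTA additionally guides the gates with prototypical task correlations, which is omitted here.

```python
# Hedged sketch of task-aware gating: all tasks share one set of blocks, and
# a per-task gate vector decides which blocks each task actually uses.
import torch
import torch.nn as nn

class GatedBackbone(nn.Module):
    def __init__(self, dim: int = 64, n_blocks: int = 4, n_tasks: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_blocks)])
        # One gate logit per (task, block); training would need a soft or
        # straight-through relaxation of the hard threshold used here.
        self.gate_logits = nn.Parameter(torch.zeros(n_tasks, n_blocks))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        gates = (self.gate_logits[task_id] > 0).float()  # task-aware sub-model selection
        for g, block in zip(gates, self.blocks):
            x = x + g * torch.relu(block(x))  # gated-off blocks contribute nothing
        return x
```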
- Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN.
Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Efficient Continual Learning with Modular Networks and Task-Driven Priors [31.03712334701338]
Existing literature in Continual Learning (CL) has focused on overcoming catastrophic forgetting.
We introduce a new modular architecture, whose modules represent atomic skills that can be composed to perform a certain task.
Our learning algorithm leverages a task-driven prior over the exponential search space of all possible ways to combine modules, enabling efficient learning on long streams of tasks (see the sketch below).
arXiv Detail & Related papers (2020-12-23T12:42:16Z)
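The modular-networks entry above is closest in spirit to CHEEM's growing memory. Here is a minimal sketch of composing per-layer modules into a task network with a prior that favors previously reused modules; the function names and the shortlist heuristic are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch: modules as atomic skills, composed one per layer into a task
# network; a task-driven prior (reuse counts) prunes the exponential layout space.
import itertools
import torch.nn as nn

def compose(modules_per_layer, layout):
    """Build a network by picking one module index per layer."""
    return nn.Sequential(*(modules_per_layer[l][i] for l, i in enumerate(layout)))

def search_layout(modules_per_layer, score, prior_counts, shortlist=32):
    """Rank all layouts by the prior, evaluate only a shortlist, update the prior."""
    choices = [range(len(mods)) for mods in modules_per_layer]
    layouts = list(itertools.product(*choices))  # exponential; fine only for toy sizes
    prior = lambda lay: sum(prior_counts.get((l, i), 0) for l, i in enumerate(lay))
    best = max(sorted(layouts, key=prior, reverse=True)[:shortlist],
               key=lambda lay: score(compose(modules_per_layer, lay)))
    for l, i in enumerate(best):
        prior_counts[(l, i)] = prior_counts.get((l, i), 0) + 1  # bias future tasks
    return best
```

Enumerating every layout is only feasible at toy scale; the point of the task-driven prior in the paper is precisely to make this search efficient on long task streams.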