Continual Instruction Tuning for Large Multimodal Models
- URL: http://arxiv.org/abs/2311.16206v1
- Date: Mon, 27 Nov 2023 15:04:48 GMT
- Title: Continual Instruction Tuning for Large Multimodal Models
- Authors: Jinghan He, Haiyun Guo, Ming Tang, Jinqiao Wang
- Abstract summary: Multi-task joint instruction tuning can facilitate the model's continual learning ability and mitigate forgetting.
We propose task-similarity-informed regularization and model expansion methods for continual instruction tuning of LMMs.
- Score: 30.438442723421556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction tuning is now a widely adopted approach to aligning large
multimodal models (LMMs) to follow human intent. It unifies the data format of
vision-language tasks, enabling multi-task joint training. However,
vision-language tasks are constantly being created in practice. Instead of
always re-training LMMs when new tasks arrive, continual learning offers
flexibility for models to continually and efficiently exploit the evolving
data. This work aims to explore the following two questions: 1) Do LMMs still
suffer from catastrophic forgetting in continual instruction tuning? 2) Are the
existing three classes of continual learning methods still applicable to the
continual instruction tuning of LMMs? An extensive study is conducted to
address the above questions. First, we establish the first benchmark in this
setting and reveal that catastrophic forgetting is still observed when
continually instruction-tuning LMMs. However, the multi-task joint instruction
tuning can facilitate the model's continual learning ability and mitigate
forgetting. Second, we integrate and adapt classic continual learning methods
to our context, demonstrating the efficacy of data replay and model expansion
strategies across diverse scenarios. In contrast, regularization-based methods
only perform well on models that have been jointly instruction-tuned on
multiple tasks. Third, we delve into the correlation and forgetting dynamics
between vision-language task pairs and propose task-similarity-informed
regularization and model expansion methods for continual instruction tuning of
LMMs. Experimental results show that our approach consistently boosts the
model's performance.
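The task-similarity-informed regularization idea can be sketched as an EWC-style quadratic penalty that pulls new-task parameters toward the previous task's parameters, with a strength modulated by an estimated similarity between the two tasks. The function below is a minimal illustrative sketch; the Fisher weighting, the `1 - task_sim` scaling, and all names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def similarity_informed_penalty(theta, theta_prev, fisher, task_sim, lam=1.0):
    """Quadratic penalty toward the previous task's parameters, scaled by
    task similarity (hypothetical sketch, not the paper's exact method).

    - fisher: per-parameter importance weights (EWC-style diagonal Fisher).
    - task_sim in [0, 1]: assumed scaling where dissimilar tasks (low
      similarity, more expected interference) get a stronger penalty.
    """
    strength = lam * (1.0 - task_sim)
    return strength * float(np.sum(fisher * (theta - theta_prev) ** 2))
```

In a continual-tuning loop this term would be added to the new task's loss; with `task_sim = 1` the penalty vanishes and the model adapts freely, while `task_sim = 0` applies the full importance-weighted constraint.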
Related papers
- Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation [48.071162716120334]
We study how the multimodal nature of the input affects the learning dynamics of a model.
Motivated by this observation, we propose a modality-aware feature distillation (MAFED) approach.
arXiv Detail & Related papers (2024-06-27T16:12:57Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model [128.46104068327435]
We present a benchmark, namely Continual Instruction tuNing (CoIN), to assess existing MLLMs in the sequential instruction tuning paradigm.
Experiments on CoIN demonstrate that current powerful MLLMs still suffer catastrophic forgetting.
We introduce MoELoRA to MLLMs, which effectively retains the previous instruction alignment.
arXiv Detail & Related papers (2024-03-13T08:54:31Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis [11.368438990334397]
We develop a self-supervised learning strategy to acquire independent unimodal supervisions.
We conduct extensive experiments on three public multimodal baseline datasets.
Our method achieves performance comparable to that of human-annotated unimodal labels.
arXiv Detail & Related papers (2021-02-09T14:05:02Z)
- Learning Invariant Representation for Continual Learning [5.979373021392084]
A key challenge in continual learning is catastrophic forgetting of previously learned tasks when the agent faces a new one.
We propose a new pseudo-rehearsal-based method, named learning Invariant Representation for Continual Learning (IRCL)
Disentangling the shared invariant representation helps the model learn a sequence of tasks continually, while being more robust to forgetting and transferring knowledge better.
arXiv Detail & Related papers (2021-01-15T15:12:51Z)
- Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning [74.07455280246212]
Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones.
We show that current continual learning, meta-learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario.
We propose Continual-MAML, an online extension of the popular MAML algorithm as a strong baseline for this scenario.
arXiv Detail & Related papers (2020-03-12T15:47:16Z)
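The fast-adaptation idea behind Continual-MAML can be illustrated with a MAML-style inner loop: a few gradient steps that adapt shared initial parameters to the data at hand. The toy linear model, learning rate, and step count below are illustrative assumptions; this sketches only the generic inner loop, not the paper's full online algorithm.

```python
import numpy as np

def inner_adapt(theta, x, y, lr=0.1, steps=5):
    """A few inner-loop gradient steps on squared error for y ≈ theta * x.

    Sketches MAML-style fast adaptation from a shared initialization theta;
    each step follows the gradient of the mean squared error on (x, y).
    """
    for _ in range(steps):
        grad = 2.0 * np.mean((theta * x - y) * x)
        theta = theta - lr * grad
    return theta
```

In Continual-MAML, parameters adapted this way serve the current task online, while the shared initialization is updated only when a shift to a new task is detected.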
This list is automatically generated from the titles and abstracts of the papers in this site.