Continual Instruction Tuning for Large Multimodal Models
- URL: http://arxiv.org/abs/2311.16206v1
- Date: Mon, 27 Nov 2023 15:04:48 GMT
- Title: Continual Instruction Tuning for Large Multimodal Models
- Authors: Jinghan He, Haiyun Guo, Ming Tang, Jinqiao Wang
- Abstract summary: Multi-task joint instruction tuning can facilitate the model's continual learning ability and mitigate forgetting.
We propose task-similarity-informed regularization and model expansion methods for continual instruction tuning of LMMs.
- Score: 30.438442723421556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction tuning is now a widely adopted approach to aligning large
multimodal models (LMMs) to follow human intent. It unifies the data format of
vision-language tasks, enabling multi-task joint training. However,
vision-language tasks are constantly being created in practice. Instead of
always re-training LMMs when new tasks arrive, continual learning offers
flexibility for models to continually and efficiently exploit the evolving
data. This work aims to explore the following two questions: 1) Do LMMs still
suffer from catastrophic forgetting in continual instruction tuning? 2) Are the
existing three classes of continual learning methods still applicable to the
continual instruction tuning of LMMs? An extensive study is conducted to
address the above questions. First, we establish the first benchmark in this
setting and reveal that catastrophic forgetting is still observed when
continually instruction-tuning LMMs. However, the multi-task joint instruction
tuning can facilitate the model's continual learning ability and mitigate
forgetting. Second, we integrate and adapt classic continual learning methods
to our context, demonstrating the efficacy of data replay and model expansion
strategies across diverse scenarios. In contrast, regularization-based methods
only perform well on models that have been jointly instruction-tuned on
multiple tasks. Third, we delve into the correlation and forgetting dynamics
between vision-language task pairs and propose task-similarity-informed
regularization and model expansion methods for continual instruction tuning of
LMMs. Experimental results show that our approach consistently boosts the
model's performance.
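The abstract does not spell out the regularizer, so the following is only a minimal sketch, assuming an EWC-style quadratic penalty whose strength is scaled by an externally estimated task-similarity score; the names (similarity_weighted_penalty, fisher_diag, sim, lam) and the choice to regularize less for similar tasks are illustrative assumptions, not the authors' actual formulation.

```python
# Hypothetical sketch of task-similarity-informed regularization: an EWC-style
# quadratic penalty anchoring the model to the previous task's parameters,
# scaled by a task-similarity score. Names and scaling convention are
# assumptions for illustration, not the paper's implementation.
import torch


def similarity_weighted_penalty(model, old_params, fisher_diag, sim, lam=1.0):
    # sim in [0, 1]: estimated similarity between the previous and current task.
    # Here we regularize less when tasks are similar (lower forgetting risk) and
    # more when they are dissimilar; the opposite convention is equally simple.
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in old_params:
            penalty = penalty + (fisher_diag[name] * (p - old_params[name]) ** 2).sum()
    return lam * (1.0 - sim) * penalty


def training_step(model, batch, loss_fn, optimizer, old_params, fisher_diag, sim):
    optimizer.zero_grad()
    task_loss = loss_fn(model(batch["inputs"]), batch["targets"])
    loss = task_loss + similarity_weighted_penalty(model, old_params, fisher_diag, sim)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a continual instruction-tuning loop, sim would be re-estimated for each old/new task pair, with old_params and fisher_diag snapshotted after each finished task; a similarity signal of this kind could likewise inform when a model expansion method adds new parameters.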
Related papers
- Separable Mixture of Low-Rank Adaptation for Continual Visual Instruction Tuning [16.873306091966693]
Visual instruction tuning enables multimodal large language models (MLLMs) to handle a wide range of vision tasks by framing them as language-based instructions.
We identify a dual form of catastrophic forgetting in continual visual instruction tuning (CVIT), where MLLMs not only forget previously learned visual understanding but also experience a decline in instruction-following abilities.
We introduce the Separable Mixture of Low-Rank Adaptation (SMoLoRA) framework, which employs separable routing through two distinct modules.
This dual-routing design enables specialized adaptation in both domains, preventing forgetting while improving performance.
arXiv Detail & Related papers (2024-11-21T09:00:15Z) - LLMs Can Evolve Continually on Modality for X-Modal Reasoning [62.2874638875554]
Existing methods rely heavily on modal-specific pretraining and joint-modal tuning, leading to significant computational burdens when expanding to new modalities.
We propose PathWeave, a flexible and scalable framework with modal-Path sWitching and ExpAnsion abilities.
PathWeave performs comparably to state-of-the-art MLLMs while concurrently reducing parameter training burdens by 98.73%.
arXiv Detail & Related papers (2024-10-26T13:19:57Z) - ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [12.150065431702055]
We propose a multi-modal continual learning scheme that consists of experience-based learning and novel knowledge expansion.
Our method is well suited to continual learning: it expands the upstream representation distribution while also minimizing the negative impact of forgetting previous tasks.
arXiv Detail & Related papers (2024-10-14T13:29:42Z) - ModalPrompt: Dual-Modality Guided Prompt for Continual Learning of Large Multimodal Models [40.7613157799378]
Large Multimodal Models (LMMs) exhibit remarkable multi-tasking ability by learning mixed datasets jointly.
Existing methods leverage data replay or model expansion, neither of which was developed specifically for LMMs.
We propose a novel dual-modality guided prompt learning framework (ModalPrompt) tailored for multimodal continual learning.
arXiv Detail & Related papers (2024-10-08T09:35:37Z) - M2Distill: Multi-Modal Distillation for Lifelong Imitation Learning [9.15567555909617]
M2Distill is a multi-modal distillation-based method for lifelong imitation learning.
We regulate the shifts in latent representations across different modalities from previous to current steps.
We ensure that the learned policy retains its ability to perform previously learned tasks while seamlessly integrating new skills.
arXiv Detail & Related papers (2024-09-30T01:43:06Z) - A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z) - MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z) - Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation [48.071162716120334]
We study how the multimodal nature of the input affects the learning dynamics of a model.
Motivated by this observation, we propose a modality-aware feature distillation (MAFED) approach.
arXiv Detail & Related papers (2024-06-27T16:12:57Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model [121.23360004498893]
We present a benchmark, namely Continual Instruction tuNing (CoIN), to assess existing MLLMs in the sequential instruction tuning paradigm.
Experiments on CoIN demonstrate that current powerful MLLMs still suffer catastrophic forgetting.
We introduce MoELoRA to MLLMs, which is effective in retaining the previous instruction alignment; a generic mixture-of-LoRA sketch appears after this related-papers list.
arXiv Detail & Related papers (2024-03-13T08:54:31Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
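Several entries above (SMoLoRA, ModalPrompt, MoELoRA, PathWeave) rely on lightweight expansion modules such as LoRA adapters combined with some form of routing. The sketch below is a generic illustration only, assuming a frozen linear layer extended with a small pool of LoRA experts and a softmax router; the class name, dimensions, and routing scheme are hypothetical and not taken from any of the cited papers.

```python
# Generic mixture-of-LoRA illustration: a frozen pretrained linear layer plus a
# pool of low-rank adapter "experts" selected by a learned softmax router.
# Hypothetical code, not taken from SMoLoRA, MoELoRA, or any cited paper.
import torch
import torch.nn as nn


class MixtureOfLoRALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():  # keep pretrained weights frozen
            p.requires_grad = False
        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.down = nn.ModuleList(nn.Linear(d_in, rank, bias=False) for _ in range(num_experts))
        self.up = nn.ModuleList(nn.Linear(rank, d_out, bias=False) for _ in range(num_experts))
        self.router = nn.Linear(d_in, num_experts)  # per-token routing logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.router(x), dim=-1)               # (..., E)
        expert_out = torch.stack(
            [up(down(x)) for down, up in zip(self.down, self.up)], dim=-1
        )                                                             # (..., d_out, E)
        return self.base(x) + (expert_out * weights.unsqueeze(-2)).sum(dim=-1)


# Example: wrap one projection of a toy model
layer = MixtureOfLoRALinear(nn.Linear(512, 512), num_experts=4, rank=8)
out = layer(torch.randn(2, 16, 512))  # (batch, tokens, hidden)
```

With a wrapper of this kind, continual model expansion amounts to appending new (down, up) adapter pairs and extending the router when a new task or modality arrives, while the frozen base weights preserve previously learned behavior.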
This list is automatically generated from the titles and abstracts of the papers on this site.