Continual Instruction Tuning for Large Multimodal Models
- URL: http://arxiv.org/abs/2311.16206v1
- Date: Mon, 27 Nov 2023 15:04:48 GMT
- Title: Continual Instruction Tuning for Large Multimodal Models
- Authors: Jinghan He, Haiyun Guo, Ming Tang, Jinqiao Wang
- Abstract summary: Multi-task joint instruction tuning can facilitate the model's continual learning ability and mitigate forgetting.
We propose task-similarity-informed regularization and model expansion methods for continual instruction tuning of LMMs.
- Score: 30.438442723421556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction tuning is now a widely adopted approach to aligning large
multimodal models (LMMs) to follow human intent. It unifies the data format of
vision-language tasks, enabling multi-task joint training. However,
vision-language tasks are constantly being created in practice. Instead of
always re-training LMMs when new tasks arrive, continual learning offers
flexibility for models to continually and efficiently exploit the evolving
data. This work aims to explore the following two questions: 1) Do LMMs still
suffer from catastrophic forgetting in continual instruction tuning? 2) Are the
existing three classes of continual learning methods still applicable to the
continual instruction tuning of LMMs? An extensive study is conducted to
address the above questions. First, we establish the first benchmark in this
setting and reveal that catastrophic forgetting is still observed when
continually instruction-tuning LMMs. However, the multi-task joint instruction
tuning can facilitate the model's continual learning ability and mitigate
forgetting. Second, we integrate and adapt classic continual learning methods
to our context, demonstrating the efficacy of data replay and model expansion
strategies across diverse scenarios. In contrast, regularization-based methods
only perform well on models that have been jointly instruction-tuned on
multiple tasks. Third, we delve into the correlation and forgetting dynamics
between vision-language task pairs and propose task-similarity-informed
regularization and model expansion methods for continual instruction tuning of
LMMs. Experimental results show that our approach consistently boosts the
model's performance.
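The task-similarity-informed regularization idea can be sketched as an EWC-style quadratic penalty that pulls new-task parameters toward the previous task's parameters, with a strength modulated by an estimated similarity between the two tasks. The function below is a minimal illustrative sketch; the Fisher weighting, the `1 - task_sim` scaling, and all names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def similarity_informed_penalty(theta, theta_prev, fisher, task_sim, lam=1.0):
    """Quadratic penalty toward the previous task's parameters, scaled by
    task similarity (hypothetical sketch, not the paper's exact method).

    - fisher: per-parameter importance weights (EWC-style diagonal Fisher).
    - task_sim in [0, 1]: assumed scaling where dissimilar tasks (low
      similarity, more expected interference) get a stronger penalty.
    """
    strength = lam * (1.0 - task_sim)
    return strength * float(np.sum(fisher * (theta - theta_prev) ** 2))
```

In a continual-tuning loop this term would be added to the new task's loss; with `task_sim = 1` the penalty vanishes and the model adapts freely, while `task_sim = 0` applies the full importance-weighted constraint.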
Related papers
- Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation [48.071162716120334]
We study how the multimodal nature of the input affects the learning dynamics of a model.
Motivated by this observation, we propose a modality-aware feature distillation (MAFED) approach.
arXiv Detail & Related papers (2024-06-27T16:12:57Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model [128.46104068327435]
We present a benchmark, namely Continual Instruction tuNing (CoIN), to assess existing MLLMs in the sequential instruction tuning paradigm.
Experiments on CoIN demonstrate that current powerful MLLMs still suffer catastrophic forgetting.
We introduce MoELoRA to MLLMs, which effectively retains the previous instruction alignment.
arXiv Detail & Related papers (2024-03-13T08:54:31Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis [11.368438990334397]
We develop a self-supervised learning strategy to acquire independent unimodal supervisions.
We conduct extensive experiments on three public multimodal baseline datasets.
Our method achieves performance comparable to that of human-annotated unimodal labels.
arXiv Detail & Related papers (2021-02-09T14:05:02Z)
- Learning Invariant Representation for Continual Learning [5.979373021392084]
A key challenge in continual learning is catastrophic forgetting of previously learned tasks when the agent faces a new one.
We propose a new pseudo-rehearsal-based method, named learning Invariant Representation for Continual Learning (IRCL)
Disentangling the shared invariant representation helps the model learn a sequence of tasks continually, while being more robust to forgetting and transferring knowledge better.
arXiv Detail & Related papers (2021-01-15T15:12:51Z)
- Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning [74.07455280246212]
Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones.
We show that current continual learning, meta-learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario.
We propose Continual-MAML, an online extension of the popular MAML algorithm as a strong baseline for this scenario.
arXiv Detail & Related papers (2020-03-12T15:47:16Z)
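The fast-adaptation idea behind Continual-MAML can be illustrated with a MAML-style inner loop: a few gradient steps that adapt shared initial parameters to the data at hand. The toy linear model, learning rate, and step count below are illustrative assumptions; this sketches only the generic inner loop, not the paper's full online algorithm.

```python
import numpy as np

def inner_adapt(theta, x, y, lr=0.1, steps=5):
    """A few inner-loop gradient steps on squared error for y ≈ theta * x.

    Sketches MAML-style fast adaptation from a shared initialization theta;
    each step follows the gradient of the mean squared error on (x, y).
    """
    for _ in range(steps):
        grad = 2.0 * np.mean((theta * x - y) * x)
        theta = theta - lr * grad
    return theta
```

In Continual-MAML, parameters adapted this way serve the current task online, while the shared initialization is updated only when a shift to a new task is detected.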
This list is automatically generated from the titles and abstracts of the papers in this site.