CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
- URL: http://arxiv.org/abs/2403.08350v2
- Date: Wed, 23 Oct 2024 02:16:37 GMT
- Title: CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
- Authors: Cheng Chen, Junchen Zhu, Xu Luo, Hengtao Shen, Lianli Gao, Jingkuan Song
- Abstract summary: We present a benchmark, namely Continual Instruction tuNing (CoIN), to assess existing MLLMs in the sequential instruction tuning paradigm.
Experiments on CoIN demonstrate that current powerful MLLMs still suffer catastrophic forgetting.
We introduce MoELoRA to MLLMs which is effective to retain the previous instruction alignment.
- Score: 121.23360004498893
- Abstract: Instruction tuning represents a prevalent strategy employed by Multimodal Large Language Models (MLLMs) to align with human instructions and adapt to new tasks. Nevertheless, MLLMs encounter the challenge of adapting to users' evolving knowledge and demands. Therefore, how to retain existing skills while acquiring new knowledge needs to be investigated. In this paper, we present a comprehensive benchmark, namely Continual Instruction tuNing (CoIN), to assess existing MLLMs in the sequential instruction tuning paradigm. CoIN comprises 10 commonly used datasets spanning 8 task categories, ensuring a diverse range of instructions and tasks. Besides, the trained model is evaluated from two aspects: Instruction Following and General Knowledge, which assess the alignment with human intention and knowledge preserved for reasoning, respectively. Experiments on CoIN demonstrate that current powerful MLLMs still suffer catastrophic forgetting, and the failure in intention alignment assumes the main responsibility, instead of the knowledge forgetting. To this end, we introduce MoELoRA to MLLMs which is effective to retain the previous instruction alignment. Experimental results consistently illustrate the forgetting decreased from this method on CoIN.
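The sequential evaluation described in the abstract can be illustrated with a small sketch. It assumes a common continual-learning forgetting measure (best accuracy a task reached at any earlier stage, minus its accuracy after the final stage); the exact metrics used in CoIN may differ. The function names and the accuracy values are hypothetical and for illustration only.

```python
from typing import List


def forgetting_per_task(acc: List[List[float]]) -> List[float]:
    """Per-task forgetting under an assumed, commonly used definition.

    acc[i][j] is the accuracy on task j measured after training on
    tasks 0..i in sequence, so row i has i + 1 entries.
    For every task except the last one, return the best accuracy it
    reached at any earlier stage minus its accuracy after the final stage.
    """
    num_tasks = len(acc)
    final = acc[-1]
    result = []
    for j in range(num_tasks - 1):
        # Stages j .. num_tasks - 2 are the ones before the final stage.
        best_earlier = max(acc[i][j] for i in range(j, num_tasks - 1))
        result.append(best_earlier - final[j])
    return result


def mean_forgetting(acc: List[List[float]]) -> float:
    """Average of the per-task forgetting values (0.0 for a single task)."""
    per_task = forgetting_per_task(acc)
    return sum(per_task) / len(per_task) if per_task else 0.0


# Made-up accuracies for three sequential tasks, not results from the paper.
toy_acc = [
    [80.0],
    [60.0, 70.0],
    [50.0, 65.0, 75.0],
]
print(forgetting_per_task(toy_acc))
print(mean_forgetting(toy_acc))
```

A larger mean value indicates that more of the earlier tasks' performance was lost after the last tuning stage; a method such as MoELoRA would be expected to lower it if it retains earlier instruction alignment.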
Related papers
- The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples.
We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts.
Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z)
- Modality-Inconsistent Continual Learning of Multimodal Large Language Models [37.15220266767881]
We introduce Modality-Inconsistent Continual Learning (MICL), a new continual learning scenario for Multimodal Large Language Models (MLLMs).
Unlike existing vision-only or modality-incremental settings, MICL combines modality and task type shifts, both of which drive catastrophic forgetting.
We propose MoInCL, which employs a Pseudo Targets Generation Module to mitigate forgetting caused by task type shifts in previously seen modalities.
arXiv Detail & Related papers (2024-12-17T16:13:56Z)
- Separable Mixture of Low-Rank Adaptation for Continual Visual Instruction Tuning [16.873306091966693]
Visual instruction tuning enables multimodal large language models (MLLMs) to handle a wide range of vision tasks by framing them as language-based instructions.
We identify a dual form of catastrophic forgetting in CVIT, where MLLMs not only forget previously learned visual understanding but also experience a decline in instruction-following abilities.
We introduce the Separable Mixture of Low-Rank Adaptation (SMoLoRA) framework, which employs separable routing through two distinct modules.
This dual-routing design enables specialized adaptation in both domains, preventing forgetting while improving performance.
arXiv Detail & Related papers (2024-11-21T09:00:15Z)
- MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs [47.94710556156627]
MIA-Bench is a benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions.
Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions.
arXiv Detail & Related papers (2024-07-01T17:53:35Z)
- The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models [48.455388608863785]
We introduce a benchmark designed to evaluate models' abilities to follow multiple instructions through sequential instruction following tasks.
Our benchmark evaluates instruction following using four tasks (text modification, question answering, mathematics, and security rules).
More recent and larger models significantly outperform their older and smaller counterparts on the SIFo tasks, validating the benchmark's effectiveness.
arXiv Detail & Related papers (2024-06-28T15:34:26Z)
- Continual Instruction Tuning for Large Multimodal Models [30.438442723421556]
Multi-task joint instruction tuning can facilitate the model's continual learning ability and mitigate forgetting.
We propose task-similarity-informed regularization and model expansion methods for continual instruction tuning of LMMs.
arXiv Detail & Related papers (2023-11-27T15:04:48Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS) that exploits PLMs with task-specific instructions.
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD.
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.