Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large
Language Models
- URL: http://arxiv.org/abs/2402.12048v1
- Date: Mon, 19 Feb 2024 11:02:05 GMT
- Title: Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large
Language Models
- Authors: Didi Zhu, Zhongyi Sun, Zexi Li, Tao Shen, Ke Yan, Shouhong Ding, Kun
Kuang, Chao Wu
- Abstract summary: Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large language models (MLLMs)
This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor.
- Score: 46.92994945808424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catastrophic forgetting emerges as a critical challenge when fine-tuning
multi-modal large language models (MLLMs), where improving performance on
unseen tasks often leads to a significant performance drop on the original
tasks. This paper presents a comprehensive analysis of catastrophic forgetting
in MLLMs and introduces a post-training adjustment method called Model Tailor.
Our method primarily preserves the pre-trained parameters while replacing a
small number ($\leq$ 10\%) of fine-tuned parameters, maintaining $\sim$ 99\%
effectiveness on original tasks versus pre-training, and achieving $\sim$ 97\%
on new tasks compared to standard fine-tuning. Specifically, we derive a sparse
mask to identify the "model patch", based on a fusion strategy that integrates
salience and sensitivity analysis. Subsequently, a compensation mechanism is
introduced to "decorate the patch", enhancing the model's performance on both
target and original tasks. Additionally, our method is adaptable to multi-task
scenarios. Through extensive experiments on InstructBLIP and LLaVA-1.5 in both
image captioning and visual question answering tasks, our approach demonstrates
significant task adaptability while preserving inherent pre-trained
capabilities.
Related papers
- LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance.
LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z) - Efficient Stitchable Task Adaptation [47.94819192325723]
We present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models.
Specifically, we first tailor parameter-efficient fine-tuning to share low-rank updates among the stitches.
We streamline a simple yet effective one-stage deployment pipeline, which estimates the important stitches to deploy.
arXiv Detail & Related papers (2023-11-29T04:31:35Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift ( TDS) and Task-Distribution Corruption (TDC)
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - One-Shot Pruning for Fast-adapting Pre-trained Models on Devices [28.696989086706186]
Large-scale pre-trained models have been remarkably successful in resolving downstream tasks.
deploying these models on low-capability devices still requires an effective approach, such as model pruning.
We present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task.
arXiv Detail & Related papers (2023-07-10T06:44:47Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language
Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.