LLaCA: Multimodal Large Language Continual Assistant
- URL: http://arxiv.org/abs/2410.10868v1
- Date: Tue, 08 Oct 2024 11:24:59 GMT
- Title: LLaCA: Multimodal Large Language Continual Assistant
- Authors: Jingyang Qiao, Zhizhong Zhang, Xin Tan, Yanyun Qu, Shouhong Ding, Yuan Xie,
- Abstract summary: Multimodal Continual Instruction Tuning (MCIT) is adopted to continually instruct MLLMs to follow human intent in sequential datasets.
Standard gradient updates heavily degrade tuning performance on previous datasets.
We propose a method called Multimodal Large Language Continual Assistant (LLaCA) to address the challenge.
- Score: 59.585544987096974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction tuning guides Multimodal Large Language Models (MLLMs) in aligning different modalities by designing text instructions, which appears to be an essential technique for enhancing the capabilities and controllability of foundation models. In this framework, Multimodal Continual Instruction Tuning (MCIT) is adopted to continually instruct MLLMs to follow human intent across sequential datasets. We observe that standard gradient updates heavily degrade tuning performance on previous datasets and zero-shot ability during continual instruction tuning. The Exponential Moving Average (EMA) update policy can trace previous parameters, which helps reduce forgetting. However, its fixed balance weight cannot cope with ever-changing datasets, throwing the plasticity and stability of MLLMs out of balance. In this paper, we propose a method called Multimodal Large Language Continual Assistant (LLaCA) to address this challenge. Starting from the trade-off prerequisite and the EMA update, we formulate an ideal condition for plasticity and stability. Based on a Taylor expansion of the loss function, we find that the optimal balance weight is essentially determined by the gradient information and the previous parameters. We determine the balance weight automatically and thereby significantly improve performance. Through comprehensive experiments with LLaVA-1.5 on a continual visual-question-answering benchmark, our approach not only greatly improves anti-forgetting ability compared with the baseline (reducing forgetting from 22.67 to 2.68) but also significantly boosts continual tuning performance (increasing average accuracy from 41.31 to 61.89). Our code will be published soon.
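The abstract describes an EMA-style update whose balance weight is set automatically from the gradient information and the previous parameters rather than being fixed. A minimal sketch of that idea follows; the specific rule used to compute the weight below is an illustrative assumption, not the closed-form expression derived in the paper.

```python
import torch

@torch.no_grad()
def adaptive_ema_update(prev_params, tuned_params, grads, eps=1e-8):
    """EMA-style merge of previous and freshly tuned parameters.

    The balance weight beta is computed per tensor from the gradient
    magnitude relative to the previous parameters: small gradients push
    beta toward 1 (stability), large gradients push it toward 0
    (plasticity). This particular formula is an assumption for
    illustration only.
    """
    merged = {}
    for name, theta_prev in prev_params.items():
        theta_new = tuned_params[name]
        g_norm = grads[name].norm()
        beta = 1.0 / (1.0 + g_norm / (theta_prev.norm() + eps))
        merged[name] = beta * theta_prev + (1.0 - beta) * theta_new
    return merged
```

In a continual instruction tuning loop, `prev_params` would hold the merged weights after the previous dataset, `tuned_params` the weights after the current update step, and `grads` the corresponding gradients.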
Related papers
- Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning [19.27175827358111]
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones.
We propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD)
We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models.
arXiv Detail & Related papers (2025-04-09T17:59:42Z) - Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models [26.484208658326857]
Continual learning aims to incrementally acquire new concepts in data streams while resisting forgetting previous knowledge.
With the rise of powerful pre-trained models (PTMs), there is a growing interest in training incremental learning systems.
arXiv Detail & Related papers (2024-11-04T15:34:30Z) - LLMs Can Evolve Continually on Modality for X-Modal Reasoning [62.2874638875554]
Existing methods rely heavily on modal-specific pretraining and joint-modal tuning, leading to significant computational burdens when expanding to new modalities.
We propose PathWeave, a flexible and scalable framework with modal-Path sWitching and ExpAnsion abilities.
PathWeave performs comparably to state-of-the-art MLLMs while concurrently reducing parameter training burdens by 98.73%.
arXiv Detail & Related papers (2024-10-26T13:19:57Z) - PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization [35.922096876707975]
PACE marries generalization in PArameter-efficient fine-tuning with Consistency rEgularization.
It not only implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge.
It also improves LoRA in text classification (GLUE) and mathematical reasoning.
arXiv Detail & Related papers (2024-09-25T17:56:00Z) - Get Confused Cautiously: Textual Sequence Memorization Erasure with Selective Entropy Maximization [17.20276556057748]
Large Language Models (LLMs) have been found to memorize and recite some of the textual sequences from their training set verbatim.
This Textual Sequence Memorization (TSM) phenomenon leads to a high demand to regulate LLM output to prevent it from generating certain memorized text.
Existing methods for TSM erasure fail to forget massive memorized samples without substantially jeopardizing the model utility.
arXiv Detail & Related papers (2024-08-09T10:26:11Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion: the label smoothing value during training is adaptively set according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
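The summary above only states that the label smoothing value is set per sample from its uncertainty; the linear mapping from uncertainty to a smoothing coefficient in the sketch below is therefore an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def ual_style_loss(logits, targets, uncertainty, max_smoothing=0.2):
    """Cross-entropy with per-sample label smoothing scaled by uncertainty.

    uncertainty: one value per sample, assumed to lie in [0, 1]; higher
    uncertainty yields stronger smoothing. The cap of 0.2 and the linear
    scaling are illustrative choices, not taken from the paper.
    """
    num_classes = logits.size(-1)
    smooth = max_smoothing * uncertainty.clamp(0.0, 1.0)      # shape (B,)
    one_hot = F.one_hot(targets, num_classes).float()         # shape (B, C)
    soft = one_hot * (1.0 - smooth.unsqueeze(1)) + smooth.unsqueeze(1) / num_classes
    return -(soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```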
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Improving Data-aware and Parameter-aware Robustness for Continual Learning [3.480626767752489]
This paper's analysis traces the insufficient robustness of existing continual learning methods to the ineffective handling of outliers.
We propose a Robust Continual Learning (RCL) method to address this issue.
The proposed method effectively maintains robustness and achieves new state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2024-05-27T11:21:26Z) - Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient as it incurs high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z) - Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models [46.92994945808424]
Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large language models (MLLMs).
This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor.
arXiv Detail & Related papers (2024-02-19T11:02:05Z) - Weighted Ensemble Models Are Strong Continual Learners [20.62749699589017]
We study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks.
CL is essentially a balancing act between being able to learn on the new task and maintaining the performance on the previously learned concepts.
Intending to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current tasks.
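A minimal sketch of weight-ensembling between the previous-task and current-task models; the single fixed interpolation coefficient is an assumption, since the exact weighting scheme is not given in the summary above.

```python
import torch

@torch.no_grad()
def weight_ensemble(prev_state, curr_state, alpha=0.5):
    """Parameter-wise interpolation of two model state dicts.

    alpha = 1.0 keeps the previous-task model (stability), alpha = 0.0
    keeps the current-task model (plasticity); 0.5 is an illustrative
    default rather than the paper's choice.
    """
    return {name: alpha * prev_state[name] + (1.0 - alpha) * curr_state[name]
            for name in prev_state}
```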
arXiv Detail & Related papers (2023-12-14T14:26:57Z) - Dynamic Corrective Self-Distillation for Better Fine-Tuning of Pretrained Models [0.9217021281095907]
We tackle the issue of aggressive fine-tuning encountered during transfer learning with pre-trained language models (PLMs).
Inspired by the adaptive boosting method in traditional machine learning, we present an effective dynamic corrective self-distillation approach to improve the fine-tuning of the PLMs.
Our technique involves performing a self-distillation mechanism where, at each iteration, the student model actively adapts and corrects itself by dynamically adjusting the weights assigned to individual data points.
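A rough sketch of the per-sample re-weighting idea described above, combining a distillation term with a boosting-style weight update; the exact loss composition and update rule are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def weighted_self_distill_step(student_logits, teacher_logits, targets,
                               sample_w, temp=2.0, kd_weight=0.5):
    """One loss computation with per-sample weights plus a boosting-style
    weight update that upweights samples the student still gets wrong.
    All hyperparameters here are illustrative assumptions.
    """
    ce = F.cross_entropy(student_logits, targets, reduction="none")
    kd = F.kl_div(F.log_softmax(student_logits / temp, dim=-1),
                  F.softmax(teacher_logits / temp, dim=-1),
                  reduction="none").sum(dim=-1) * temp ** 2
    loss = (sample_w * (ce + kd_weight * kd)).sum()

    with torch.no_grad():
        wrong = (student_logits.argmax(dim=-1) != targets).float()
        new_w = sample_w * torch.exp(wrong)   # boost misclassified samples
        new_w = new_w / new_w.sum()           # renormalize to sum to 1
    return loss, new_w
```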
arXiv Detail & Related papers (2023-12-12T07:26:36Z) - Recyclable Tuning for Continual Pre-training [98.51583779792031]
Continual pre-training is the paradigm where pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and gradually get upgraded.
We contend that proper algorithms for recycling outdated adapted weights should be developed.
We show that both methods can be combined to achieve better performance.
arXiv Detail & Related papers (2023-05-15T15:05:44Z) - Online Hyperparameter Optimization for Class-Incremental Learning [99.70569355681174]
Class-incremental learning (CIL) aims to train a classification model while the number of classes increases phase-by-phase.
An inherent challenge of CIL is the stability-plasticity tradeoff, i.e., CIL models should keep stable to retain old knowledge and keep plastic to absorb new knowledge.
We propose an online learning method that can adaptively optimize the tradeoff without knowing the setting a priori.
arXiv Detail & Related papers (2023-01-11T17:58:51Z) - Hyperparameter-free Continuous Learning for Domain Classification in Natural Language Understanding [60.226644697970116]
Domain classification is a fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
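The summary describes embedding a sparsity prior in the weight updates; a minimal sketch of a sparse-delta update is below, where keeping only the largest-magnitude entries of the dense update is an illustrative choice rather than DSEE's actual criterion.

```python
import torch

@torch.no_grad()
def apply_sparse_delta(weight, delta, keep_ratio=0.05):
    """Add only the top-k largest-magnitude entries of a dense update.

    keep_ratio controls the sparsity of the retained update;
    magnitude-based selection is an assumption for illustration.
    """
    k = max(1, int(keep_ratio * delta.numel()))
    threshold = delta.abs().flatten().topk(k).values.min()
    mask = (delta.abs() >= threshold).to(delta.dtype)
    return weight + delta * mask
```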
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines [31.807628937487927]
Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.
Previous literature identified two potential reasons for the observed instability: catastrophic forgetting and small size of the fine-tuning datasets.
We show that both hypotheses fail to explain the fine-tuning instability.
arXiv Detail & Related papers (2020-06-08T19:06:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.