Related papers: Enhancing Multimodal Continual Instruction Tuning with BranchLoRA

URL: http://arxiv.org/abs/2506.02041v1
Date: Sat, 31 May 2025 09:02:38 GMT
Title: Enhancing Multimodal Continual Instruction Tuning with BranchLoRA
Authors: Duzhen Zhang, Yong Ren, Zhong-Zhi Li, Yahan Yu, Jiahua Dong, Chenxing Li, Zhilong Ji, Jinfeng Bai,
Abstract summary: Multimodal Continual Instruction Tuning aims to finetune Multimodal Large Language Models (MLLMs) to continually align with human intent. Existing approaches often rely on the Mixture-of-Experts (MoE) LoRA framework to preserve previous instruction alignments. We propose BranchLoRA, an asymmetric framework to enhance both efficiency and performance.
Score: 26.618850482040397
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal Continual Instruction Tuning (MCIT) aims to finetune Multimodal Large Language Models (MLLMs) to continually align with human intent across sequential tasks. Existing approaches often rely on the Mixture-of-Experts (MoE) LoRA framework to preserve previous instruction alignments. However, these methods are prone to Catastrophic Forgetting (CF), as they aggregate all LoRA blocks via simple summation, which compromises performance over time. In this paper, we identify a critical parameter inefficiency in the MoELoRA framework within the MCIT context. Based on this insight, we propose BranchLoRA, an asymmetric framework to enhance both efficiency and performance. To mitigate CF, we introduce a flexible tuning-freezing mechanism within BranchLoRA, enabling branches to specialize in intra-task knowledge while fostering inter-task collaboration. Moreover, we incrementally incorporate task-specific routers to ensure an optimal branch distribution over time, rather than favoring the most recent task. To streamline inference, we introduce a task selector that automatically routes test inputs to the appropriate router without requiring task identity. Extensive experiments on the latest MCIT benchmark demonstrate that BranchLoRA significantly outperforms MoELoRA and maintains its superiority across various MLLM sizes.
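As a reading aid for the abstract above, the following is a minimal sketch of a generic MoE-LoRA layer of the kind the abstract contrasts itself with: a frozen base weight plus a router-weighted sum over all LoRA blocks. It is not the BranchLoRA implementation, and the abstract does not specify these details. All names (`lora_delta`, `moe_lora_forward`, `router_W`), the softmax router, and the dimensions are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def lora_delta(x, A, B):
    """Low-rank update of one LoRA block: B @ (A @ x), with rank = A.shape[0]."""
    return B @ (A @ x)

def moe_lora_forward(x, W, experts, router_W):
    """Frozen base projection plus a router-weighted sum of ALL LoRA experts.

    Hypothetical layout: `experts` is a list of (A, B) pairs and
    `router_W` maps the input to one logit per expert.
    """
    gates = softmax(router_W @ x)
    delta = sum(g * lora_delta(x, A, B) for g, (A, B) in zip(gates, experts))
    return W @ x + delta, gates

# Toy usage with illustrative sizes (assumed, not taken from the paper).
rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 8, 6, 2, 3
W = rng.normal(size=(d_out, d_in))                 # frozen base weight
experts = [
    (rng.normal(size=(rank, d_in)),                # A: random init
     np.zeros((d_out, rank)))                      # B: zero init, as in standard LoRA
    for _ in range(n_experts)
]
router_W = rng.normal(size=(n_experts, d_in))
x = rng.normal(size=d_in)

y, gates = moe_lora_forward(x, W, experts, router_W)
```

Because every expert contributes to every output, training on a new task updates parameters that earlier tasks also depend on, which is one plausible reading of why the abstract links summation over all blocks to catastrophic forgetting.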

Related papers

RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning [18.0412262027514]
We propose a novel Mixture of Low-Rank Experts (MoRE) for multi-task learning. Instead of using an individual LoRA for each task, we align different ranks of the LoRA module with different tasks. We also design a novel adaptive rank selector to select the appropriate expert for each task.
arXiv Detail & Related papers (2025-05-28T12:32:09Z)
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation [73.18867725540865]
Low-Rank Adaptation (LoRA) is widely adopted for downstream fine-tuning of foundation models. We propose ThanoRA, a Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation framework.
arXiv Detail & Related papers (2025-05-24T11:01:45Z)
In-Context Meta LoRA Generation [61.690065588534296]
Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. We propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models. ICM-LoRA enables more accurate LoRA parameter reconstruction than current parameter reconstruction methods.
arXiv Detail & Related papers (2025-01-29T13:12:01Z)
Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning [53.98941571078398]
Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across related tasks. We propose Single-ranked Mixture of Experts LoRA (SMoRA), which embeds MoE into LoRA by treating each rank as an
arXiv Detail & Related papers (2025-01-25T06:56:39Z)
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing MTL capabilities. MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information and capture shared knowledge. This approach enables pre-trained models to jointly adapt to different target domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z)
Glider: Global and Local Instruction-Driven Expert Router [83.785832410832]
"Model MoErging" methods prioritize generalization to unseen tasks at the expense of performance on held-in tasks. We propose Global and Local Instruction Driven Expert Router (GLIDER) that integrates a multi-scale routing mechanism. GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks.
arXiv Detail & Related papers (2024-10-09T17:59:14Z)
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models [4.978361907192563]
MeteoRA is a scalable and efficient framework that reuses multiple task-specific LoRA adapters into the base LLM. MeteoRA achieves superior performance in handling composite tasks, effectively solving ten sequential problems in a single inference pass.
arXiv Detail & Related papers (2024-05-19T20:46:07Z)
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models [7.966452497550907]
We propose the Mixture-of-LoRAs (MoA) architecture for multi-task learning with large language models (LLMs). Multiple domain-specific LoRA modules can be aligned with the expert design principles observed in Mixture-of-Experts (MoE). Each LoRA model can be iteratively adapted to a new domain, allowing for quick domain-specific adaptation.
arXiv Detail & Related papers (2024-03-06T03:33:48Z)
Multimodal Instruction Tuning with Conditional Mixture of LoRA [51.58020580970644]
This paper introduces a novel approach that integrates multimodal instruction tuning with Low-Rank Adaptation (LoRA). It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance. Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms conventional LoRA with the same or even higher ranks.
arXiv Detail & Related papers (2024-02-24T20:15:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.