LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts
in Instruction Finetuning MLLMs
- URL: http://arxiv.org/abs/2401.16160v2
- Date: Tue, 30 Jan 2024 15:44:58 GMT
- Title: LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts
in Instruction Finetuning MLLMs
- Authors: Shaoxiang Chen, Zequn Jie, Lin Ma
- Abstract summary: We propose an efficient Mixture of Experts (MoE) design for instruction finetuning MLLMs.
Extensive experiments show that LLaVA-MoLE effectively mitigates the data conflict issue when mixing multiple distinct instruction datasets.
LLaVA-MoLE can even outperform the plain-LoRA baseline trained with twice as many samples.
- Score: 29.96139552754377
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction finetuning on a variety of image-text instruction data is the key
to obtaining a versatile Multimodal Large Language Model (MLLM), and different
configurations of the instruction data can lead to finetuned models with
different capabilities. However, we have discovered that data conflicts are
inevitable when mixing instruction data from distinct domains, which can result
in performance drops for tasks of a specific domain. To address this issue, we
propose to apply an efficient Mixture of Experts (MoE) design, which is a
sparse Mixture of LoRA Experts (MoLE) for instruction finetuning MLLMs. Within
the Transformer layers, we extend the popular Low-Rank Adaptation (LoRA) method
by creating a set of LoRA experts specifically for the MLP layer, and route
each token to the top-1 expert based on a routing function, allowing adaptive
choices for tokens from different domains. Since the LoRA experts are sparsely
activated, the training and inference costs are kept roughly constant compared
to the original LoRA method. By replacing the plain-LoRA of LLaVA-1.5 with our
MoE design, our final model is named LLaVA-MoLE. Extensive experiments show
that LLaVA-MoLE effectively mitigates the data conflict issue when mixing
multiple distinct instruction datasets with various configurations, and
achieves consistent performance gains over the strong plain-LoRA baselines.
Most importantly, on the mixed datasets, LLaVA-MoLE can even outperform the
plain-LoRA baseline trained with twice as many samples.
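To make the design concrete, below is a minimal sketch of such a layer, assuming PyTorch; it is not the authors' released implementation, and the module name MoLELinear as well as the expert count, rank, and scaling values are illustrative assumptions. For readability the sketch computes every expert's LoRA update and masks out unrouted tokens, whereas an efficient implementation would gather the tokens routed to each expert so that, as stated in the abstract, the cost stays close to plain LoRA.

    # Minimal sketch of a sparse Mixture of LoRA Experts layer (assumed PyTorch).
    # Not the authors' released code; num_experts, rank, and alpha are illustrative.
    import torch
    import torch.nn as nn

    class MoLELinear(nn.Module):
        """Frozen base projection plus a set of LoRA experts with top-1 token routing."""

        def __init__(self, in_features, out_features, num_experts=4, rank=8, alpha=16.0):
            super().__init__()
            self.base = nn.Linear(in_features, out_features)
            self.base.requires_grad_(False)                    # pretrained weights stay frozen
            self.router = nn.Linear(in_features, num_experts)  # routing function over tokens
            # One low-rank (A, B) pair per expert.
            self.lora_A = nn.ModuleList(nn.Linear(in_features, rank, bias=False) for _ in range(num_experts))
            self.lora_B = nn.ModuleList(nn.Linear(rank, out_features, bias=False) for _ in range(num_experts))
            for b in self.lora_B:
                nn.init.zeros_(b.weight)                       # standard LoRA init: B starts at zero
            self.scaling = alpha / rank
            self.num_experts = num_experts

        def forward(self, x):                                  # x: (batch, seq_len, in_features)
            out = self.base(x)
            expert_idx = self.router(x).argmax(dim=-1)         # top-1 expert index per token
            for e in range(self.num_experts):
                mask = (expert_idx == e).unsqueeze(-1)         # tokens assigned to expert e
                delta = self.lora_B[e](self.lora_A[e](x)) * self.scaling
                out = out + delta * mask                       # only routed tokens get expert e's update
            return out

    # Illustrative usage with small dimensions; an MLLM's MLP projection would be larger.
    layer = MoLELinear(1024, 4096)
    tokens = torch.randn(2, 16, 1024)
    print(layer(tokens).shape)                                 # torch.Size([2, 16, 4096])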
Related papers
- MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning [9.91790333647256]
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods.
We propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant.
MiLoRA differs from previous MOE-style LoRA methods by considering each LoRA module as an expert and employing a prompt-aware routing mechanism.
arXiv Detail & Related papers (2024-10-23T17:04:40Z) - MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing multi-task learning capabilities.
MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information.
This approach enables large language models (LLMs) pre-trained on a general corpus to adapt to different target task domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z) - DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models [10.179598253424103]
Large Language Models (LLMs) have achieved robust performance across diverse tasks, but fine-tuning these models for specific domains remains resource-intensive.
We propose a mini-MLP module with only 5M parameters to dynamically fuse multiple LoRAs at the sentence level using top-p sampling strategies.
This approach reduces inference time to less than twice that of single LoRA inference by leveraging parallel computation.
arXiv Detail & Related papers (2024-10-02T12:45:52Z) - Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning [57.36978335727009]
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs).
In this paper, we propose a framework that adaptively retrieves and composes multiple LoRAs based on input prompts.
arXiv Detail & Related papers (2024-06-24T05:24:41Z) - Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection.
The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z) - Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language
Models [7.966452497550907]
We propose the Mixture-of-LoRAs (MoA) architecture for multi-task learning with large language models (LLMs).
Multiple domain-specific LoRA modules can be aligned with the expert design principles observed in Mixture-of-Experts (MoE).
Each LoRA model can be iteratively adapted to a new domain, allowing for quick domain-specific adaptation.
arXiv Detail & Related papers (2024-03-06T03:33:48Z) - Multimodal Instruction Tuning with Conditional Mixture of LoRA [54.65520214291653]
This paper introduces a novel approach that integrates multimodal instruction tuning with Low-Rank Adaptation (LoRA).
It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance.
Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms the conventional LoRA at the same or even higher ranks.
arXiv Detail & Related papers (2024-02-24T20:15:31Z) - LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed
Tasks in the Wild [76.67343971195267]
Low-Rank Adaptation (LoRA) provides an efficient solution for fine-tuning large language models (LLMs).
LoraRetriever is a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts.
Experimental results indicate that LoraRetriever consistently outperforms the baselines.
arXiv Detail & Related papers (2024-02-15T15:02:46Z) - MultiLoRA: Democratizing LoRA for Better Multi-Task Learning [20.750808913757396]
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks.
LoRA is dominated by a small number of top singular vectors while fine-tuning decomposes into a set of less important unitary transforms.
We propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA.
arXiv Detail & Related papers (2023-11-20T02:59:18Z) - Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE [83.00018517368973]
Large Language Models (LLMs) can extend their zero-shot capabilities to multimodal learning through instruction tuning.
However, negative conflicts and interference may have a worse impact on performance.
We combine the well-known Mixture-of-Experts (MoE) with a representative PEFT technique, LoRA, to design a novel LLM-based decoder, called LoRA-MoE, for multimodal learning.
arXiv Detail & Related papers (2023-11-05T15:48:29Z)