MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
- URL: http://arxiv.org/abs/2405.13053v3
- Date: Wed, 09 Oct 2024 15:33:10 GMT
- Title: MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
- Authors: Jingwei Xu, Junyu Lai, Yunpeng Huang,
- Abstract summary: MeteoRA is a scalable and efficient framework that reuses multiple task-specific LoRA adapters into the base LLM.
MeteoRA achieves superior performance in handling composite tasks, effectively solving ten sequential problems in a single inference pass.
- Score: 4.978361907192563
- License:
- Abstract: The pretrain+fine-tune paradigm is foundational for deploying large language models (LLMs) across various downstream applications. Within this framework, Low-Rank Adaptation (LoRA) stands out for its parameter-efficient fine-tuning (PEFT), producing numerous reusable task-specific LoRA adapters. However, this approach requires explicit task intention selection, posing challenges for autonomous task sensing and switching during inference with multiple existing LoRA adapters embedded in a single LLM. In this work, we introduce MeteoRA (Multiple-tasks embedded LoRA), a scalable and efficient framework that reuses multiple task-specific LoRA adapters into the base LLM via a full-mode Mixture-of-Experts (MoE) architecture. This framework also includes novel MoE forward acceleration strategies to address the efficiency challenges of traditional MoE implementations. Our evaluation, using the LlaMA2-13B and LlaMA3-8B base models equipped with 28 existing LoRA adapters through MeteoRA, demonstrates equivalent performance with the traditional PEFT method. Moreover, the LLM equipped with MeteoRA achieves superior performance in handling composite tasks, effectively solving ten sequential problems in a single inference pass, thereby demonstrating the framework's enhanced capability for timely adapter switching.
Related papers
- MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning [29.957620178740186]
In multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently emerge.
We propose Mixture of Asymmetric Low-Rank Adaptaion (MALoRA) as a flexible fine-tuning framework.
MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models.
arXiv Detail & Related papers (2024-10-30T07:53:52Z) - MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning [9.91790333647256]
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods.
We propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant.
MiLoRA differs from previous MOE-style LoRA methods by considering each LoRA module as an expert and employing a prompt-aware routing mechanism.
arXiv Detail & Related papers (2024-10-23T17:04:40Z) - MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing multi-task learning capabilities.
MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information.
This approach enables large language models (LLMs) pre-trained on general corpus to adapt to different target task domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z) - Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning [57.36978335727009]
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs)
In this paper, we propose a framework that adaptively retrieves and composes multiple LoRAs based on input prompts.
arXiv Detail & Related papers (2024-06-24T05:24:41Z) - Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection.
The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z) - Multimodal Instruction Tuning with Conditional Mixture of LoRA [54.65520214291653]
This paper introduces a novel approach that integrates multimodal instruction tuning with Low-Rank Adaption (LoRA)
It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance.
Experimental results on various multimodal evaluation datasets indicate that MixLoRA not only outperforms the conventional LoRA with the same or even higher ranks.
arXiv Detail & Related papers (2024-02-24T20:15:31Z) - LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed
Tasks in the Wild [76.67343971195267]
Low-Rank Adaptation (LoRA) provides an efficient solution for fine-tuning large language models (LLM)
LoraRetriever is a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts.
Experimental results indicate that LoraRetriever consistently outperforms the baselines.
arXiv Detail & Related papers (2024-02-15T15:02:46Z) - CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices [78.16679232748196]
We introduce a Compression-Aware LoRA (CA-LoRA) framework to transfer Large Language Models (LLMs) to other tasks.
Experiment results demonstrate that CA-LoRA outperforms the vanilla LoRA methods applied to a compressed LLM.
The source code of CA-LoRA is available at https://github.com/thunlp/CA-LoRA.
arXiv Detail & Related papers (2023-07-15T04:37:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.