Related papers: MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning

MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning

URL: http://arxiv.org/abs/2410.22782v1
Date: Wed, 30 Oct 2024 07:53:52 GMT
Title: MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning
Authors: Xujia Wang, Haiyan Zhao, Shuo Wang, Hanqing Wang, Zhiyuan Liu,
Abstract summary: In multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently emerge. We propose Mixture of Asymmetric Low-Rank Adaptaion (MALoRA) as a flexible fine-tuning framework. MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models.
Score: 29.957620178740186
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have significantly improved the adaptation of LLMs to downstream tasks in a resource-efficient manner. However, in multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently emerge. Mixture-of-LoRA (MoLoRA), which combines LoRA with sparse Mixture-of-Experts, mitigates some of these issues by promoting task-specific learning across experts. Despite this, MoLoRA remains inefficient in terms of training speed, parameter utilization, and overall multi-task performance. In this paper, we propose Mixture of Asymmetric Low-Rank Adaptaion (MALoRA), a flexible fine-tuning framework that leverages asymmetric optimization across LoRA experts. MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models. Additionally, MALoRA addresses overfitting issues commonly seen in high-rank configurations, enhancing performance stability. Extensive experiments across diverse multi-task learning scenarios demonstrate that MALoRA consistently outperforms all baseline methods in both inter-domain and intra-domain tasks.

Related papers

DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation [26.24723718425076]
Mixture-of-Experts (MoE) has become a prominent paradigm for scaling Large Language Models (LLMs)<n>We propose a Dynamic Rank LoRA framework named DR-LoRA, which dynamically grows expert LoRA ranks during fine-tuning based on task-specific demands.<n>Experiments on multiple benchmarks demonstrate that DR-LoRA consistently outperforms standard LoRA and static allocation strategies under the same parameter budget.
arXiv Detail & Related papers (2026-01-08T10:58:51Z)
MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning [18.0412262027514]
We propose a novel Mixture of Low-Rank Experts (MoRE) for multi-task.<n>Instead of using an individual LoRA for each task, we align different ranks of LoRA module with different tasks.<n>We also design a novel adaptive rank selector to select the appropriate expert for each task.
arXiv Detail & Related papers (2025-05-28T12:32:09Z)
A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models [22.457766373989365]
Low-Rank Adapters (LoRAs) have been substantially adopted across various fields, including instruction tuning and domain adaptation. To address the limited expressive capacity of LoRA, the Mixture-of-Expert (MoE) has been introduced for incorporating multiple LoRA adapters. We propose a new training strategy for MoE-LoRA, to stabilize and boost its feature learning procedure by multi-space projections.
arXiv Detail & Related papers (2025-02-20T05:58:53Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods. We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning [53.98941571078398]
Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across related tasks. We propose Single-ranked Mixture of Experts LoRA (textbfSMoRA), which embeds MoE into LoRA by textittreating each rank as an
arXiv Detail & Related papers (2025-01-25T06:56:39Z)
MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning [8.868481107848185]
MoSLD is a mixture-of-shared-LoRAs model with a dropout strategy. MoSLD addresses challenges by sharing the upper projection matrix in LoRA among different experts. Our model exhibits excellent performance in both single-task and multi-task scenarios.
arXiv Detail & Related papers (2024-12-12T05:22:49Z)
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z)
MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning [9.91790333647256]
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods. We propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant. MiLoRA differs from previous MOE-style LoRA methods by considering each LoRA module as an expert and employing a prompt-aware routing mechanism.
arXiv Detail & Related papers (2024-10-23T17:04:40Z)
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing multi-task learning capabilities. MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information. This approach enables large language models (LLMs) pre-trained on general corpus to adapt to different target task domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z)
Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models. LoRA often under performs when compared to full- parameter fine-tuning. We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection. The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z)
Multimodal Instruction Tuning with Conditional Mixture of LoRA [54.65520214291653]
This paper introduces a novel approach that integrates multimodal instruction tuning with Low-Rank Adaption (LoRA) It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance. Experimental results on various multimodal evaluation datasets indicate that MixLoRA not only outperforms the conventional LoRA with the same or even higher ranks.
arXiv Detail & Related papers (2024-02-24T20:15:31Z)
PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA [45.38491644250814]
Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA) is an intra-layer sharing mechanism. PRoLoRA retains its advantages, and effectively circumvents the drawbacks of peer parameter-sharing methods. Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA.
arXiv Detail & Related papers (2024-02-24T13:39:05Z)
MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models [24.17147521556083]
We introduce a novel PEFT method: MoELoRA. We conduct experiments on 11 tasks in math reasoning and common-sense reasoning benchmarks. MoELoRA achieved an average performance that was 4.2% higher than LoRA, and demonstrated competitive performance compared to the 175B GPT-3.5 on several benchmarks.
arXiv Detail & Related papers (2024-02-20T09:30:48Z)
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning [20.750808913757396]
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks. LoRA is dominated by a small number of top singular vectors while fine-tuning decomposes into a set of less important unitary transforms. We propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA.
arXiv Detail & Related papers (2023-11-20T02:59:18Z)
SiRA: Sparse Mixture of Low Rank Adaptation [63.926732717719354]
We investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low rank. Specifically it enforces the top $k$ experts routing with a capacity limit restricting the maximum number of tokens each expert can process.
arXiv Detail & Related papers (2023-11-15T18:15:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.