A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2505.06272v1
- Date: Tue, 06 May 2025 13:22:46 GMT
- Title: A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning
- Authors: Junzhou Xu, Boyu Diao,
- Abstract summary: We propose a method for allocating expert numbers based on parameter sensitivity LoRA-SMoE.<n> Experimental results demonstrate that our LoRA-SMoE approach can enhance model performance while reducing the number of trainable parameters.
- Score: 0.6906005491572401
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As deep learning models expand, the pre-training-fine-tuning paradigm has become the standard approach for handling various downstream tasks. However, shared parameters can lead to diminished performance when dealing with complex datasets involving multiple tasks. While introducing Mixture-of-Experts (MoE) methods has alleviated this issue to some extent, it also significantly increases the number of parameters required for fine-tuning and training time, introducing greater parameter redundancy. To address these challenges, we propose a method for allocating expert numbers based on parameter sensitivity LoRA-SMoE (A Sensitivity-Driven Expert Allocation Method in LoRA-MoE for Efficient Fine-Tuning). This method rapidly assesses the sensitivity of different tasks to parameters by sampling a small amount of data and using gradient information. It then adaptively allocates expert numbers within a given budget. The process maintains comparable memory consumption to LoRA (Low-Rank Adaptation) while ensuring an efficient and resource-friendly fine-tuning procedure. Experimental results demonstrate that compared to SOTA fine-tuning methods, our LoRA-SMoE approach can enhance model performance while reducing the number of trainable parameters. This significantly improves model performance in resource-constrained environments. Additionally, due to its efficient parameter sensitivity evaluation mechanism, LoRA-SMoE requires minimal computational overhead to optimize expert allocation, making it particularly suitable for scenarios with limited computational resources. All the code in this study will be made publicly available following the acceptance of the paper for publication. Source code is at https://github.com/EMLS-ICTCAS/LoRA-SMoE
Related papers
- Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC [50.36542772932594]
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights.<n>In this work, we address the gap between training with full steps with low-rank projections (SVDLoRA) and LoRA fine-tuning.<n>We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent.
arXiv Detail & Related papers (2026-02-18T13:41:41Z) - Layer-wise LoRA fine-tuning: a similarity metric approach [0.6323908398583081]
Low-Rank Adaptation (LoRA) techniques aim to reduce the computational cost of this process by freezing the pre-trained model and updating a smaller number of parameters.<n>We address the previous problem by systematically selecting only a few layers to fine-tune using LoRA or its variants.<n>We reduce the trainable parameters in LoRA-based techniques by up to 50%, while maintaining the predictive performance across different models and tasks.
arXiv Detail & Related papers (2026-02-05T18:38:53Z) - High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning [57.85676271833619]
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning.<n>We present textbfSMoA, a high-rank textbfStructured textbfMOdulation textbfAdapter that uses fewer trainable parameters while maintaining a higher rank.
arXiv Detail & Related papers (2026-01-12T13:06:17Z) - Parameter-Efficient Fine-Tuning for HAR: Integrating LoRA and QLoRA into Transformer Models [0.2939891130492345]
Low-Rank Adaptation (LoRA) and Quantized LoRA are investigated as scalable alternatives to full model fine-tuning for Human Activity Recognition.<n>LoRA maintains robust performance even under limited supervision.<n>QLoRA extends these benefits by reducing the memory footprint of frozen weights through quantization.
arXiv Detail & Related papers (2025-12-19T14:12:43Z) - PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning [54.99373314906667]
Self-supervised representation learning for point cloud has demonstrated effectiveness in improving pre-trained model performance across diverse tasks.<n>As pre-trained models grow in complexity, fully fine-tuning them for downstream applications demands substantial computational and storage resources.<n>We propose PointLoRA, a simple yet effective method that combines low-rank adaptation (LoRA) with multi-scale token selection to efficiently fine-tune point cloud models.
arXiv Detail & Related papers (2025-04-22T16:41:21Z) - Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? [42.362388367152256]
Large language models (LLMs) are used to fine-tune a parameter-efficient version of Code Llama using LoRA.<n>Our method achieves competitive or superior results in terms of Root Mean Square Error (RMSE) while significantly reducing computational overhead.
arXiv Detail & Related papers (2025-04-08T13:15:47Z) - MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning [5.412348391086257]
We propose MSPLoRA, which introduces Global Shared LoRA, Mid-Level Shared LoRA, and Layer-Specific LoRA to capture global patterns, mid-level features, and fine-grained information.<n> Experiments on various NLP tasks demonstrate that MSPLoRA achieves more efficient adaptation and better performance while significantly reducing the number of trainable parameters.
arXiv Detail & Related papers (2025-03-27T07:01:50Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Efficient Fine Tuning (PEFT) method.<n>We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation.<n>Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution [28.589498108609202]
Low-Rank Adaptation (LoRA) relies on a bypass framework that ignores the differential parameter budget requirements across weight matrices.
DoRA decomposes high-rank LoRA layers into structured single-rank components, allowing for dynamic pruning of parameter budget.
Experimental results demonstrate that DoRA can achieve competitive performance compared with LoRA and full model fine-tuning.
arXiv Detail & Related papers (2024-05-27T17:02:27Z) - MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters.
Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
arXiv Detail & Related papers (2024-05-20T15:48:32Z) - LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models [7.926974917872204]
LoRA-SP is a novel approach utilizing randomized half-selective parameter freezing.
LoRA-SP significantly reduces computational and memory requirements without compromising model performance.
arXiv Detail & Related papers (2024-02-28T06:50:10Z) - MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.<n>We present MELoRA, a mini-ensemble low-rank adapters that uses fewer trainable parameters while maintaining a higher rank.<n>Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
arXiv Detail & Related papers (2024-02-27T07:14:12Z) - IncreLoRA: Incremental Parameter Allocation Method for
Parameter-Efficient Fine-tuning [15.964205804768163]
IncreLoRA is an incremental parameter allocation method that adaptively adds trainable parameters during training.
We conduct extensive experiments on GLUE to demonstrate the effectiveness of IncreLoRA.
arXiv Detail & Related papers (2023-08-23T10:08:10Z) - AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.