Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning
- URL: http://arxiv.org/abs/2502.03884v1
- Date: Thu, 06 Feb 2025 08:58:03 GMT
- Title: Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning
- Authors: Peizhuang Cong, Wenpu Liu, Wenhan Yu, Haochen Zhao, Tong Yang
- Abstract summary: We propose HILO, a hierarchical scheme for expert allocation and rank configuration for fine-tuning large language models (LLMs).
HILO dynamically adjusts the number and rank of adapter experts across layers, matching the varying representational complexity of model layers at adapter granularity.
Experiments on multiple benchmark tasks demonstrate that HILO outperforms existing methods in accuracy while introducing fewer trainable parameters.
- Score: 5.074620301447097
- Abstract: Large language models (LLMs) have demonstrated remarkable success across various tasks, accompanied by a continuous increase in their parameter size. Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), address the challenges of fine-tuning LLMs by significantly reducing the number of trainable parameters. Recent studies have integrated LoRA with Mixture of Experts (MoE) architectures, leveraging multiple adapter experts and gating mechanisms to further improve fine-tuning performance. However, existing approaches primarily focus on adjusting the allocation of adapter experts per layer to optimize the introduced trainable parameter size, while neglecting a critical factor: the rank of the adapters. To this end, we propose a hierarchical scheme for expert allocation and rank configuration, HILO, which dynamically adjusts the number and rank of adapter experts across layers, matching the varying representational complexity of model layers at adapter granularity. Extensive experiments on multiple benchmark tasks demonstrate that HILO outperforms existing methods in accuracy while introducing fewer trainable parameters, providing an efficient and practical solution for fine-tuning LLMs.
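To make the abstract's mechanism concrete, here is a minimal PyTorch sketch of a mixture of LoRA adapter experts whose expert count and rank are set per layer. This is not HILO's actual algorithm (the abstract does not specify it); all class names and the example schedule are hypothetical.

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One low-rank adapter expert: contributes x @ A^T @ B^T, i.e. delta_W = B @ A."""
    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: no-op at start

    def forward(self, x):
        return x @ self.A.T @ self.B.T

class MoELoRALayer(nn.Module):
    """A frozen base linear layer plus a softmax-gated mixture of LoRA experts."""
    def __init__(self, base: nn.Linear, n_experts: int, rank: int):
        super().__init__()
        self.base = base.requires_grad_(False)  # only adapters and gate are trained
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, rank)
            for _ in range(n_experts))
        self.gate = nn.Linear(base.in_features, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)              # (..., E)
        delta = torch.stack([e(x) for e in self.experts], dim=-1)  # (..., d_out, E)
        return self.base(x) + (delta * weights.unsqueeze(-2)).sum(-1)

# Hypothetical hierarchical schedule: (n_experts, rank) varies with layer depth.
layer_schedule = [(2, 4), (2, 8), (4, 8), (4, 16)]
layers = [MoELoRALayer(nn.Linear(64, 64), n, r) for n, r in layer_schedule]
```

Per the abstract, HILO's contribution is deriving such a schedule to match each layer's representational complexity rather than fixing it by hand, as the hard-coded list above does.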
Related papers
- TriAdaptLoRA: Brain-Inspired Triangular Adaptive Low-Rank Adaptation for Parameter-Efficient Fine-Tuning [9.730075039461154]
Fine-tuning Large Language Models (LLMs) is pivotal for achieving optimal performance across diverse downstream tasks.
We propose Triangular Adaptive Low-Rank Adaptation (TriAdaptLoRA), a novel PEFT framework inspired by neuroscience principles.
Experiments conducted on a variety of natural language understanding and generation tasks demonstrate that TriAdaptLoRA consistently outperforms existing PEFT methods.
arXiv Detail & Related papers (2025-01-14T10:51:31Z)
- OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters.
However, they often pose optimization challenges and converge poorly.
We introduce an over-parameterized approach that accelerates training without increasing inference costs.
We achieve improvements in vision-language tasks and especially notable increases in image generation.
arXiv Detail & Related papers (2024-12-13T18:55:19Z)
- Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but a gap remains between the practical performance of low-rank adaptation and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
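The summary above names ensemble learning over rank-1 adaptations. The following is only a plausible sketch of a gradient-boosting-style loop (train a rank-1 adapter, merge it into the weights, repeat); the `train_step` callback and all hyperparameters are assumptions, and the paper's exact procedure may differ.

```python
import torch

def boosted_rank1_finetune(W, train_step, n_rounds=8, steps_per_round=100):
    """Boosting-flavoured sketch: the final update to W is a sum of
    sequentially trained rank-1 corrections (outer products b a^T)."""
    d_out, d_in = W.shape
    for _ in range(n_rounds):
        a = (0.01 * torch.randn(d_in)).requires_grad_()
        b = torch.zeros(d_out, requires_grad=True)
        opt = torch.optim.AdamW([a, b], lr=1e-3)
        forward = lambda x: x @ (W + torch.outer(b, a)).T
        for _ in range(steps_per_round):
            train_step(forward, opt)   # assumed callback: computes loss, backprops, steps
        with torch.no_grad():
            W += torch.outer(b, a)     # merge this round's rank-1 update
    return W
```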
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method.
We propose a higher-order CANDECOMP/PARAFAC (CP) decomposition, enabling a more compact and flexible representation.
Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
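To picture the CP idea: instead of a separate B·A pair per layer, one shared set of CP factors can parameterize the updates of every layer at once. A toy sketch with assumed sizes follows (LoRTA's actual factorization also spans heads and projection types):

```python
import torch

L, d_out, d_in, R = 12, 768, 768, 8   # illustrative sizes

U = torch.randn(d_out, R) * 0.01      # output-dimension factor
V = torch.randn(d_in, R) * 0.01       # input-dimension factor
S = torch.randn(L, R) * 0.01          # layer factor

# CP decomposition: delta_W[l, i, j] = sum_r S[l, r] * U[i, r] * V[j, r]
delta_W = torch.einsum('lr,ir,jr->lij', S, U, V)
print(delta_W.shape)                  # torch.Size([12, 768, 768])

# Parameter count: (L + d_out + d_in) * R = 12,384, versus per-layer LoRA's
# L * (d_out + d_in) * R = 147,456 at the same rank.
```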
- SARA: Singular-Value Based Adaptive Low-Rank Adaption [4.135688713311511]
LoRA, a parameter-efficient fine-tuning (PEFT) method, is widely used because it adds no inference overhead.
In this work, we first analyze the relationship between the performance of different layers and their ranks using SVD.
Based on this analysis, we design Singular-Value Based Adaptive Low-Rank Adaption (SARA).
arXiv Detail & Related papers (2024-08-06T16:39:42Z)
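Following the SVD analysis the summary describes, here is one simple way to turn singular-value spectra into per-layer ranks; the allocation rule below is an assumption for illustration, not SARA's published criterion.

```python
import torch

def allocate_ranks(weights, budget=32, tau=0.5):
    """Assumed illustration: score each layer by how many singular values
    are needed to capture a tau-fraction of its spectral energy, then
    split the total rank budget proportionally to those scores."""
    scores = []
    for W in weights:
        s = torch.linalg.svdvals(W)
        energy = torch.cumsum(s**2, dim=0) / (s**2).sum()
        scores.append(int((energy < tau).sum()) + 1)
    total = sum(scores)
    return [max(1, round(budget * c / total)) for c in scores]

layers = [torch.randn(256, 256) for _ in range(4)]   # stand-in weight matrices
print(allocate_ranks(layers))                        # e.g. [8, 8, 8, 8]
```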
- MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts [6.245113492272563]
Mixture of Dyadic Experts (MoDE) is a novel design for efficient multi-task adaptation.
Our design allows for more fine-grained mixing, thereby increasing the model's ability to jointly handle multiple tasks.
arXiv Detail & Related papers (2024-08-02T18:05:10Z)
- RankAdaptor: Hierarchical Rank Allocation for Efficient Fine-Tuning Pruned LLMs via Performance Model [4.926801686932735]
We introduce RankAdaptor, a hierarchical rank allocation method that enables efficient fine-tuning of pruned models.
We show that RankAdaptor consistently outperforms state-of-the-art methods across a variety of pruning settings and LLM architectures.
arXiv Detail & Related papers (2024-06-22T04:52:58Z)
- MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher overall rank.
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
arXiv Detail & Related papers (2024-02-27T07:14:12Z)
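The mini-ensemble trick can be pictured as a block-diagonal update: n tiny rank-r LoRAs, each acting on a d/n-dimensional slice, yield total rank n·r while training 2·d·r parameters, the same count as a single rank-r LoRA. A minimal sketch (class name and shapes assumed):

```python
import torch
import torch.nn as nn

class MiniEnsembleLoRA(nn.Module):
    """Sketch: n small LoRAs on d/n-dim slices form a block-diagonal
    update of rank n*r with the parameter count of one rank-r LoRA."""
    def __init__(self, d: int, n: int, r: int):
        super().__init__()
        assert d % n == 0
        self.n, self.chunk = n, d // n
        self.A = nn.Parameter(torch.randn(n, r, self.chunk) * 0.01)
        self.B = nn.Parameter(torch.zeros(n, self.chunk, r))

    def forward(self, x):                                  # x: (..., d)
        xs = x.view(*x.shape[:-1], self.n, self.chunk)
        # per-slice low-rank update: contract the input dim c, then the rank r
        y = torch.einsum('...nc,nrc,ndr->...nd', xs, self.A, self.B)
        return y.reshape(x.shape)

x = torch.randn(2, 128)
print(MiniEnsembleLoRA(d=128, n=8, r=2)(x).shape)          # torch.Size([2, 128])
```

At equal parameter budget, the block-diagonal structure is what lets the ensemble reach a rank n times higher than a single dense adapter.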
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
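The "linearly allocates a different rank for each layer, in an increasing manner" part is easy to picture. A small example follows; the endpoints are illustrative rather than the paper's values, and the pruning-during-training half of PRILoRA is omitted.

```python
def linear_rank_schedule(n_layers: int, r_min: int = 4, r_max: int = 12):
    """Linearly interpolate ranks from r_min at the first layer
    to r_max at the last, rounding to integers."""
    if n_layers == 1:
        return [r_min]
    step = (r_max - r_min) / (n_layers - 1)
    return [round(r_min + i * step) for i in range(n_layers)]

print(linear_rank_schedule(12))
# [4, 5, 5, 6, 7, 8, 8, 9, 10, 11, 11, 12]
```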
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models [75.25782573728677]
This paper presents a framework for adapter-based parameter-efficient fine-tuning (PEFT) of large language models (LLMs).
The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapters, Prompt-based learning, and Reparameterization-based methods.
We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks: Arithmetic Reasoning and Commonsense Reasoning.
arXiv Detail & Related papers (2023-04-04T16:31:37Z)