MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
- URL: http://arxiv.org/abs/2410.00938v2
- Date: Sat, 15 Feb 2025 10:36:03 GMT
- Title: MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
- Authors: Sheng Wang, Liheng Chen, Pengan Chen, Jingwei Dong, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu
- Abstract summary: The rapid scaling of large language models requires more lightweight finetuning methods to reduce the explosive GPU memory overhead.
Our research highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing.
We propose Mixture of Shards (MoS), incorporating both inter-layer and intra-layer sharing schemes, and integrating four nearly cost-free differentiation strategies.
- Score: 35.163843138935455
- Abstract: The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously. Targeting more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. Empirically, our research into high-level sharing principles highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing. Guided by this finding, we propose Mixture of Shards (MoS), incorporating both inter-layer and intra-layer sharing schemes, and integrating four nearly cost-free differentiation strategies, namely subset selection, pair dissociation, vector sharding, and shard privatization. Briefly, it selects a designated number of shards from global pools with a Mixture-of-Experts (MoE)-like routing mechanism before sequentially concatenating them to low-rank matrices. Hence, it retains all the advantages of LoRA while offering enhanced parameter efficiency, and effectively circumvents the drawbacks of peer parameter-sharing methods. Our empirical experiments demonstrate approximately 8x parameter savings in a standard LoRA setting. The ablation study confirms the significance of each component. Our insights into parameter sharing and MoS method may illuminate future developments of more parameter-efficient finetuning methods. The code is officially available at https://github.com/Forence1999/MoS.
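To make the routing-and-concatenation step in the abstract concrete, here is a minimal PyTorch sketch: shards drawn from global pools shared across layers are selected by an MoE-like router and concatenated along the rank dimension to form the low-rank factors of a LoRA-style update. The class name `MoSLinear`, the pool shapes, and the input-independent router logits are illustrative assumptions, not the authors' released implementation (see the official repository linked above); the four differentiation strategies are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoSLinear(nn.Module):
    """Minimal sketch of a Mixture-of-Shards-style adapter around a frozen linear layer.

    Shards live in global pools shared across layers (inter-layer sharing); each
    adapted layer routes to a fixed number of shards, which are concatenated along
    the rank dimension to form the LoRA matrices A and B. Subset selection, pair
    dissociation, vector sharding, and shard privatization are not modeled here.
    """

    def __init__(self, base: nn.Linear, pool_a: nn.Parameter, pool_b: nn.Parameter,
                 n_shards: int = 4, scaling: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep the pretrained weight frozen
            p.requires_grad_(False)
        self.pool_a = pool_a                      # (pool_size, shard_rank, in_features)
        self.pool_b = pool_b                      # (pool_size, out_features, shard_rank)
        self.n_shards = n_shards
        self.scaling = scaling
        pool_size = pool_a.shape[0]
        # Per-layer, input-independent routing logits (one set per pool) -- an assumption.
        self.router_a = nn.Parameter(torch.zeros(pool_size))
        self.router_b = nn.Parameter(torch.zeros(pool_size))

    @staticmethod
    def _compose(pool: torch.Tensor, logits: torch.Tensor, k: int, dim: int) -> torch.Tensor:
        # MoE-like routing: softmax scores, top-k shards, weight them, then
        # concatenate along the rank dimension to build one low-rank factor.
        scores, idx = torch.topk(F.softmax(logits, dim=-1), k)
        shards = pool[idx] * scores.view(-1, 1, 1)
        return torch.cat(torch.unbind(shards, dim=0), dim=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self._compose(self.pool_a, self.router_a, self.n_shards, dim=0)  # (k*r, in)
        b = self._compose(self.pool_b, self.router_b, self.n_shards, dim=1)  # (out, k*r)
        return self.base(x) + self.scaling * F.linear(F.linear(x, a), b)


# Illustrative usage: the pools are created once and reused by every adapted layer,
# which is where the parameter savings over per-layer LoRA matrices come from.
pool_a = nn.Parameter(torch.randn(32, 2, 768) * 0.01)   # 32 shards of rank 2 (example sizes)
pool_b = nn.Parameter(torch.zeros(32, 768, 2))           # B-side shards start at zero, as in LoRA
layer = MoSLinear(nn.Linear(768, 768), pool_a, pool_b, n_shards=4)
out = layer(torch.randn(8, 768))
```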
Related papers
- ASLoRA: Adaptive Sharing Low-Rank Adaptation Across Layers [37.77593687901923]
ASLoRA is a cross-layer parameter-sharing strategy combining global sharing with partial adaptive sharing.
We conduct experiments on various NLP tasks, showing that ASLoRA outperforms LoRA while using less than 25% of the parameters.
arXiv Detail & Related papers (2024-12-13T13:32:13Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method.
We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation.
Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation [57.49045064294086]
Large Language Models (LLMs) can capture semantic relationships between items, independent of their popularity.
We introduce LLMEmb, a novel method leveraging LLM to generate item embeddings that enhance Sequential Recommender Systems (SRS) performance.
arXiv Detail & Related papers (2024-09-30T03:59:06Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers alleviates the peak memory demand.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks [10.266224162377371]
Low-rank adaptation (LoRA) and its variants incur substantial storage and transmission costs.
We introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules, and layers.
VB-LoRA achieves extreme parameter efficiency while maintaining comparable or better performance compared to state-of-the-art PEFT methods.
arXiv Detail & Related papers (2024-05-24T03:24:34Z) - MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank.
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
arXiv Detail & Related papers (2024-02-27T07:14:12Z) - LoRA Meets Dropout under a Unified Framework [38.5176197615878]
Large language models (LLMs) have emerged as essential elements in numerous NLP applications.
Various dropout methods, initially designed for full finetuning with all parameters updated, alleviate the overfitting associated with excessive parameter redundancy.
We introduce a unified framework for a comprehensive investigation, which instantiates these methods based on dropping position, structural pattern and compensation measure.
arXiv Detail & Related papers (2024-02-25T07:09:10Z) - PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA [45.38491644250814]
Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA) is an intra-layer sharing mechanism.
PRoLoRA retains the advantages of LoRA and effectively circumvents the drawbacks of peer parameter-sharing methods.
Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA.
arXiv Detail & Related papers (2024-02-24T13:39:05Z) - Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios.
We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer.
Our proposed approach consistently improves the segmentation performance significantly on novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z) - IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning [15.964205804768163]
IncreLoRA is an incremental parameter allocation method that adaptively adds trainable parameters during training.
We conduct extensive experiments on GLUE to demonstrate the effectiveness of IncreLoRA.
arXiv Detail & Related papers (2023-08-23T10:08:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.