Related papers: LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation

URL: http://arxiv.org/abs/2402.07721v2
Date: Tue, 18 Jun 2024 15:13:12 GMT
Title: LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation
Authors: Hongyun Zhou, Xiangyu Lu, Wang Xu, Conghui Zhu, Tiejun Zhao, Muyun Yang
Abstract summary: Low-Rank Adaptation (LoRA) is currently the most commonly used parameter-efficient fine-tuning (PEFT) method. It introduces auxiliary parameters for each layer to fine-tune the pre-trained model under limited computing resources. However, it still faces resource consumption challenges when scaling up to larger models.
Score: 27.123271324468657
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Low-Rank Adaptation (LoRA) is currently the most commonly used parameter-efficient fine-tuning (PEFT) method. It introduces auxiliary parameters for each layer to fine-tune the pre-trained model under limited computing resources. However, it still faces resource consumption challenges during training when scaling up to larger models. Most previous studies have tackled this issue with pruning techniques, which remove LoRA parameters deemed unimportant. Nonetheless, these efforts evaluate importance only from features of the LoRA parameters themselves, such as parameter count, size, and gradient. In fact, the output of LoRA (the product of the LoRA parameters and the hidden state) directly impacts the final results. Preliminary experiments indicate that a fraction of LoRA elements possess significantly high output values, substantially influencing the layer output. Motivated by this observation, we propose LoRA-drop. Concretely, LoRA-drop evaluates the importance of LoRA based on the LoRA output. We then retain LoRA for the important layers, while the other layers share a single LoRA. We conduct extensive experiments with models of different scales on NLU and NLG tasks. Results demonstrate that LoRA-drop can achieve performance comparable to full fine-tuning and LoRA while retaining 50% of the LoRA parameters on average.
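The output-based selection described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code and rests on stated assumptions: a layer's raw importance is taken to be the squared norm of its LoRA output, ||B A x||^2, summed over a small sample of inputs x to that layer; scores are normalized to sum to 1; and layers are kept in descending order of importance until their cumulative share reaches a threshold (0.9 here, a hypothetical value). The helper names `layer_importance`, `select_layers`, and `sharing_plan` are illustrative only, and the LoRA scaling factor is left at 1.0.

```python
Matrix = list[list[float]]
Vector = list[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product; M is a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_output(A: Matrix, B: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """Output of the LoRA branch, scale * B (A x), with A: r x d_in and B: d_out x r."""
    return [scale * v for v in matvec(B, matvec(A, x))]


def layer_importance(layers: list[tuple[Matrix, Matrix]],
                     inputs: list[list[Vector]]) -> list[float]:
    """Normalized importance per layer (assumed metric).

    Raw score for layer i = sum over its sampled inputs x of ||B_i A_i x||^2;
    scores are then divided by their total so they sum to 1.
    """
    raw = [
        sum(sum(v * v for v in lora_output(A, B, x)) for x in xs)
        for (A, B), xs in zip(layers, inputs)
    ]
    total = sum(raw)
    if total == 0:
        raise ValueError("all LoRA outputs are zero; importance is undefined")
    return [s / total for s in raw]


def select_layers(importance: list[float], threshold: float = 0.9) -> list[int]:
    """Keep the most important layers until their cumulative share reaches `threshold`."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        if cumulative >= threshold:
            break
        kept.append(i)
        cumulative += importance[i]
    return sorted(kept)


def sharing_plan(num_layers: int, kept: list[int]) -> list[str]:
    """'own' = layer keeps its own LoRA; 'shared' = layer uses the one shared LoRA."""
    kept_set = set(kept)
    return ["own" if i in kept_set else "shared" for i in range(num_layers)]


# Toy example: three layers with rank-1 LoRA (A: 1 x 2, B: 2 x 1) and one input each.
layers = [
    ([[1.0, 0.0]], [[2.0], [0.0]]),  # layer 0: output [2, 0], score 4
    ([[0.1, 0.0]], [[1.0], [0.0]]),  # layer 1: output [0.1, 0], score 0.01
    ([[1.0, 1.0]], [[1.0], [1.0]]),  # layer 2: output [2, 2], score 8
]
inputs = [[[1.0, 1.0]] for _ in layers]

imp = layer_importance(layers, inputs)
kept = select_layers(imp, threshold=0.9)
print(kept)                   # [0, 2]
print(sharing_plan(3, kept))  # ['own', 'shared', 'own']
```

Layer 1 contributes almost nothing to the summed output norm, so under this rule it gives up its own LoRA and uses the shared one. In a real Transformer, layers can only share one LoRA if the adapted modules have matching shapes, which holds for the same projection across layers.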

Related papers

Not All LoRA Parameters Are Essential: Insights on Inference Necessity [36.65493658174926]
We investigate the contribution of each LoRA layer to the model's ability to predict the ground truth. We propose a simple yet effective method to enhance the performance of large language models fine-tuned with LoRA.
arXiv Detail & Related papers (2025-03-30T08:33:04Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods. We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts [37.43961020113692]
Low-rank adaptation (LoRA) has emerged as a powerful method for fine-tuning large-scale foundation models. This paper presents a theoretical analysis of LoRA by examining its connection to the Mixture of Experts models.
arXiv Detail & Related papers (2025-02-05T10:03:09Z)
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for LLMs that reduces memory requirements. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z)
Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning [65.31677646659895]
Large language models demonstrate impressive performance on downstream tasks, yet fully fine-tuning all of their parameters consumes extensive resources. We propose a framework to clearly define task-specific directions (TSDs) and explore their properties and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process.
arXiv Detail & Related papers (2024-09-02T08:10:51Z)
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization [38.23587031169402]
We propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. We evaluate RoLoRA on LLaMA2-7B/13B and LLaMA3-8B models, achieving up to a 29.5% absolute accuracy gain on 4-bit weight-activation quantized LLaMA2-13B.
arXiv Detail & Related papers (2024-07-10T20:52:18Z)
A Survey on LoRA of Large Language Models [19.85250609150331]
Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. This survey categorizes and reviews the progress from the perspectives of (1) downstream adaptation improving variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization methods that mix multiple LoRA plugins to achieve cross-task generalization; (3) efficiency-improving methods that boost the computational efficiency of LoRA; (4) data privacy-preserving methods that use LoRA in federated learning; and (5) applications.
arXiv Detail & Related papers (2024-07-08T12:32:10Z)
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead [41.31302904190149]
Fine-tuning large language models with low-rank adaptations (LoRAs) has become common practice. We propose a method for joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices. Experiments with up to 500 LoRAs demonstrate that compressed LoRAs preserve performance while offering major throughput gains.
arXiv Detail & Related papers (2024-06-17T15:21:35Z)
LoRA Learns Less and Forgets Less [25.09261710396838]
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. We compare the performance of LoRA and full finetuning on two target domains, programming and mathematics.
arXiv Detail & Related papers (2024-05-15T19:27:45Z)
Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection. The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z)
ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA). Compared to LoRA, our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost. Experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-02-28T04:33:20Z)
PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization [39.30090456724925]
Supervised fine-tuning is the most common method to adapt large language models (LLMs) to downstream tasks. Full fine-tuning requires massive computational resources. LoRA is one of the most widely used methods, which assumes that the optimization process is essentially low-dimensional.
arXiv Detail & Related papers (2024-02-25T16:43:41Z)
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks [72.88244322513039]
LoRA employs lightweight modules to customize large language models (LLMs) for each downstream task or domain. We propose LoRA-Flow, which utilizes dynamic weights to adjust the impact of different LoRAs. Experiments across six generative tasks demonstrate that our method consistently outperforms baselines with task-level fusion weights.
arXiv Detail & Related papers (2024-02-18T04:41:25Z)
DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between full fine-tuning (FT) and LoRA. Aiming to resemble the learning capacity of FT based on these findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
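The DoRA entry above describes decomposing the pre-trained weight into magnitude and direction. The recombination step can be sketched in pure Python, assuming the formulation W' = m * (W0 + BA) / ||W0 + BA||_c, where ||.||_c is the L2 norm of each column and the magnitude vector m starts as the column norms of W0. This is an illustration of that formula, not the DoRA authors' code; the helper names are hypothetical, and training (including how gradients flow through the norm) is left out.

```python
import math

Matrix = list[list[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Dense matrix product X @ Y (lists of rows)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def column_norms(W: Matrix) -> list[float]:
    """L2 norm of each column of W (the ||.||_c in the assumed formula)."""
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(len(W[0]))]


def dora_weight(W0: Matrix, B: Matrix, A: Matrix, m: list[float]) -> Matrix:
    """W' = m * (W0 + BA) / ||W0 + BA||_c, applied column by column.

    B is d x r and A is r x k, so BA has the same shape as W0 (d x k).
    Assumes no column of W0 + BA is all zeros (otherwise division by zero).
    """
    BA = matmul(B, A)
    V = [[w + d for w, d in zip(r0, rd)] for r0, rd in zip(W0, BA)]  # updated direction
    norms = column_norms(V)
    return [[m[j] * row[j] / norms[j] for j in range(len(row))] for row in V]


W0 = [[3.0, 0.0], [4.0, 2.0]]
m0 = column_norms(W0)              # magnitude initialized from W0: [5.0, 2.0]

# With B = 0 (the usual LoRA initialization) the weight is unchanged.
B0, A0 = [[0.0], [0.0]], [[1.0, 1.0]]
print(dora_weight(W0, B0, A0, m0))  # [[3.0, 0.0], [4.0, 2.0]]

# A nonzero update changes the direction, but each column keeps magnitude m.
B1, A1 = [[1.0], [0.0]], [[1.0, 0.0]]
W1 = dora_weight(W0, B1, A1, m0)
print(column_norms(W1))             # approximately [5.0, 2.0]
```

The low-rank update BA only moves the direction of each column; the column lengths are controlled separately by m, which is what lets the two components be trained independently.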

This list is automatically generated from the titles and abstracts of the papers on this site.