LoRA vs Full Fine-tuning: An Illusion of Equivalence
- URL: http://arxiv.org/abs/2410.21228v2
- Date: Mon, 23 Jun 2025 04:59:01 GMT
- Title: LoRA vs Full Fine-tuning: An Illusion of Equivalence
- Authors: Reece Shuttleworth, Jacob Andreas, Antonio Torralba, Pratyusha Sharma
- Abstract summary: We study how Low-Rank Adaptation (LoRA) and full fine-tuning change pre-trained models. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure. We extend the finding that LoRA forgets less than full fine-tuning and find that its forgetting is largely localized to the intruder dimensions.
- Score: 76.11938177294178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to effectively fine-tune LLMs with an extreme reduction in trainable parameters. But \emph{are their learned solutions really equivalent?} We study how LoRA and full fine-tuning change pre-trained models by analyzing their weight matrices through the lens of their spectral properties. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure: weight matrices trained with LoRA have new, high-ranking singular vectors, which we call \emph{intruder dimensions}, while those trained with full fine-tuning do not. Further, we extend the finding that LoRA forgets less than full fine-tuning and find that its forgetting is largely localized to the intruder dimensions: by causally intervening on the intruder dimensions, changing their associated singular values after fine-tuning, we show that they cause forgetting. Moreover, scaling them down significantly improves modeling of the pre-training distribution with a minimal drop in downstream task performance. Given this, we should expect accumulating intruder dimensions to be harmful and to lead to more forgetting. This effect is amplified in continual learning because of sequential fine-tuning, and we show that LoRA models that accumulate intruder dimensions in this setting tend to perform worse, emphasizing the practicality of our findings.
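As a concrete illustration of the analysis described in the abstract, the following is a minimal PyTorch sketch of how intruder dimensions might be detected and intervened on: take the SVD of a fine-tuned weight matrix, flag top singular vectors whose maximum cosine similarity to every pre-trained singular vector is low, and shrink their singular values. The top-k cutoff, the similarity threshold, and the scaling factor are illustrative assumptions, not the paper's exact settings.

```python
import torch

def find_intruder_dimensions(W_pre, W_ft, k=10, threshold=0.6):
    """Flag top-k singular vectors of the fine-tuned matrix whose maximum
    cosine similarity to all pre-trained singular vectors falls below the
    threshold (candidate intruder dimensions)."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, S_ft, Vh_ft = torch.linalg.svd(W_ft, full_matrices=False)
    intruders = []
    for j in range(k):
        # Columns of U are unit-norm, so a dot product is a cosine similarity.
        max_sim = (U_pre.T @ U_ft[:, j]).abs().max().item()
        if max_sim < threshold:
            intruders.append(j)
    return intruders, (U_ft, S_ft, Vh_ft)

def scale_down_intruders(U, S, Vh, intruders, factor=0.5):
    """Causal intervention from the abstract: shrink the singular values
    associated with intruder dimensions and rebuild the weight matrix."""
    S = S.clone()
    S[intruders] *= factor
    return U @ torch.diag(S) @ Vh
```

Per the abstract, rebuilding the weights with scaled-down intruder singular values should recover much of the pre-training distribution at only a small downstream cost.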
Related papers
- LoRA Is Slower Than You Think [0.0]
Low-Rank Adaptation (LoRA) is one of the most widely used techniques for fine-tuning large language models (LLMs). By introducing a small number of trainable low-rank weight matrices, LoRA substantially reduces the number of parameters that need to be updated. We observed that LoRA does not consistently provide speed improvements across all model architectures and training setups.
arXiv Detail & Related papers (2025-07-06T08:36:43Z)
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? [55.33467849079774]
Low-rank adaptation (LoRA) is a popular and efficient training technique for updating or domain-specific adaptation of Large Language Models. We investigate how new facts can be incorporated into the LLM using LoRA without compromising the previously learned knowledge.
arXiv Detail & Related papers (2025-02-20T12:31:03Z)
- A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models [22.457766373989365]
Low-Rank Adapters (LoRAs) have been substantially adopted across various fields, including instruction tuning and domain adaptation. To address the limited expressive capacity of LoRA, Mixture-of-Experts (MoE) has been introduced to incorporate multiple LoRA adapters. We propose a new training strategy for MoE-LoRA that stabilizes and boosts its feature learning procedure via multi-space projections.
arXiv Detail & Related papers (2025-02-20T05:58:53Z)
- Model merging with SVD to tie the Knots [24.071148543940346]
We show that the weights of LoRA fine-tuned models exhibit a lower degree of alignment than their fully fine-tuned counterparts.
We use the SVD to transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied (a sketch follows this entry).
We introduce a new benchmark that explicitly evaluates whether merged models are general models.
arXiv Detail & Related papers (2024-10-25T17:59:55Z)
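One plausible reading of this alignment step, sketched below under the assumption that each task contributes a dense LoRA update $\Delta W_t = B_t A_t$: compute an SVD over the concatenated task updates to obtain a shared orthogonal basis, then express every update in that basis before applying a standard merging rule. The function name and the averaging step are hypothetical stand-ins, not the paper's exact procedure.

```python
import torch

def merge_loras_in_aligned_space(deltas):
    """Hypothetical sketch: align per-task LoRA updates (each a dense
    out_dim x in_dim matrix B @ A) via a shared SVD basis, then merge."""
    stacked = torch.cat(deltas, dim=1)            # (out, num_tasks * in)
    U, _, _ = torch.linalg.svd(stacked, full_matrices=False)
    aligned = [U.T @ d for d in deltas]           # coordinates in the shared basis
    merged = torch.stack(aligned).mean(dim=0)     # simple averaging as a stand-in
    return U @ merged                             # map back to weight space
```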
- Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for fine-tuning models.
LoRA often underperforms compared to full-parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
- Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models [38.197552424549514]
Low-rank adaptations (LoRAs) have revolutionized the finetuning of large foundation models.
LoRAs present opportunities for applying machine learning techniques that take these low-rank weights themselves as inputs.
In this paper, we investigate the potential of Learning on LoRAs (LoL), a paradigm where LoRA weights serve as input to machine learning models.
arXiv Detail & Related papers (2024-10-05T15:52:47Z)
- AutoLoRA: AutoGuidance Meets Low-Rank Adaptation for Diffusion Models [0.9514837871243403]
Low-rank adaptation (LoRA) is a fine-tuning technique that can be applied to conditional generative diffusion models.
We introduce AutoLoRA, a novel guidance technique for diffusion models fine-tuned with the LoRA approach.
arXiv Detail & Related papers (2024-10-04T21:57:11Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free: it requires no data availability or any additional training, while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank (a sketch follows this entry).
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction-following tasks.
arXiv Detail & Related papers (2024-02-27T07:14:12Z)
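A sketch of one way to read the mini-ensemble idea: n small LoRA pairs, each adapting a separate slice of the feature dimension, form a block-diagonal update whose total rank is the sum of the mini ranks, at the parameter cost of much smaller adapters. The slicing scheme and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MiniEnsembleLoRA(nn.Module):
    """Illustrative mini-ensemble: n rank-r LoRA pairs on d/n-dim slices,
    equivalent to a block-diagonal update of total rank up to n * r.
    forward() returns only the update, to be added to the frozen layer."""
    def __init__(self, d, n=4, r=2):
        super().__init__()
        assert d % n == 0, "feature dim must split evenly across mini-adapters"
        self.chunk = d // n
        self.A = nn.ParameterList([nn.Parameter(0.01 * torch.randn(r, d // n)) for _ in range(n)])
        self.B = nn.ParameterList([nn.Parameter(torch.zeros(d // n, r)) for _ in range(n)])

    def forward(self, x):
        # x: (..., d); each slice gets its own low-rank update
        chunks = x.split(self.chunk, dim=-1)
        return torch.cat([c @ a.T @ b.T for c, a, b in zip(chunks, self.A, self.B)], dim=-1)
```

Under these assumed sizes, d=1024 with n=8 mini-adapters of rank 2 trains 8 * (2 * 128 + 128 * 2) = 4,096 parameters for an update of rank up to 16, versus 2 * 1024 * 16 = 32,768 parameters for a single rank-16 LoRA, an 8x reduction consistent with the figure quoted above.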
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
- Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning [31.036465632204663]
We introduce Chain of LoRA, an iterative optimization framework inspired by the Frank-Wolfe algorithm.
We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
arXiv Detail & Related papers (2024-01-08T14:26:49Z)
- Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA), which enables dynamic adjustment of the intrinsic rank during the adaptation process (a sketch follows this entry).
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z)
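A minimal sketch of the gating idea as described: a learnable gate vector sits between the LoRA factors, and driving it toward sparsity zeroes out ranks during training. A plain L1 penalty stands in here for the paper's exact sparsification procedure, and the hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class SoRALinearSketch(nn.Module):
    """Sparse low-rank adaptation sketch: a gate vector g between the LoRA
    factors; sparsifying g dynamically shrinks the effective rank."""
    def __init__(self, base: nn.Linear, max_rank=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # frozen pre-trained layer
        self.A = nn.Parameter(0.01 * torch.randn(max_rank, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.gate = nn.Parameter(torch.ones(max_rank))  # entries pushed toward zero

    def forward(self, x):
        return self.base(x) + ((x @ self.A.T) * self.gate) @ self.B.T

    def sparsity_penalty(self):
        return self.gate.abs().sum()  # add lambda * penalty to the training loss
```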
- MultiLoRA: Democratizing LoRA for Better Multi-Task Learning [20.750808913757396]
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks.
The update learned by LoRA is dominated by a small number of top singular vectors, while full fine-tuning decomposes into a set of less important unitary transforms.
We propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA.
arXiv Detail & Related papers (2023-11-20T02:59:18Z)
- The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models.
This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA.
For Transformer networks, we show that any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA.
arXiv Detail & Related papers (2023-10-26T16:08:33Z)
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z)
- LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture (a sketch follows this entry).
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
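To ground the mechanism in the entry above, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer: the pre-trained weight is frozen and a trainable rank decomposition $BA$ is added on the side. The rank and scaling values are common defaults assumed for illustration, not figures from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: y = base(x) + (alpha / r) * x @ A^T @ B^T,
    with the base weights frozen and only A, B trainable."""
    def __init__(self, base: nn.Linear, r=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pre-trained weights
        self.A = nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

For a 4096 x 4096 projection at rank 8, this trains 2 * 8 * 4096 = 65,536 parameters instead of roughly 16.8M frozen ones, the kind of reduction behind the GPT-3 figure quoted above.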