LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
- URL: http://arxiv.org/abs/2505.21289v1
- Date: Tue, 27 May 2025 14:54:24 GMT
- Title: LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
- Authors: Nurbek Tastan, Stefanos Laskaridis, Martin Takac, Karthik Nandakumar, Samuel Horvath,
- Abstract summary: We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning.<n>LoFT aligns the model's internal dynamics with those of updating all model weights.<n> Empirically, this approach substantially narrows the performance gap between adapter-based tuning and full fine-tuning.
- Score: 5.980897761790243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained models are commonly adapted to downstream tasks using parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), which injects small trainable low-rank matrices instead of updating all weights. While LoRA dramatically reduces trainable parameters with little overhead, it can still underperform full fine-tuning in accuracy and often converges more slowly. We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning by aligning the optimizer's internal dynamics with those of updating all model weights. LoFT not only learns weight updates in a low-rank subspace (like LoRA) but also properly projects the optimizer's first and second moments (Adam's momentum and variance) into the same subspace, mirroring full-model updates. By aligning the low-rank update itself with the full update, LoFT eliminates the need for tuning extra hyperparameters, e.g., LoRA scaling factor $\alpha$. Empirically, this approach substantially narrows the performance gap between adapter-based tuning and full fine-tuning and consistently outperforms standard LoRA-style methods, all without increasing inference cost.
Related papers
- Localized LoRA: A Structured Low-Rank Approximation for Efficient Fine-Tuning [2.0652244993845086]
Localized LoRA is a framework that models weight updates as a composition of low-rank approximations applied to structured blocks of the weight matrix.<n>We show that our method consistently achieves lower approximation error under matched parameter budgets.
arXiv Detail & Related papers (2025-05-30T21:13:23Z) - Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning [13.823795660384262]
We propose a method, LoRA Silver Bullet or LoRA-SB, that approximates full fine-tuning within low-rank subspaces.<n>Our findings demonstrate that it is possible to simulate full fine-tuning in low-rank subspaces without sacrificing performance.
arXiv Detail & Related papers (2024-11-29T09:10:30Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Efficient Fine Tuning (PEFT) method.<n>We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation.<n>Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models [26.808251361020066]
Fine-tuning pre-trained models often yields state-of-the-art performance but is computationally expensive when updating all parameters.<n>We propose NEAT, a nonlinear PEFT approach that employs a lightweight neural network to learn a nonlinear transformation of the pre-trained weights.<n>Our theoretical analysis shows that NEAT achieves greater efficiency than LoRA while maintaining equivalent expressivity.
arXiv Detail & Related papers (2024-10-02T17:29:23Z) - Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape [52.98187034726091]
We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space.<n>We show that Flat-LoRA improves both in-domain and out-of-domain generalization.
arXiv Detail & Related papers (2024-09-22T11:24:10Z) - LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models [3.7049613588433497]
Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning.
We extend the LoRA to multiple scales, dubbed as LoRA$2$.
arXiv Detail & Related papers (2024-08-13T12:31:30Z) - LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models.<n>Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning.<n>We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z) - SwitchLoRA: Switched Low-Rank Adaptation Can Learn Full-Rank Information [3.6859322366469933]
Methods like ReLoRA and GaLore have attempted to address this challenge by updating the low-rank subspace.<n>In this paper, we introduce SwitchLoRA, a parameter-efficient training technique that frequently and smoothly replaces the trainable parameters of LoRA with alternative parameters.
arXiv Detail & Related papers (2024-06-03T05:40:34Z) - DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA)
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z) - Flora: Low-Rank Adapters Are Secretly Gradient Compressors [30.224822087562163]
Low-rank adaptation (LoRA) is proposed to reduce the optimization states by training fewer parameters.
LoRA restricts overall weight update matrices to be low-rank, limiting the model performance.
We propose Flora, which is able to achieve high-rank updates by resampling the projection matrices.
arXiv Detail & Related papers (2024-02-05T18:50:39Z) - AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z) - LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.