Related papers: Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation

Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation

URL: http://arxiv.org/abs/2603.05204v1
Date: Thu, 05 Mar 2026 14:15:51 GMT
Title: Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation
Authors: Yize Wu, Ke Gao, Ling Li, Yanjun Wu,
Abstract summary: Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient method for fine-tuning Large Langauge Models.<n>Stable-LoRA is a weight-shrinkage optimization strategy that dynamically enhances stability of LoRA feature learning.<n> Experiments show that Stable-LoRA consistently outperforms other baselines across diverse models and tasks.
Score: 14.315981403487266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient method for fine-tuning Large Langauge Models. It updates the weight matrix as $W=W_0+sBA$, where $W_0$ is the original frozen weight, $s$ is a scaling factor and $A$,$B$ are trainable low-rank matrices. Despite its robust empirical effectiveness, the theoretical foundations of LoRA remain insufficiently understood, particularly with respect to feature learning stability. In this paper, we first establish that, LoRA can, in principle, naturally achieve and sustain stable feature learning (i.e., be self-stabilized) under appropriate hyper-parameters and initializations of $A$ and $B$. However, we also uncover a fundamental limitation that the necessary non-zero initialization of $A$ compromises self-stability, leading to suboptimal performances. To address this challenge, we propose Stable-LoRA, a weight-shrinkage optimization strategy that dynamically enhances stability of LoRA feature learning. By progressively shrinking $A$ during the earliest training steps, Stable-LoRA is both theoretically and empirically validated to effectively eliminate instability of LoRA feature learning while preserving the benefits of the non-zero start. Experiments show that Stable-LoRA consistently outperforms other baselines across diverse models and tasks, with no additional memory usage and only negligible computation overheads. The code is available at https://github.com/Yize-Wu/Stable-LoRA.

Related papers

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation [85.89510825889168]
We introduce LoRA-Pre, a novel low-rank system for efficient pre-training.<n>LoRA-Pre decomposing the momentum matrix into a compact low-rank subspace within the online linear learner.<n>We empirically validate LoRA-Pre's efficacy by pre-training models from the Llama architecture family.
arXiv Detail & Related papers (2026-02-27T18:57:06Z)
Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update [50.36542772932594]
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights.<n>There is still a gap between full training with low-rank projections (SVDLoRA) and LoRA fine-tuning, indicating that LoRA steps can be further improved.
arXiv Detail & Related papers (2025-09-24T10:32:50Z)
Don't Forget the Nonlinearity: Unlocking Activation Functions in Efficient Fine-Tuning [82.16625951603315]
NoRA replaces fixed activations with learnable rational functions and applies structured low-rank updates to numerator and denominator coefficients.<n>On vision transformers trained on CIFAR-10 and CIFAR-100, NoRA matches or exceeds full fine-tuning while updating only 0.4% of parameters.<n>NoRA constrains adaptation to a low-dimensional functional subspace, implicitly regularizing update magnitude and direction.
arXiv Detail & Related papers (2025-09-16T16:47:03Z)
Riemannian Optimization for LoRA on the Stiefel Manifold [11.1808022633589]
Large language models (LLMs) present significant fine-tuning challenges due to their size.<n>We show that geometric constraints are key to unlocking LoRA's full potential for effective fine-tuning.
arXiv Detail & Related papers (2025-08-25T11:15:52Z)
Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics [23.84827135317107]
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method.<n>In standard LoRA layers, one of the matrices, $A$ or $B$, is to zero, ensuring that fine-tuning starts from the pretrained model.
arXiv Detail & Related papers (2025-05-29T07:33:03Z)
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements.<n>This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z)
CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models [7.108651381160281]
Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models. We propose textbfCoRA: leveraging shared knowledge to optimize LoRA training by substituting its matrix $B$ with a common subspace from large models. Our experiments show that the first approach achieves the same efficacy as the original LoRA fine-tuning while being more efficient than halving parameters.
arXiv Detail & Related papers (2024-08-31T12:48:27Z)
SBoRA: Low-Rank Adaptation with Regional Weight Updates [19.15481369459963]
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models. SBoRA reduces the number of trainable parameters by half or doubles the rank with the similar number of trainable parameters as LoRA. Our results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning.
arXiv Detail & Related papers (2024-07-07T15:37:13Z)
ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA) Our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA. The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-02-28T04:33:20Z)
DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA) DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning [31.036465632204663]
We introduce Chain of LoRA, an iterative optimization framework inspired by the Frank-Wolfe algorithm. We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
arXiv Detail & Related papers (2024-01-08T14:26:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.