LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis
- URL: http://arxiv.org/abs/2510.24561v1
- Date: Tue, 28 Oct 2025 15:55:36 GMT
- Title: LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis
- Authors: Qingyue Zhang, Chang Chu, Tianren Peng, Qi Li, Xiangyang Luo, Zhihao Jiang, Shao-Lun Huang,
- Abstract summary: We establish a theoretical framework for data-aware LoRA.<n>We develop an efficient algorithm, LoRA-DA, which estimates the terms in the optimization problem from a small set of target domain samples.<n>Additional studies show faster, more stable convergence, robustness across ranks, and only a small overhead for LoRA-DA.
- Score: 33.708800231646606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the widespread adoption of LLMs, LoRA has become a dominant method for PEFT, and its initialization methods have attracted increasing attention. However, existing methods have notable limitations: many methods do not incorporate target-domain data, while gradient-based methods exploit data only at a shallow level by relying on one-step gradient decomposition, which remains unsatisfactory due to the weak empirical performance of the one-step fine-tuning model that serves as their basis, as well as the fact that these methods either lack a rigorous theoretical foundation or depend heavily on restrictive isotropic assumptions. In this paper, we establish a theoretical framework for data-aware LoRA initialization based on asymptotic analysis. Starting from a general optimization objective that minimizes the expectation of the parameter discrepancy between the fine-tuned and target models, we derive an optimization problem with two components: a bias term, which is related to the parameter distance between the fine-tuned and target models, and is approximated using a Fisher-gradient formulation to preserve anisotropy; and a variance term, which accounts for the uncertainty introduced by sampling stochasticity through the Fisher information. By solving this problem, we obtain an optimal initialization strategy for LoRA. Building on this theoretical framework, we develop an efficient algorithm, LoRA-DA, which estimates the terms in the optimization problem from a small set of target domain samples and obtains the optimal LoRA initialization. Empirical results across multiple benchmarks demonstrate that LoRA-DA consistently improves final accuracy over existing initialization methods. Additional studies show faster, more stable convergence, robustness across ranks, and only a small initialization overhead for LoRA-DA. The source code will be released upon publication.
Related papers
- ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning.<n>We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE)<n>We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - Mitigating Forgetting in Low Rank Adaptation [17.859306837144732]
We present LaLoRA, a weight-space regularization technique that applies a Laplace approximation to Low-Rank Adaptation.<n>Our approach estimates the model's confidence in each parameter and constrains updates in high-curvature directions.<n>We evaluate LaLoRA by fine-tuning a Llama model for mathematical reasoning and demonstrate an improved learning-forgetting trade-off.
arXiv Detail & Related papers (2025-12-19T15:54:36Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework.<n>It reframes the learning task to predict the residual deviation from an inertial reference.<n>On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters [43.04933165005961]
We present a novel framework for Low-Rank Adaptation (LoRA)<n>LoRA geometrically treats low-rank adapters by optimizing them directly on the fixed-rank manifold.<n>Our framework integrates three key components to achieve this.
arXiv Detail & Related papers (2025-07-16T11:17:12Z) - Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization [56.78271181959529]
SubLoRA is a rank determination method for Low-Rank Adaptation (LoRA) based on submodular function.<n>Our method combines solid theoretical foundations, second-order accuracy, and practical computational efficiency.
arXiv Detail & Related papers (2025-07-02T15:56:40Z) - Understanding the Learning Dynamics of LoRA: A Gradient Flow Perspective on Low-Rank Adaptation in Matrix Factorization [7.940066909711888]
We analyze the learning dynamics of Low-Rank Adaptation (LoRA) for matrix factorization under gradient flow (GF)<n>Our analysis shows that the final error is affected by the misalignment between the singular spaces of the pre-trained model and the target matrix.
arXiv Detail & Related papers (2025-03-10T06:57:10Z) - Refining Adaptive Zeroth-Order Optimization at Ease [24.327161891577727]
This paper introduces Refined Adaptive Zeroth-Order Optimization (R-AdaZO)<n>We first show the untapped variance reduction effect of first moment estimate on ZO gradient estimation.<n>We then refine the second moment estimate based on these variance-reduced gradient estimates to better capture the geometry of the optimization landscape.
arXiv Detail & Related papers (2025-02-03T03:10:44Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often under performs when compared to full- parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle.<n>Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error.<n>We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.<n>To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.<n>Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.