Related papers: HRP: High-Rank Preheating for Superior LoRA Initialization

URL: http://arxiv.org/abs/2502.07739v2
Date: Mon, 17 Feb 2025 13:39:51 GMT
Title: HRP: High-Rank Preheating for Superior LoRA Initialization
Authors: Yuzhu Chen, Yingjie Wang, Shi Fu, Li Shen, Yongcheng Jing, Xinmei Tian, Dacheng Tao
Abstract summary: High-Rank Preheating (HRP) is proposed as an initialization method for Low-Rank Adaptation (LoRA) fine-tuning. HRP significantly enhances LoRA's generalization effectiveness across various models and tasks.
Score: 58.3319586613105
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper studies the crucial impact of initialization on the convergence properties of Low-Rank Adaptation (LoRA). We theoretically demonstrate that random initialization, a widely used schema, will likely lead LoRA to random low-rank results, rather than the best low-rank result. While this issue can be mitigated by adjusting initialization towards a well-informed direction, it relies on prior knowledge of the target, which is typically unknown in real-world scenarios. To approximate this well-informed initial direction, we propose High-Rank Preheating (HRP), which fine-tunes high-rank LoRA for a few steps and uses the singular value decomposition of the preheated result as a superior initialization. HRP initialization is theory-supported to combine the convergence strengths of high-rank LoRA and the generalization strengths of low-rank LoRA. Extensive experiments demonstrate that HRP significantly enhances LoRA's effectiveness across various models and tasks, achieving performance comparable to full-parameter fine-tuning and outperforming other initialization strategies.
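
The two-stage procedure described in the abstract (preheat a high-rank LoRA for a few steps, then use the SVD of the result as a low-rank initialization) can be sketched on a toy matrix-fitting problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the squared-error loss, the function names `preheat_high_rank` and `svd_low_rank_init`, the step count, the learning rate, and the choice to split the singular values evenly between the two factors are all hypothetical.

```python
import numpy as np


def preheat_high_rank(target, high_rank, steps=5, lr=0.02, seed=0):
    """Take a few gradient steps with a high-rank LoRA pair (B, A).

    Toy loss (an assumption): 0.5 * ||B @ A - target||_F^2.
    Returns the preheated update B @ A.
    """
    rng = np.random.default_rng(seed)
    m, n = target.shape
    A = rng.normal(scale=1.0 / np.sqrt(n), size=(high_rank, n))
    B = np.zeros((m, high_rank))  # common LoRA convention: B starts at zero
    for _ in range(steps):
        residual = B @ A - target
        grad_B = residual @ A.T   # dLoss/dB
        grad_A = B.T @ residual   # dLoss/dA
        B -= lr * grad_B
        A -= lr * grad_A
    return B @ A


def svd_low_rank_init(delta, low_rank):
    """Truncate the preheated update to `low_rank` via SVD.

    The square root of each singular value goes to each factor
    (an illustrative choice; other splits are possible).
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    root = np.sqrt(S[:low_rank])
    B0 = U[:, :low_rank] * root          # shape (m, low_rank)
    A0 = root[:, None] * Vt[:low_rank]   # shape (low_rank, n)
    return B0, A0


# Usage: a rank-3 target update, preheated at rank 6, initialized at rank 2.
rng = np.random.default_rng(1)
target = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 6))
delta = preheat_high_rank(target, high_rank=6)
B0, A0 = svd_low_rank_init(delta, low_rank=2)
```

By the Eckart-Young theorem, `B0 @ A0` is the best rank-2 approximation of the preheated update in Frobenius norm, so the low-rank LoRA starts along the directions the high-rank run found most important rather than along random ones.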

Related papers

LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules [10.00294036303927]
We introduce LoRA-Squeeze, a simple and efficient methodology that aims to improve standard LoRA learning. Our approach posits that it is better to first learn an expressive, higher-rank solution and then compress it, rather than learning a constrained, low-rank solution directly.
arXiv Detail & Related papers (2026-02-11T16:19:58Z)
LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis [33.708800231646606]
We establish a theoretical framework for data-aware LoRA. We develop an efficient algorithm, LoRA-DA, which estimates the terms in the optimization problem from a small set of target domain samples. Additional studies show faster, more stable convergence, robustness across ranks, and only a small overhead for LoRA-DA.
arXiv Detail & Related papers (2025-10-28T15:55:36Z)
Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes [9.4848188271008]
Low-Rank Adaptation (LoRA) has proven effective in reducing computational costs while maintaining performance comparable to fully fine-tuned foundation models. Current adaptive LoRA methods attempt to overcome this limitation by dynamically expanding or selectively allocating ranks. We introduce Stable Rank-Guided Low-Rank Adaptation (SR-LoRA), a novel framework that utilizes the stable rank of pre-trained weight matrices as a natural prior for layer-wise rank allocation.
arXiv Detail & Related papers (2025-06-30T23:54:23Z)
SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization [2.594346658179846]
Low-Rank Adaptation (LoRA) constrains updates to a fixed low-rank subspace. We introduce Subspace Recomposition in Low-Rank Adaptation (SRLoRA) via importance-based fusion and reinitialization. SRLoRA consistently achieves faster convergence and improved accuracy over standard LoRA.
arXiv Detail & Related papers (2025-05-18T14:12:40Z)
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning [6.657174308208715]
ElaLoRA is an adaptive low-rank adaptation framework that dynamically prunes and expands ranks based on gradient-derived importance scores. ElaLoRA consistently outperforms existing PEFT methods across different parameter budgets. By introducing a principled and adaptive rank allocation mechanism, ElaLoRA offers a scalable and efficient fine-tuning solution.
arXiv Detail & Related papers (2025-03-31T21:58:25Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods. We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
S-LoRA: Scalable Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising approach to harnessing the power of pre-trained models for sequential tasks. We propose a Scalable Low-Rank Adaptation (S-LoRA) method for CL (in particular class incremental learning), which incrementally decouples the learning of the direction and magnitude of LoRA parameters. Our theoretical and empirical analysis demonstrates that S-LoRA tends to follow a low-loss trajectory that converges to an overlapped low-loss region, resulting in an excellent stability-plasticity trade-off in CL.
arXiv Detail & Related papers (2025-01-22T20:00:41Z)
Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for fine-tuning models. LoRA often underperforms when compared to full-parameter fine-tuning. We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning [65.31677646659895]
Large language models demonstrate impressive performance on downstream tasks, yet they require extensive resource consumption when fully fine-tuning all parameters. We propose a framework to clearly define task-specific directions (TSDs) and explore their properties and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process.
arXiv Detail & Related papers (2024-09-02T08:10:51Z)
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach [10.980433187379868]
Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension.
arXiv Detail & Related papers (2024-07-16T15:26:31Z)
ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models [8.251547772610301]
We extend the methodology of low-rank adaptation (LoRA) to an innovative approach we call allocating low-rank adaptation (ALoRA). First, we propose a novel method, AB-LoRA, that can effectively estimate the importance score of each LoRA rank. Second, guided by AB-LoRA, we gradually prune abundant and negatively impacting LoRA ranks and allocate the pruned LoRA budgets to important Transformer modules needing higher ranks.
arXiv Detail & Related papers (2024-03-24T15:09:55Z)
PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA [45.38491644250814]
Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA) is an intra-layer sharing mechanism. PRoLoRA retains its advantages, and effectively circumvents the drawbacks of peer parameter-sharing methods. Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA.
arXiv Detail & Related papers (2024-02-24T13:39:05Z)
PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters. Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.