Related papers: BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation

URL: http://arxiv.org/abs/2410.09758v1
Date: Sun, 13 Oct 2024 07:28:26 GMT
Title: BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
Authors: Peijia Qin, Ruiyi Zhang, Pengtao Xie
Abstract summary: DoRA bridges the gap between low-rank adaptation (LoRA) and full fine-tuning (FT). We propose BiDoRA, a bi-level optimization-based PEFT method.
Score: 34.1111413429869
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Parameter-efficient fine-tuning (PEFT) of large language models (LLMs) has gained considerable attention as a flexible and efficient way of adapting LLMs to downstream tasks. Among these methods, weight-decomposed low-rank adaptation (DoRA) has emerged as a promising approach. DoRA bridges the gap between low-rank adaptation (LoRA) and full fine-tuning (FT) by decomposing the weight matrices into magnitude and direction components, thereby maintaining learning behavior similar to FT. Although DoRA shows encouraging performance, it introduces additional parameters compared to LoRA, which potentially increases the risk of overfitting. Moreover, optimizing magnitude and direction simultaneously leads to a coupled gradient updating pattern for both components, limiting its learning capacity. To overcome these limitations, we propose BiDoRA, a bi-level optimization-based PEFT method. In BiDoRA, the direction and magnitude components are optimized on two distinct datasets at different optimization levels, mitigating the risk of overfitting. Additionally, the asynchronous optimization of the two components promotes their decoupling, allowing for more flexible gradient updates suitable for various downstream tasks. Evaluation of BiDoRA on fourteen datasets spanning natural language understanding, natural language generation, and token classification reveals that it significantly outperforms DoRA and other PEFT methods. The superior performance of BiDoRA underscores its effectiveness. The code for BiDoRA is available at https://anonymous.4open.science/r/BiDoRA-5D31.
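
The magnitude/direction decomposition mentioned in the abstract can be illustrated with a small sketch. This is not the BiDoRA code; it assumes the recomposition rule from the original DoRA paper, W' = m * (W0 + B A) / ||W0 + B A||_c with a column-wise norm, and every name in it (column_norms, matmul, dora_weight, W0, B, A, m) is hypothetical and chosen only for this example. It uses plain Python lists so that it runs without third-party packages.

```python
import math

def column_norms(W):
    """L2 norm of each column of a matrix stored as a list of rows."""
    rows, cols = len(W), len(W[0])
    return [math.sqrt(sum(W[i][j] ** 2 for i in range(rows))) for j in range(cols)]

def matmul(B, A):
    """Plain matrix product of B (d x r) and A (r x k)."""
    return [[sum(B[i][t] * A[t][j] for t in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def dora_weight(W0, B, A, m):
    """Recompose W' = m * (W0 + B A) / ||W0 + B A||_c (assumed DoRA form).

    W0 is the frozen pre-trained weight, B A is the low-rank update that
    changes the direction, and m holds one magnitude per column.
    """
    delta = matmul(B, A)
    V = [[W0[i][j] + delta[i][j] for j in range(len(W0[0]))]
         for i in range(len(W0))]
    norms = column_norms(V)
    return [[m[j] * V[i][j] / norms[j] for j in range(len(V[0]))]
            for i in range(len(V))]

# Toy 2 x 2 weight with a rank-1 update.
W0 = [[3.0, 0.0], [4.0, 2.0]]
B = [[0.0], [0.0]]        # zero-initialised, so B A = 0 at the start
A = [[0.5, -0.5]]
m = column_norms(W0)      # magnitudes initialised to the column norms of W0

W_start = dora_weight(W0, B, A, m)             # equals W0 at initialisation
W_scaled = dora_weight(W0, B, A, [10.0, 2.0])  # only the magnitude changes
```

In this sketch, changing m rescales each column without rotating it, while changing B or A rotates the columns without altering their norms. The abstract states that BiDoRA optimizes these two groups of parameters on two distinct datasets at different optimization levels; that training loop is not shown here.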

Related papers

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation [85.89510825889168]
We introduce LoRA-Pre, a novel low-rank system for efficient pre-training. LoRA-Pre decomposes the momentum matrix into a compact low-rank subspace within the online linear learner. We empirically validate LoRA-Pre's efficacy by pre-training models from the Llama architecture family.
arXiv Detail & Related papers (2026-02-27T18:57:06Z)
Don't Forget the Nonlinearity: Unlocking Activation Functions in Efficient Fine-Tuning [82.16625951603315]
NoRA replaces fixed activations with learnable rational functions and applies structured low-rank updates to numerator and denominator coefficients. On vision transformers trained on CIFAR-10 and CIFAR-100, NoRA matches or exceeds full fine-tuning while updating only 0.4% of parameters. NoRA constrains adaptation to a low-dimensional functional subspace, implicitly regularizing update magnitude and direction.
arXiv Detail & Related papers (2025-09-16T16:47:03Z)
WeightLoRA: Keep Only Necessary Adapters [79.89637596855]
Low-rank adaptation (LoRA) adds trainable adapters to selected layers. We propose a novel method, WeightLoRA, which overcomes this issue by adaptive selection of the most critical LoRA heads. We conduct experiments for a series of competitive benchmarks and DeBERTa, BART, and Llama models, comparing our method with different adaptive approaches.
arXiv Detail & Related papers (2025-06-03T10:33:16Z)
FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [61.79405341803085]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z)
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning [54.99373314906667]
Self-supervised representation learning for point clouds has demonstrated effectiveness in improving pre-trained model performance across diverse tasks. As pre-trained models grow in complexity, fully fine-tuning them for downstream applications demands substantial computational and storage resources. We propose PointLoRA, a simple yet effective method that combines low-rank adaptation (LoRA) with multi-scale token selection to efficiently fine-tune point cloud models.
arXiv Detail & Related papers (2025-04-22T16:41:21Z)
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment [20.382810396966473]
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs). Current methods optimize LoRA by initializing with static singular value decomposition subsets, leading to suboptimal leveraging of pre-trained knowledge. We propose Great LoRA Mixture-of-Expert (GOAT). GOAT integrates relevant priors using an SVD-structured MoE, and aligns optimization with full fine-tuned MoE by deriving a theoretical scaling factor.
arXiv Detail & Related papers (2025-02-24T06:48:13Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods. We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
GoRA: Gradient-driven Adaptive Low Rank Adaptation [13.088526045902016]
Low-Rank Adaptation (LoRA) is a crucial method for efficiently fine-tuning large language models (LLMs). We propose a novel framework, GoRA (Gradient-driven Adaptive Low Rank Adaptation), that simultaneously adapts both the rank and initialization strategy within a unified framework. GoRA consistently outperforms existing LoRA-based methods while preserving the efficiency of vanilla LoRA.
arXiv Detail & Related papers (2025-02-13T10:33:58Z)
DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation [32.369133126167085]
We propose a new PEFT scheme called DiffoRA, which is theoretically grounded and enables module-wise adoption of LoRA. At the core of our DiffoRA lies a Differential Adaptation Matrix (DAM) to determine which module is the most suitable and essential for fine-tuning. Our approach achieves the best model accuracy over all the state-of-the-art baselines across various benchmarks.
arXiv Detail & Related papers (2025-02-13T02:41:34Z)
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for LLMs that reduces memory requirements. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z)
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and their theoretical optimum. We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for fine-tuning models. LoRA often underperforms when compared to full-parameter fine-tuning. We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix. A solution that appears flat in the LoRA space may still have sharp directions in the full parameter space, potentially harming generalization performance. We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
SBoRA: Low-Rank Adaptation with Regional Weight Updates [19.15481369459963]
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models. Compared with LoRA, SBoRA either halves the number of trainable parameters or doubles the rank with a similar number of trainable parameters. Our results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning.
arXiv Detail & Related papers (2024-07-07T15:37:13Z)
DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution [28.589498108609202]
Low-Rank Adaptation (LoRA) relies on a bypass framework that ignores the differential parameter budget requirements across weight matrices. DoRA decomposes high-rank LoRA layers into structured single-rank components, allowing for dynamic pruning of parameter budget. Experimental results demonstrate that DoRA can achieve competitive performance compared with LoRA and full model fine-tuning.
arXiv Detail & Related papers (2024-05-27T17:02:27Z)
BiLoRA: A Bi-level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-trained Models [34.1111413429869]
BiLoRA is an overfitting-alleviating fine-tuning approach based on bi-level optimization (BLO) tested on ten datasets covering natural language understanding and generation tasks.
arXiv Detail & Related papers (2024-03-19T14:11:20Z)
DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning [34.109808214968176]
Generalized LoRA (GLoRA) is an advanced approach for universal parameter-efficient fine-tuning tasks. It employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations. GLoRA exhibits strong transfer learning, few-shot learning and domain generalization abilities.
arXiv Detail & Related papers (2023-06-13T17:59:32Z)
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.