MAP: Revisiting Weight Decomposition for Low-Rank Adaptation
- URL: http://arxiv.org/abs/2505.23094v1
- Date: Thu, 29 May 2025 04:56:35 GMT
- Title: MAP: Revisiting Weight Decomposition for Low-Rank Adaptation
- Authors: Chongjie Si, Zhiyi Shi, Yadao Wang, Xiaokang Yang, Susanto Rahardja, Wei Shen,
- Abstract summary: We propose MAP, a novel framework that reformulates weight matrices as high-dimensional vectors. MAP normalizes the pre-trained weights, learns a directional update, and introduces two scalar coefficients to independently scale the magnitudes of the base and update vectors. This design enables more interpretable and flexible adaptation, and can be seamlessly integrated into existing PEFT methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of large language models has revolutionized natural language processing, but their fine-tuning remains computationally expensive, hindering broad deployment. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, have emerged as solutions. Recent work like DoRA attempts to further decompose weight adaptation into direction and magnitude components. However, existing formulations often define direction heuristically at the column level, lacking a principled geometric foundation. In this paper, we propose MAP, a novel framework that reformulates weight matrices as high-dimensional vectors and decouples their adaptation into direction and magnitude in a rigorous manner. MAP normalizes the pre-trained weights, learns a directional update, and introduces two scalar coefficients to independently scale the magnitude of the base and update vectors. This design enables more interpretable and flexible adaptation, and can be seamlessly integrated into existing PEFT methods. Extensive experiments show that MAP significantly improves performance when coupling with existing methods, offering a simple yet powerful enhancement to existing PEFT methods. Given the universality and simplicity of MAP, we hope it can serve as a default setting for designing future PEFT methods.
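The decomposition described in the abstract can be sketched in a few lines. The following is an illustrative NumPy reconstruction based only on the abstract; the names `alpha`, `beta` for the two scalar coefficients and the LoRA-style factors `A`, `B` are assumptions, not the authors' code.

```python
import numpy as np

def map_adapt(W0, A, B, alpha, beta):
    """Sketch of a MAP-style update, treating the whole weight matrix
    as one high-dimensional vector (details inferred from the abstract,
    not the authors' implementation)."""
    w = W0.reshape(-1)                     # flatten: matrix -> vector
    delta = (A @ B).reshape(-1)            # LoRA-style low-rank update
    w_dir = w / np.linalg.norm(w)          # normalized base direction
    d_dir = delta / np.linalg.norm(delta)  # normalized update direction
    # Two scalar coefficients scale the two magnitudes independently.
    return (alpha * w_dir + beta * d_dir).reshape(W0.shape)
```

Under this reading, setting `beta = 0` recovers a rescaled copy of the pre-trained weights with Frobenius norm `alpha`, which is what makes the magnitude interpretable.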
Related papers
- CoSA: Compressed Sensing-Based Adaptation of Large Language Models [21.688889188355645]
CoSA (Compressed Sensing-Based Adaptation) is a new PEFT method extended from compressed sensing theory. We show that CoSA provides a principled perspective for efficient and expressive multi-scale model adaptation. We evaluate CoSA on 10 diverse tasks, including natural language understanding and generation, employing 5 models of different scales from the RoBERTa, Llama, and Qwen families.
arXiv Detail & Related papers (2026-02-05T00:11:43Z)
- Calibrating and Rotating: A Unified Framework for Weight Conditioning in PEFT [19.773848189002965]
The DoRA method enhances performance by decomposing weight updates into magnitude and direction. In this work, we identify that DoRA's success stems from its capacity to increase the singular value entropy of the weight update matrix. We reformulate DoRA into a mathematically equivalent and more efficient matrix form, revealing it as a learnable weight conditioning method.
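For contrast with MAP's whole-matrix view, DoRA's column-level decomposition can be sketched roughly as follows. This is an illustrative NumPy reconstruction of the commonly described DoRA form, not code from either paper; the per-column magnitude `m` is initialized here from the pre-trained weights as an assumption.

```python
import numpy as np

def dora_adapt(W0, A, B, m=None):
    """Rough sketch of DoRA-style adaptation: each column's direction
    comes from the low-rank-updated matrix, while its magnitude is a
    separate (learnable) per-column parameter."""
    V = W0 + A @ B                                     # direction candidate
    if m is None:
        m = np.linalg.norm(W0, axis=0, keepdims=True)  # initial magnitudes
    # Normalize each column of V to unit length, then rescale by m.
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)
```

By construction, with the default `m` every column of the adapted matrix keeps the column norm of the pre-trained weights; training then adjusts `m` and the low-rank factors separately.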
arXiv Detail & Related papers (2025-10-28T12:52:54Z)
- TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models [6.968486021891596]
We propose a vector-based random Tensor network for high-Rank Adaptation (TeRA). This is achieved by parameterizing the tensorized weight update matrix as a Tucker-like tensor network (TN). Experiments demonstrate that TeRA matches or even outperforms high-rank adapters, while requiring a trainable parameter count similar to that of vector-based methods.
arXiv Detail & Related papers (2025-09-03T11:46:24Z)
- Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts [72.22148263683037]
We study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures. First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature. Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks.
arXiv Detail & Related papers (2025-07-09T03:25:45Z)
- Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence [131.41894248194995]
We propose context-oriented decomposition adaptation (CorDA), a novel method that initializes adapters in a task-aware manner. Thanks to this task awareness, our method enables two optional adaptation modes: knowledge-preserved mode (KPM) and instruction-previewed mode (IPM).
arXiv Detail & Related papers (2025-06-16T07:55:14Z)
- PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs [8.52711842775914]
Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as effective and promising approaches for fine-tuning pre-trained language models. In this paper, we propose a novel approach, PrunePEFT, which formulates the PEFT strategy search as a pruning problem.
arXiv Detail & Related papers (2025-06-09T09:32:58Z)
- Weight Spectra Induced Efficient Model Adaptation [54.8615621415845]
Fine-tuning large-scale foundation models incurs prohibitive computational costs. We show that fine-tuning predominantly amplifies the top singular values while leaving the remainder largely intact. We propose a novel method that leverages learnable rescaling of the top singular directions.
arXiv Detail & Related papers (2025-05-29T05:03:29Z)
- Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
arXiv Detail & Related papers (2025-04-01T14:36:45Z)
- DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation [44.99833362998488]
Decoupled Low-rank Adaptation (DeLoRA) is a novel finetuning method that normalizes and scales learnable low-rank matrices. We show that DeLoRA matches or surpasses the performance of competing PEFT methods, while exhibiting stronger robustness.
arXiv Detail & Related papers (2025-03-23T22:00:56Z)
- Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
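As a concrete illustration of the LMO primitive this entry refers to: over an l2 ball the oracle has a simple closed form. The sketch below is the standard textbook oracle, not the paper's algorithm.

```python
import numpy as np

def lmo_l2_ball(g, radius=1.0):
    """Linear minimization oracle over {x : ||x||_2 <= radius}:
    argmin_x <g, x> is the boundary point opposite to g."""
    n = np.linalg.norm(g)
    if n == 0.0:
        return np.zeros_like(g)  # any feasible point minimizes <0, x>
    return -radius * g / n
```

Swapping the norm changes the geometry the oracle adapts to; for the l-infinity ball, for instance, the minimizer is `-radius * sign(g)`.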
arXiv Detail & Related papers (2025-02-11T13:10:34Z)
- Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models [32.68721299475496]
Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness. We propose a new PEFT method that combines two classes of adaptations, namely transform and residual adaptations. Experiments are conducted on fine-tuning Stable Diffusion models in subject-driven and controllable generation.
arXiv Detail & Related papers (2025-01-15T11:10:37Z)
- NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models [26.808251361020066]
Fine-tuning pre-trained models often yields state-of-the-art performance but is computationally expensive when updating all parameters. We propose NEAT, a nonlinear PEFT approach that employs a lightweight neural network to learn a nonlinear transformation of the pre-trained weights. Our theoretical analysis shows that NEAT achieves greater efficiency than LoRA while maintaining equivalent expressivity.
arXiv Detail & Related papers (2024-10-02T17:29:23Z)
- Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation [32.371755315509574]
Householder reflection adaptation (HRA) is a simple but effective adaptation method based on Householder reflections.
HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators.
arXiv Detail & Related papers (2024-05-24T16:18:16Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
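The forward-gradient idea in this entry can be illustrated with a minimal estimator. This is the textbook weight-perturbation construction, using a finite difference in place of forward-mode AD; the paper's activation-perturbation variant and local losses are not reproduced here.

```python
import numpy as np

def forward_gradient(f, x, rng, n_samples=100):
    """Unbiased estimate of grad f(x): average (grad f(x) . v) v over
    random directions v ~ N(0, I). The directional derivative is
    approximated by a finite difference."""
    eps = 1e-5
    est = np.zeros_like(x)
    for _ in range(n_samples):
        v = rng.standard_normal(x.shape)
        dd = (f(x + eps * v) - f(x)) / eps  # ~ grad f(x) . v
        est += dd * v
    return est / n_samples
```

The estimator's variance grows with the number of perturbed dimensions, which is exactly the problem that perturbing activations instead of weights is meant to reduce.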
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights [53.8489656709356]
Normalization techniques are a boon for modern deep learning.
It is often overlooked, however, that the additional introduction of momentum results in a rapid reduction in effective step sizes for scale-invariant weights.
In this paper, we verify that the widely adopted combination of the two ingredients leads to the premature decay of effective step sizes and sub-optimal model performance.
arXiv Detail & Related papers (2020-06-15T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.