Efficient Modular Learning through Naive LoRA Summation: Leveraging Orthogonality in High-Dimensional Models
- URL: http://arxiv.org/abs/2508.11985v1
- Date: Sat, 16 Aug 2025 08:49:02 GMT
- Title: Efficient Modular Learning through Naive LoRA Summation: Leveraging Orthogonality in High-Dimensional Models
- Authors: Zhanhao Cao, Clement Truong, Andrew Lizarraga,
- Abstract summary: Low-Rank Adaptation (LoRA) stores parameter deltas as the product of two small matrices.<n>Naive summation requires no additional training, can be applied in seconds, and achieves performance comparable to models trained on merged data.
- Score: 1.0923877073891446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models are driven by scale, while parameter-efficient fine-tuning (PEFT) enables updating only a small fraction of parameters. Low-Rank Adaptation (LoRA) stores parameter deltas as the product of two small matrices, which makes them natural building blocks that can be composed. Motivated by the superposition principle, we hypothesize that independently trained LoRA modules on disjoint domains are approximately orthogonal and can be combined by simple addition. Using GPT-2 Small (117M) with LoRA rank 4 and alpha=64, we train adapters for three QA domains (math, medicine, finance). In pairwise tests, adding Math+Medicine adapters improves perplexity by -9.10% relative to merged-data fine-tuning, while Math+Finance and Finance+Medicine change by +4.54% and +27.56%, respectively. Across combinations, the RMS cosine similarity between LoRA deltas correlates positively and approximately linearly with the change in perplexity. Naive summation requires no additional training, can be applied in seconds, and achieves performance comparable to models trained on merged data, while clarifying when interference appears in higher-order compositions.
Related papers
- Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates [14.49537642990529]
Low-rank adaptation (LoRA) is one of the most popular methods among parameter-efficient fine-tuning (PEFT)<n>We propose a novel method called Dual LoRA to improve the performance by incorporating an inductive bias into the original LoRA.<n>We show that we consistently outperform LoRA and its state-of-the-art variants with the same number of trainable parameters.
arXiv Detail & Related papers (2025-12-03T03:14:09Z) - Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update [50.36542772932594]
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights.<n>There is still a gap between full training with low-rank projections (SVDLoRA) and LoRA fine-tuning, indicating that LoRA steps can be further improved.
arXiv Detail & Related papers (2025-09-24T10:32:50Z) - Don't Forget the Nonlinearity: Unlocking Activation Functions in Efficient Fine-Tuning [82.16625951603315]
NoRA replaces fixed activations with learnable rational functions and applies structured low-rank updates to numerator and denominator coefficients.<n>On vision transformers trained on CIFAR-10 and CIFAR-100, NoRA matches or exceeds full fine-tuning while updating only 0.4% of parameters.<n>NoRA constrains adaptation to a low-dimensional functional subspace, implicitly regularizing update magnitude and direction.
arXiv Detail & Related papers (2025-09-16T16:47:03Z) - Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization [56.78271181959529]
SubLoRA is a rank determination method for Low-Rank Adaptation (LoRA) based on submodular function.<n>Our method combines solid theoretical foundations, second-order accuracy, and practical computational efficiency.
arXiv Detail & Related papers (2025-07-02T15:56:40Z) - LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements.<n>This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often under performs when compared to full- parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - Balancing LoRA Performance and Efficiency with Simple Shard Sharing [8.827921242078883]
textbfOptimal textbfShard textbfSharing textbfIntegration in textbfLoRA, a novel PEFT approach that addresses this trade-off through a simple shard-sharing mechanism.<n>Fossils significantly outperforms standard LoRA and its prominent variants in both model performance metrics and computational efficiency.
arXiv Detail & Related papers (2024-09-19T10:26:42Z) - LoRA$^2$ : Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models [3.7049613588433497]
Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning.
We extend the LoRA to multiple scales, dubbed as LoRA$2$.
arXiv Detail & Related papers (2024-08-13T12:31:30Z) - SBoRA: Low-Rank Adaptation with Regional Weight Updates [19.15481369459963]
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models.
SBoRA reduces the number of trainable parameters by half or doubles the rank with the similar number of trainable parameters as LoRA.
Our results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning.
arXiv Detail & Related papers (2024-07-07T15:37:13Z) - DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA)
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z) - Chain of LoRA: Efficient Fine-tuning of Language Models via Residual
Learning [31.036465632204663]
We introduce Chain of LoRA, an iterative optimization framework inspired by the Frank-Wolfe algorithm.
We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
arXiv Detail & Related papers (2024-01-08T14:26:49Z) - NOLA: Compressing LoRA using Linear Combination of Random Basis [22.76088132446952]
We introduce NOLA, which overcomes the rank one lower bound present in LoRA.
NOLA performs as well as LoRA models with much fewer number of parameters compared to LoRA with rank one, the best compression LoRA can archive.
arXiv Detail & Related papers (2023-10-04T03:30:24Z) - LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.