Towards Symmetric Low-Rank Adapters
- URL: http://arxiv.org/abs/2504.03719v2
- Date: Tue, 15 Apr 2025 22:46:20 GMT
- Title: Towards Symmetric Low-Rank Adapters
- Authors: Tales Panoutsos, Rodrygo L. T. Santos, Flavio Figueiredo,
- Abstract summary: We introduce Symmetric Low-Rank Adapters, an optimized variant of LoRA with even fewer weights.<n>This method utilizes Low-Rank Symmetric Weight Matrices to learn downstream tasks more efficiently.
- Score: 3.3317825075368908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce Symmetric Low-Rank Adapters, an optimized variant of LoRA with even fewer weights. This method utilizes Low-Rank Symmetric Weight Matrices to learn downstream tasks more efficiently. Traditional LoRA accumulates fine-tuning weights with the original pre-trained weights via a Singular Value Decomposition (SVD) like approach, i.e., model weights are fine-tuned via updates of the form $BA$ (where $B \in \mathbb{R}^{n\times r}$, $A \in \mathbb{R}^{r\times n}$, and $r$ is the rank of the merged weight matrix). In contrast, our approach, named SymLoRA, represents fine-tuning weights as a Spectral Decomposition, i.e., $Q \, diag(\Lambda)\, Q^T$, where $Q \in \mathbb{R}^{n\times r}$ and $\Lambda \in \mathbb{R}^r$. SymLoRA requires approximately half of the finetuning weights. Here, we show that this approach has negligible losses in downstream efficacy.
Related papers
- ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints [64.35580479051208]
In previous works, low-rank adapters (LoRA) are randomly with a fixed rank across all attachment points.<n>In this paper, we improve convergence and final performance of LoRA fine-tuning using our proposed data-driven weight initialization method.
arXiv Detail & Related papers (2025-07-09T23:52:31Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [61.79405341803085]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL)<n>Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL)
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - Gradient dynamics for low-rank fine-tuning beyond kernels [9.275532709125242]
We study low-rank fine-tuning in a student-teacher setting.
We prove under mild assumptions that a student model which is matrix at the base model and trained with online gradient descent will converge to the teacher.
arXiv Detail & Related papers (2024-11-23T00:00:28Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often under performs when compared to full- parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models [7.108651381160281]
Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models.
We propose textbfCoRA: leveraging shared knowledge to optimize LoRA training by substituting its matrix $B$ with a common subspace from large models.
Our experiments show that the first approach achieves the same efficacy as the original LoRA fine-tuning while being more efficient than halving parameters.
arXiv Detail & Related papers (2024-08-31T12:48:27Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Parameter-Efficient Fine-Tuning via Circular Convolution [29.442868470645482]
Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models.
We propose Circular Convolution Adaptation (C$3$A), which not only achieves high-rank adaptation with enhanced performance but also excels in both computational power and memory utilization.
arXiv Detail & Related papers (2024-07-27T21:12:46Z) - From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients [86.40635601953446]
We study the emergence of low-rank structures across different layers of Modern Large Language Models.
We present Weight Low-Rank Projection (WeLore) that unifies weight compression and memory-efficient fine-tuning as ONE.
arXiv Detail & Related papers (2024-07-15T21:05:20Z) - SBoRA: Low-Rank Adaptation with Regional Weight Updates [19.15481369459963]
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models.
SBoRA reduces the number of trainable parameters by half or doubles the rank with the similar number of trainable parameters as LoRA.
Our results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning.
arXiv Detail & Related papers (2024-07-07T15:37:13Z) - Asymmetry in Low-Rank Adapters of Foundation Models [47.310550805920585]
This paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices.
We show that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one.
arXiv Detail & Related papers (2024-02-26T18:59:12Z) - Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation [9.10876982856809]
We develop a framework for alternating minimization that allows the alternating updates to be computed approximately.<n>At the heart of our framework is a high-accuracy multiple response regression solver together with a robust analysis of alternating minimization.
arXiv Detail & Related papers (2023-06-07T05:38:55Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Bounding the Width of Neural Networks via Coupled Initialization -- A
Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z) - Monarch: Expressive Structured Matrices for Efficient and Accurate
Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones.
We propose a class of matrices (Monarch) that is hardware-efficient.
arXiv Detail & Related papers (2022-04-01T17:37:29Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Learning nonlinear dynamical systems from a single trajectory [102.60042167341956]
We introduce algorithms for learning nonlinear dynamical systems of the form $x_t+1=sigma(Thetastarx_t)+varepsilon_t$.
We give an algorithm that recovers the weight matrix $Thetastar$ from a single trajectory with optimal sample complexity and linear running time.
arXiv Detail & Related papers (2020-04-30T10:42:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.