A Single Linear Layer Yields Task-Adapted Low-Rank Matrices
- URL: http://arxiv.org/abs/2403.14946v1
- Date: Fri, 22 Mar 2024 04:38:42 GMT
- Title: A Single Linear Layer Yields Task-Adapted Low-Rank Matrices
- Authors: Hwichan Kim, Shota Sasaki, Sho Hoshino, Ukyo Honda
- Abstract summary: Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $\Delta W$.
We show that CondLoRA maintains performance on par with LoRA, despite the fact that CondLoRA has fewer trainable parameters than LoRA.
- Score: 4.695004706877747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $\Delta W$ composed of two low-rank matrices $A$ and $B$. A previous study suggested that there is a correlation between $W_0$ and $\Delta W$. In this study, we aim to delve deeper into the relationships between $W_0$ and the low-rank matrices $A$ and $B$ to further comprehend the behavior of LoRA. In particular, we analyze a conversion matrix that transforms $W_0$ into the low-rank matrices, which encapsulates information about the relationships. Our analysis reveals that the conversion matrices are similar across layers. Inspired by these findings, we hypothesize that a single linear layer, which takes each layer's $W_0$ as input, can yield task-adapted low-rank matrices. To confirm this hypothesis, we devise a method named Conditionally Parameterized LoRA (CondLoRA) that updates initial weight matrices with low-rank matrices derived from a single linear layer. Our empirical results show that CondLoRA maintains performance on par with LoRA, despite the fact that CondLoRA has fewer trainable parameters than LoRA. Therefore, we conclude that "a single linear layer yields task-adapted low-rank matrices."
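Below is a minimal PyTorch sketch of the CondLoRA idea described in the abstract: a pair of conditioning layers, shared across all adapted layers, takes each layer's frozen $W_0$ and produces the low-rank factors $B$ and $A$. The dimensions, module names, and the exact form of the conditioning are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CondLoRALinear(nn.Module):
    """Sketch of the CondLoRA idea: the low-rank factors are *derived* from the
    frozen pre-trained weight W0 by conditioning layers shared across all adapted
    layers, instead of being trained per layer as in vanilla LoRA. The exact
    parameterization below is an illustrative assumption."""

    def __init__(self, w0: torch.Tensor, cond_a: nn.Linear, cond_b: nn.Linear, alpha: float = 1.0):
        super().__init__()
        self.register_buffer("w0", w0)  # frozen pre-trained weight, shape (d_out, d_in)
        self.cond_a = cond_a            # shared linear layer: d_out -> r
        self.cond_b = cond_b            # shared linear layer: d_in  -> r
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Derive the task-adapted low-rank factors from W0 via the shared layers.
        a_t = self.cond_a(self.w0.T)    # (d_in, r), i.e. A transposed
        b = self.cond_b(self.w0)        # (d_out, r), i.e. B
        delta_w = b @ a_t.T             # (d_out, d_in), rank at most r
        return x @ (self.w0 + self.alpha * delta_w).T

# Usage: one pair of conditioning layers is shared by every adapted layer, so the
# trainable parameter count does not grow with the number of adapted layers.
d_out, d_in, r = 64, 32, 4
shared_a = nn.Linear(d_out, r, bias=False)
shared_b = nn.Linear(d_in, r, bias=False)
layers = [CondLoRALinear(torch.randn(d_out, d_in), shared_a, shared_b) for _ in range(3)]
print(layers[0](torch.randn(8, d_in)).shape)  # torch.Size([8, 64])
```

In this sketch the conditioning layers are shared, so the number of trainable parameters stays constant as more layers are adapted, mirroring the abstract's claim that CondLoRA trains fewer parameters than LoRA.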
Related papers
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients [86.40635601953446]
We study the emergence of low-rank structures across different layers of modern Large Language Models.
We present Weight Low-Rank Projection (WeLore) that unifies weight compression and memory-efficient fine-tuning in a single framework.
arXiv Detail & Related papers (2024-07-15T21:05:20Z)
- MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning [16.32872017587485]
Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computation and memory cost.
Previous LoRA-based approaches initialize the low-rank matrices with a Gaussian distribution and zero values, while keeping the original weight matrices frozen.
We propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen.
arXiv Detail & Related papers (2024-06-13T12:30:02Z)
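A minimal sketch of the MiLoRA idea, assuming a plain SVD split of a weight matrix: the principal singular components are frozen and the $r$ smallest singular components initialize the trainable low-rank factors. How the minor factors are parameterized and scaled during training is an assumption here.

```python
import torch
import torch.nn as nn

def milora_split(w: torch.Tensor, r: int):
    """Split a weight matrix into a frozen principal part and trainable low-rank
    factors built from the r smallest singular components, in the spirit of MiLoRA."""
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    # Principal part: the largest singular components, kept frozen.
    w_principal = u[:, :-r] @ torch.diag(s[:-r]) @ vh[:-r, :]
    # Minor part: the r smallest singular components initialize the trainable
    # factors B (d_out x r) and A (r x d_in); only these receive gradients.
    b = nn.Parameter(u[:, -r:] @ torch.diag(s[-r:]))
    a = nn.Parameter(vh[-r:, :])
    return w_principal, b, a

# At initialization the effective weight w_principal + B @ A equals the original w.
w0 = torch.randn(64, 32)
w_principal, b, a = milora_split(w0, r=4)
print(torch.allclose(w_principal + b @ a, w0, atol=1e-4))  # True
```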
- Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models [10.827800772359844]
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models.
Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup.
We prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients.
arXiv Detail & Related papers (2024-06-05T10:44:08Z)
- Asymmetry in Low-Rank Adapters of Foundation Models [47.310550805920585]
This paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices.
We show that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one.
arXiv Detail & Related papers (2024-02-26T18:59:12Z)
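An illustrative sketch of the asymmetry finding: $A$ is kept as a fixed random projection and only $B$ is trained. Shapes, scaling, and the zero initialization of $B$ are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AsymmetricLoRALinear(nn.Module):
    """Illustrative sketch of the asymmetry finding: A is a fixed random projection
    and only B is trained. Shapes, scaling, and initialization are assumptions."""

    def __init__(self, w0: torch.Tensor, r: int = 8):
        super().__init__()
        d_out, d_in = w0.shape
        self.register_buffer("w0", w0)                       # frozen pre-trained weight
        self.register_buffer("a", torch.randn(r, d_in) / r)  # random, untrained A
        self.b = nn.Parameter(torch.zeros(d_out, r))         # trainable B, zero-initialized

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.w0 + self.b @ self.a).T

layer = AsymmetricLoRALinear(torch.randn(64, 32), r=8)
# Only B is trainable: 64 * 8 = 512 parameters for this layer.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 512
```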
- NOLA: Compressing LoRA using Linear Combination of Random Basis [22.76088132446952]
We introduce NOLA, which overcomes the rank one lower bound present in LoRA.
NOLA performs as well as LoRA while using far fewer parameters than rank-one LoRA, the best compression LoRA can achieve.
arXiv Detail & Related papers (2023-10-04T03:30:24Z)
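A rough sketch of the NOLA idea: each low-rank factor is a linear combination of frozen random basis matrices, so only the combination coefficients are trained. The number of basis matrices, shapes, and initialization are hypothetical.

```python
import torch
import torch.nn as nn

class NOLAFactor(nn.Module):
    """Sketch of the NOLA idea: a low-rank factor is a linear combination of
    num_basis frozen random basis matrices, so only num_basis coefficients train."""

    def __init__(self, rows: int, cols: int, num_basis: int = 64, init_scale: float = 0.0):
        super().__init__()
        self.register_buffer("basis", torch.randn(num_basis, rows, cols))  # frozen random basis
        self.coeffs = nn.Parameter(init_scale * torch.randn(num_basis))    # trainable coefficients

    def forward(self) -> torch.Tensor:
        # Weighted sum of the fixed basis matrices.
        return torch.einsum("k,krc->rc", self.coeffs, self.basis)

# Usage: delta_w = B @ A; initializing one factor at zero keeps delta_w = 0 at the
# start of training, as in LoRA, while each factor costs only num_basis parameters.
d_out, d_in, r = 64, 32, 4
factor_b = NOLAFactor(d_out, r)                  # zero-initialized coefficients
factor_a = NOLAFactor(r, d_in, init_scale=0.02)  # small random coefficients
delta_w = factor_b() @ factor_a()
print(delta_w.shape)  # torch.Size([64, 32])
```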
- Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices [27.693028578653394]
Delta-LoRA is a novel parameter-efficient approach to fine-tune large language models (LLMs).
In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $A$ and $B$, but also propagates the learning to the pre-trained weights $W$.
arXiv Detail & Related papers (2023-09-05T17:40:34Z)
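A hypothetical sketch of a Delta-LoRA-style step: the optimizer updates only the low-rank factors, and the resulting change of $B A$ is then added into the pre-trained weight $W$. The coefficient and the exact update rule are assumptions, not the paper's algorithm.

```python
import torch
import torch.nn as nn

def delta_lora_step(w: torch.Tensor, a: nn.Parameter, b: nn.Parameter,
                    optimizer: torch.optim.Optimizer, loss: torch.Tensor,
                    lam: float = 0.5) -> None:
    """One hypothetical Delta-LoRA-style update: take a normal optimizer step on
    the low-rank factors, then propagate the resulting change of B @ A into the
    pre-trained weight W."""
    with torch.no_grad():
        old_delta = b @ a                  # B @ A before the step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                       # updates only A and B
    with torch.no_grad():
        w += lam * (b @ a - old_delta)     # propagate the delta into W

# Usage with hypothetical shapes: W receives no gradients but still moves.
d_out, d_in, r = 64, 32, 4
w = torch.randn(d_out, d_in)
a = nn.Parameter(torch.randn(r, d_in) * 0.01)
b = nn.Parameter(torch.zeros(d_out, r))
opt = torch.optim.AdamW([a, b], lr=1e-3)
x, target = torch.randn(8, d_in), torch.randn(8, d_out)
loss = ((x @ (w + b @ a).T - target) ** 2).mean()
delta_lora_step(w, a, b, opt, loss)
```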
- One-sided Matrix Completion from Two Observations Per Row [95.87811229292056]
We propose a natural algorithm that involves imputing the missing values of the matrix $X^T X$.
We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing.
arXiv Detail & Related papers (2023-06-06T22:35:16Z)
- Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD matrices and their "square-roots".
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
- Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.