A Single Linear Layer Yields Task-Adapted Low-Rank Matrices
- URL: http://arxiv.org/abs/2403.14946v1
- Date: Fri, 22 Mar 2024 04:38:42 GMT
- Title: A Single Linear Layer Yields Task-Adapted Low-Rank Matrices
- Authors: Hwichan Kim, Shota Sasaki, Sho Hoshino, Ukyo Honda
- Abstract summary: Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $\Delta W$.
We show that CondLoRA maintains performance on par with LoRA, despite the fact that CondLoRA has fewer trainable parameters than LoRA.
- Score: 4.695004706877747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-Rank Adaptation (LoRA) is a widely used Parameter-Efficient Fine-Tuning (PEFT) method that updates an initial weight matrix $W_0$ with a delta matrix $\Delta W$ composed of two low-rank matrices $A$ and $B$. A previous study suggested that there is a correlation between $W_0$ and $\Delta W$. In this study, we aim to delve deeper into the relationships between $W_0$ and the low-rank matrices $A$ and $B$ to further comprehend the behavior of LoRA. In particular, we analyze a conversion matrix that transforms $W_0$ into the low-rank matrices, which encapsulates information about the relationships. Our analysis reveals that the conversion matrices are similar across layers. Inspired by these findings, we hypothesize that a single linear layer, which takes each layer's $W_0$ as input, can yield task-adapted low-rank matrices. To confirm this hypothesis, we devise a method named Conditionally Parameterized LoRA (CondLoRA) that updates initial weight matrices with low-rank matrices derived from a single linear layer. Our empirical results show that CondLoRA maintains performance on par with LoRA, despite the fact that CondLoRA has fewer trainable parameters than LoRA. Therefore, we conclude that "a single linear layer yields task-adapted low-rank matrices."
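Below is a minimal PyTorch sketch of the CondLoRA idea described in the abstract: a pair of conditioning layers, shared across all adapted layers, takes each layer's frozen $W_0$ and produces the low-rank factors $B$ and $A$. The dimensions, module names, and the exact form of the conditioning are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CondLoRALinear(nn.Module):
    """Sketch of the CondLoRA idea: the low-rank factors are *derived* from the
    frozen pre-trained weight W0 by conditioning layers shared across all adapted
    layers, instead of being trained per layer as in vanilla LoRA. The exact
    parameterization below is an illustrative assumption."""

    def __init__(self, w0: torch.Tensor, cond_a: nn.Linear, cond_b: nn.Linear, alpha: float = 1.0):
        super().__init__()
        self.register_buffer("w0", w0)  # frozen pre-trained weight, shape (d_out, d_in)
        self.cond_a = cond_a            # shared linear layer: d_out -> r
        self.cond_b = cond_b            # shared linear layer: d_in  -> r
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Derive the task-adapted low-rank factors from W0 via the shared layers.
        a_t = self.cond_a(self.w0.T)    # (d_in, r), i.e. A transposed
        b = self.cond_b(self.w0)        # (d_out, r), i.e. B
        delta_w = b @ a_t.T             # (d_out, d_in), rank at most r
        return x @ (self.w0 + self.alpha * delta_w).T

# Usage: one pair of conditioning layers is shared by every adapted layer, so the
# trainable parameter count does not grow with the number of adapted layers.
d_out, d_in, r = 64, 32, 4
shared_a = nn.Linear(d_out, r, bias=False)
shared_b = nn.Linear(d_in, r, bias=False)
layers = [CondLoRALinear(torch.randn(d_out, d_in), shared_a, shared_b) for _ in range(3)]
print(layers[0](torch.randn(8, d_in)).shape)  # torch.Size([8, 64])
```

In this sketch the conditioning layers are shared, so the number of trainable parameters stays constant as more layers are adapted, mirroring the abstract's claim that CondLoRA trains fewer parameters than LoRA.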
Related papers
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients [86.40635601953446]
We study the emergence of low-rank structures across different layers of modern Large Language Models.
We present Weight Low-Rank Projection (WeLore) that unifies weight compression and memory-efficient fine-tuning in a single framework.
arXiv Detail & Related papers (2024-07-15T21:05:20Z)
- MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning [16.32872017587485]
Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computation and memory cost.
Previous LoRA-based approaches initialize the low-rank matrices with a Gaussian distribution and zero values, while keeping the original weight matrices frozen.
We propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen.
arXiv Detail & Related papers (2024-06-13T12:30:02Z)
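A minimal sketch of the MiLoRA idea, assuming a plain SVD split of a weight matrix: the principal singular components are frozen and the $r$ smallest singular components initialize the trainable low-rank factors. How the minor factors are parameterized and scaled during training is an assumption here.

```python
import torch
import torch.nn as nn

def milora_split(w: torch.Tensor, r: int):
    """Split a weight matrix into a frozen principal part and trainable low-rank
    factors built from the r smallest singular components, in the spirit of MiLoRA."""
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    # Principal part: the largest singular components, kept frozen.
    w_principal = u[:, :-r] @ torch.diag(s[:-r]) @ vh[:-r, :]
    # Minor part: the r smallest singular components initialize the trainable
    # factors B (d_out x r) and A (r x d_in); only these receive gradients.
    b = nn.Parameter(u[:, -r:] @ torch.diag(s[-r:]))
    a = nn.Parameter(vh[-r:, :])
    return w_principal, b, a

# At initialization the effective weight w_principal + B @ A equals the original w.
w0 = torch.randn(64, 32)
w_principal, b, a = milora_split(w0, r=4)
print(torch.allclose(w_principal + b @ a, w0, atol=1e-4))  # True
```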
- Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models [10.827800772359844]
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models.
Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup.
We prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients.
arXiv Detail & Related papers (2024-06-05T10:44:08Z)
- Asymmetry in Low-Rank Adapters of Foundation Models [47.310550805920585]
This paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices.
We show that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one.
arXiv Detail & Related papers (2024-02-26T18:59:12Z)
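An illustrative sketch of the asymmetry finding: $A$ is kept as a fixed random projection and only $B$ is trained. Shapes, scaling, and the zero initialization of $B$ are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AsymmetricLoRALinear(nn.Module):
    """Illustrative sketch of the asymmetry finding: A is a fixed random projection
    and only B is trained. Shapes, scaling, and initialization are assumptions."""

    def __init__(self, w0: torch.Tensor, r: int = 8):
        super().__init__()
        d_out, d_in = w0.shape
        self.register_buffer("w0", w0)                       # frozen pre-trained weight
        self.register_buffer("a", torch.randn(r, d_in) / r)  # random, untrained A
        self.b = nn.Parameter(torch.zeros(d_out, r))         # trainable B, zero-initialized

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.w0 + self.b @ self.a).T

layer = AsymmetricLoRALinear(torch.randn(64, 32), r=8)
# Only B is trainable: 64 * 8 = 512 parameters for this layer.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 512
```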
- NOLA: Compressing LoRA using Linear Combination of Random Basis [22.76088132446952]
We introduce NOLA, which overcomes the rank one lower bound present in LoRA.
NOLA performs as well as LoRA while using far fewer parameters than rank-one LoRA, the best compression LoRA can achieve.
arXiv Detail & Related papers (2023-10-04T03:30:24Z)
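A rough sketch of the NOLA idea: each low-rank factor is a linear combination of frozen random basis matrices, so only the combination coefficients are trained. The number of basis matrices, shapes, and initialization are hypothetical.

```python
import torch
import torch.nn as nn

class NOLAFactor(nn.Module):
    """Sketch of the NOLA idea: a low-rank factor is a linear combination of
    num_basis frozen random basis matrices, so only num_basis coefficients train."""

    def __init__(self, rows: int, cols: int, num_basis: int = 64, init_scale: float = 0.0):
        super().__init__()
        self.register_buffer("basis", torch.randn(num_basis, rows, cols))  # frozen random basis
        self.coeffs = nn.Parameter(init_scale * torch.randn(num_basis))    # trainable coefficients

    def forward(self) -> torch.Tensor:
        # Weighted sum of the fixed basis matrices.
        return torch.einsum("k,krc->rc", self.coeffs, self.basis)

# Usage: delta_w = B @ A; initializing one factor at zero keeps delta_w = 0 at the
# start of training, as in LoRA, while each factor costs only num_basis parameters.
d_out, d_in, r = 64, 32, 4
factor_b = NOLAFactor(d_out, r)                  # zero-initialized coefficients
factor_a = NOLAFactor(r, d_in, init_scale=0.02)  # small random coefficients
delta_w = factor_b() @ factor_a()
print(delta_w.shape)  # torch.Size([64, 32])
```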
- Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices [27.693028578653394]
Delta-LoRA is a novel parameter-efficient approach to fine-tune large language models (LLMs).
In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $A$ and $B$, but also propagates the learning to the pre-trained weights $W$.
arXiv Detail & Related papers (2023-09-05T17:40:34Z)
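A hypothetical sketch of a Delta-LoRA-style step: the optimizer updates only the low-rank factors, and the resulting change of $B A$ is then added into the pre-trained weight $W$. The coefficient and the exact update rule are assumptions, not the paper's algorithm.

```python
import torch
import torch.nn as nn

def delta_lora_step(w: torch.Tensor, a: nn.Parameter, b: nn.Parameter,
                    optimizer: torch.optim.Optimizer, loss: torch.Tensor,
                    lam: float = 0.5) -> None:
    """One hypothetical Delta-LoRA-style update: take a normal optimizer step on
    the low-rank factors, then propagate the resulting change of B @ A into the
    pre-trained weight W."""
    with torch.no_grad():
        old_delta = b @ a                  # B @ A before the step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                       # updates only A and B
    with torch.no_grad():
        w += lam * (b @ a - old_delta)     # propagate the delta into W

# Usage with hypothetical shapes: W receives no gradients but still moves.
d_out, d_in, r = 64, 32, 4
w = torch.randn(d_out, d_in)
a = nn.Parameter(torch.randn(r, d_in) * 0.01)
b = nn.Parameter(torch.zeros(d_out, r))
opt = torch.optim.AdamW([a, b], lr=1e-3)
x, target = torch.randn(8, d_in), torch.randn(8, d_out)
loss = ((x @ (w + b @ a).T - target) ** 2).mean()
delta_lora_step(w, a, b, opt, loss)
```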
- One-sided Matrix Completion from Two Observations Per Row [95.87811229292056]
We propose a natural algorithm that involves imputing the missing values of the matrix $X^T X$.
We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing.
arXiv Detail & Related papers (2023-06-06T22:35:16Z)
- Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD matrices and their "square-roots".
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
- Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.