LoTR: Low Tensor Rank Weight Adaptation
- URL: http://arxiv.org/abs/2402.01376v2
- Date: Mon, 5 Feb 2024 12:42:52 GMT
- Title: LoTR: Low Tensor Rank Weight Adaptation
- Authors: Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev and Ivan Oseledets
- Abstract summary: We introduce LoTR, a novel approach for parameter-efficient fine-tuning of large language models (LLMs).
LoTR represents a gradient update to parameters in the form of a tensor decomposition.
Simultaneous compression of a sequence of layers with a low-rank tensor representation allows LoTR to achieve even better parameter efficiency than LoRA, especially for deep models.
- Score: 47.4904143988667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture. Widely used LoRA-like methods for fine-tuning LLMs rely on a matrix factorization of the gradient update. We introduce LoTR, a novel approach for parameter-efficient fine-tuning of LLMs which represents a gradient update to parameters in the form of a tensor decomposition. The low-rank adapter for each layer is constructed as a product of three matrices, and the tensor structure arises from sharing the left and right multipliers of this product among layers. Simultaneous compression of a sequence of layers with a low-rank tensor representation allows LoTR to achieve even better parameter efficiency than LoRA, especially for deep models. Moreover, the core tensor does not depend on the original weight dimension and can be made arbitrarily small, which allows for extremely cheap and fast downstream fine-tuning.
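The construction above reduces to two factor matrices shared across layers plus one small square core per layer. A minimal PyTorch sketch of this structure follows; the class and method names, shapes, and initialization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoTRAdapters(nn.Module):
    """Per-layer weight update dW_i = B @ G_i @ A, with B and A shared
    across all layers. Stacking the cores G_i gives a Tucker-2-like
    decomposition of the (n_layers, d_out, d_in) tensor of updates."""

    def __init__(self, n_layers: int, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.B = nn.Parameter(torch.randn(d_out, rank) * 0.02)  # shared left factor
        self.A = nn.Parameter(torch.zeros(rank, d_in))          # shared right factor; zero init => dW = 0 at start
        # One rank x rank core per layer; its size is independent of d_in and d_out.
        self.cores = nn.ParameterList(
            [nn.Parameter(torch.eye(rank)) for _ in range(n_layers)]
        )

    def delta(self, layer: int, x: torch.Tensor) -> torch.Tensor:
        # Apply dW_i to activations without materializing the full matrix:
        # x @ dW_i^T = ((x @ A^T) @ G_i^T) @ B^T
        return (x @ self.A.T) @ self.cores[layer].T @ self.B.T
```

Under this layout the shared factors cost (d_in + d_out) * rank parameters once, and each additional layer adds only rank^2 core parameters, which is why the advantage over per-layer LoRA grows with depth.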
Related papers
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients [86.40635601953446]
We study the emergence of low-rank structures across different layers of Modern Large Language Models.
We present Weight Low-Rank Projection (WeLore), which unifies weight compression and memory-efficient fine-tuning in one framework.
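A rough sketch of the per-matrix decision WeLore motivates, factoring a weight only when its spectrum decays quickly, using a simple energy criterion. The threshold, function name, and rank rule are illustrative assumptions, not the paper's algorithm.

```python
import torch

def welore_factor(W: torch.Tensor, energy: float = 0.9, max_ratio: float = 0.5):
    """Return (L, R) with W ~ L @ R when a small rank captures `energy`
    of the squared spectrum; return None to keep W dense."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    cum = torch.cumsum(S**2, dim=0) / (S**2).sum()
    rank = int(torch.searchsorted(cum, energy)) + 1  # smallest rank over the threshold
    if rank > max_ratio * min(W.shape):              # spectrum too flat: not worth factoring
        return None
    L = U[:, :rank] * S[:rank]   # singular values absorbed into the left factor
    R = Vh[:rank]
    return L, R
```

Matrices that pass the test receive non-uniform, per-matrix ranks, mirroring the observation that low-rank structure emerges unevenly across layers.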
arXiv Detail & Related papers (2024-07-15T21:05:20Z)
- RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs [3.3424221693424014]
In this paper, we introduce RankAdaptor, an efficient fine-tuning method with hierarchical dynamic rank scheduling for pruned LLMs.
Experiments show that RankAdaptor consistently outperforms standard LoRA with structural pruning over different pruning settings.
Without increasing the number of trainable parameters, RankAdaptor further narrows the accuracy gap between the recovered pruned model and the original model.
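As a purely illustrative stand-in for the hierarchical dynamic rank scheduling mentioned above, the snippet below maps each layer's pruning ratio to a LoRA rank, so that more heavily pruned layers receive more recovery capacity. The linear rule and its bounds are assumptions, not the method from the paper.

```python
def schedule_ranks(pruning_ratios, r_min=4, r_max=16):
    """Map per-layer pruning ratios in [0, 1] to LoRA ranks in [r_min, r_max]."""
    return [round(r_min + p * (r_max - r_min)) for p in pruning_ratios]

# Layers pruned harder get larger adapter ranks during recovery fine-tuning.
print(schedule_ranks([0.0, 0.2, 0.5, 0.8]))  # [4, 6, 10, 14]
```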
arXiv Detail & Related papers (2024-06-22T04:52:58Z)
- LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models [9.244526043014098]
Large language models (LLMs) show excellent performance on difficult tasks, but they often require massive memory and computational resources.
In this study, we make an important observation that the multi-head self-attention (MHA) sub-layer of Transformer exhibits noticeable low-rank structure.
We propose a mixed compression model, which organically combines Low-Rank matrix approximation And structured Pruning (LoRAP).
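The differentiated treatment can be sketched directly: truncated SVD for the (observably low-rank) attention projections, and channel pruning for the FFN. The ranks, keep counts, and the L2 importance score below are assumptions for illustration.

```python
import torch

def compress_attention(W_proj: torch.Tensor, rank: int):
    # MHA projections exhibit noticeable low-rank structure, so a truncated SVD fits well.
    U, S, Vh = torch.linalg.svd(W_proj, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank]        # W_proj ~ L @ R

def prune_ffn(W_up: torch.Tensor, W_down: torch.Tensor, keep: int):
    # FFN weights are less low-rank; drop whole hidden channels instead,
    # ranked by a simple L2-norm importance score.
    score = W_up.norm(dim=1) + W_down.norm(dim=0)   # one score per hidden channel
    idx = score.topk(keep).indices
    return W_up[idx], W_down[:, idx]
```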
arXiv Detail & Related papers (2024-04-15T11:53:22Z)
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors [30.224822087562163]
Low-rank adaptation (LoRA) was proposed to reduce optimizer state memory by training fewer parameters.
However, LoRA restricts the overall weight update matrix to be low-rank, which limits model performance.
We propose Flora, which is able to achieve high-rank updates by resampling the projection matrices.
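The mechanism is easiest to see in code: keep optimizer state in a randomly down-projected space, and resample the projection periodically so the accumulated update is not confined to a single low-rank subspace. The momentum-only optimizer and all hyperparameters below are simplifying assumptions.

```python
import torch

def flora_step(W, grad, m, P, step, lr=1e-3, beta=0.9, T=100, rank=8):
    """W: (d_out, d_in) weights; grad: same shape; m: (d_out, rank) compressed
    momentum; P: (d_in, rank) random projection."""
    if step % T == 0:  # resample the projection and carry momentum to the new basis
        P_new = torch.randn(W.shape[1], rank) / rank ** 0.5
        m = m @ P.T @ P_new          # decompress with old P, recompress with new
        P = P_new
    m = beta * m + (1 - beta) * (grad @ P)  # momentum lives in the compressed space
    W = W - lr * (m @ P.T)                  # decompress the update on the fly
    return W, m, P
```

Because each resampled P spans a different random subspace, the sum of updates over many periods can reach high rank even though every individual step is rank-limited.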
arXiv Detail & Related papers (2024-02-05T18:50:39Z)
- Run LoRA Run: Faster and Lighter LoRA Implementations [50.347242693025336]
LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers.
This paper presents the RunLoRA framework for efficient implementations of LoRA.
Experiments show up to 28% speedup on language modeling networks.
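Much of the speedup in efficient LoRA implementations comes from picking the cheaper association order for the low-rank product based on operand shapes. Below is a hedged sketch of that dispatch for the forward pass only (RunLoRA also optimizes the backward computation graph); the function name is an assumption.

```python
import torch

def lora_branch(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Compute x @ A @ B for x: (n, d_in), A: (d_in, r), B: (r, d_out),
    choosing the multiplication order with fewer FLOPs."""
    n, d_in = x.shape
    r, d_out = B.shape
    flops_left = n * r * (d_in + d_out)    # (x @ A) @ B
    flops_right = d_in * d_out * (r + n)   # x @ (A @ B)
    return (x @ A) @ B if flops_left <= flops_right else x @ (A @ B)
```

For typical training shapes (small r, large d_in and d_out) the factored order wins; precomputing A @ B corresponds to merging the adapter into the weight, which only pays off when the merged matrix is reused, as at inference.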
arXiv Detail & Related papers (2023-12-06T10:54:34Z)
- LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [66.85589263870702]
Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component.
Experiments on finetuning RoBERTa and LLaMA-2 demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines.
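A hedged sketch of that iterative decomposition: alternately fit the low-rank part to the residual W - Q and requantize the residual W - L. The uniform `fake_quant` below is a stand-in for the paper's memory-efficient (e.g., NF4-style) quantizer, and the names are assumptions.

```python
import torch

def fake_quant(X: torch.Tensor, n_levels: int = 16) -> torch.Tensor:
    # Uniform round-to-nearest stand-in for a real low-bit quantizer.
    scale = X.abs().max() / (n_levels // 2) + 1e-12
    return (X / scale).round().clamp(-n_levels // 2, n_levels // 2 - 1) * scale

def lq_decompose(W: torch.Tensor, rank: int = 16, iters: int = 10):
    """Alternating minimization for W ~ Q + L."""
    Q = torch.zeros_like(W)
    for _ in range(iters):
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        L = (U[:, :rank] * S[:rank]) @ Vh[:rank]  # best rank-r fit to the residual
        Q = fake_quant(W - L)                     # quantize what the low-rank part missed
    return Q, L  # Q stays frozen and quantized; L seeds the trainable adapter
```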
arXiv Detail & Related papers (2023-11-20T18:57:41Z)
- The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models.
This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA.
For Transformer networks, we show that any model can be adapted to a target model of the same size with rank-$\left(\frac{\text{embedding size}}{2}\right)$ LoRA adapters.
arXiv Detail & Related papers (2023-10-26T16:08:33Z)
- LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaptation (LoRA) has emerged as a popular technique for fine-tuning large language models (LLMs).
LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner.
LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
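The memory saving comes from never materializing gradients for the frozen base weights: importance is estimated from the adapter's parameters and gradients alone. A sketch of a first-order criterion in that spirit follows; the exact scoring formula in the paper may differ.

```python
import torch

def channel_importance(W0, A, B, A_grad, B_grad):
    """W0: frozen (d_out, d_in); A: (r, d_in); B: (d_out, r).
    Returns one importance score per output channel."""
    W = W0 + B @ A                    # effective weight with the adapter merged
    dW = B_grad @ A + B @ A_grad      # gradient estimate assembled from LoRA grads only
    return (W * dW).abs().sum(dim=1)  # first-order Taylor saliency per channel
```

Channels with the lowest scores are removed, and the remaining adapter continues fine-tuning the pruned model.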
arXiv Detail & Related papers (2023-05-28T15:15:48Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
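A simplified sketch of that sharing pattern: one large factor common to all layers, plus a small per-layer factor. A real MPO decomposes the weight into a train of more than two tensors; two factors are used here only to keep the illustration short, and all names are assumptions.

```python
import torch
import torch.nn as nn

class SharedCoreLinear(nn.Module):
    """W_layer = aux[layer] @ core, with `core` shared across all layers."""

    def __init__(self, n_layers: int, d_in: int, d_out: int, aux_rank: int):
        super().__init__()
        self.core = nn.Parameter(torch.randn(aux_rank, d_in) * 0.02)  # shared central factor
        self.aux = nn.ParameterList(
            [nn.Parameter(torch.randn(d_out, aux_rank) * 0.02) for _ in range(n_layers)]
        )

    def forward(self, layer: int, x: torch.Tensor) -> torch.Tensor:
        # Only the small auxiliary factor grows with depth.
        return x @ (self.aux[layer] @ self.core).T
```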
arXiv Detail & Related papers (2023-03-27T02:34:09Z)