Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers
- URL: http://arxiv.org/abs/2406.09315v1
- Date: Thu, 13 Jun 2024 16:51:33 GMT
- Title: Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers
- Authors: Zhuolin Fu,
- Abstract summary: We show how Transformers can be interpreted as dense Expectation-Maximization algorithms performed on Bayesian Nets.
We propose a new model design paradigm, namely Vertical LoRA, which reduces the parameter count dramatically while preserving performance.
The results show that 1) with VLoRA, the Transformer model parameter count can be reduced dramatically and 2) the performance of the original model is preserved.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we show how Transformers can be interpreted as dense Expectation-Maximization algorithms performed on Bayesian Nets. Based on the above interpretation, we propose a new model design paradigm, namely Vertical LoRA (VLoRA), which reduces the parameter count dramatically while preserving performance. In VLoRA, a model consists of layers, each of which recursively learns an increment based on the previous layer. We then apply LoRA decomposition to the increments. VLoRA works on the base model, which is orthogonal to LoRA, meaning they can be used together. We do experiments on various tasks and models. The results show that 1) with VLoRA, the Transformer model parameter count can be reduced dramatically and 2) the performance of the original model is preserved. The source code is available at \url{https://github.com/neverUseThisName/vlora}
Related papers
- ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA)
Our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA.
The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-02-28T04:33:20Z) - Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models [45.72323731094864]
Low-Rank Adaptation (LoRA) emerges as a popular parameter-efficient fine-tuning (PEFT) method.
In this work, we study the enhancement of LoRA training by introducing an $r times r$ preconditioner in each gradient step.
arXiv Detail & Related papers (2024-02-04T05:05:43Z) - LoTR: Low Tensor Rank Weight Adaptation [47.4904143988667]
We introduce LoTR, a novel approach for parameter-efficient fine-tuning of large language models (LLMs)
LoTR represents a gradient update to parameters in a form of tensor decomposition.
Simultaneous compression of a sequence of layers with low-rank tensor representation allows LoTR to archive even better parameter efficiency then LoRA especially for deep models.
arXiv Detail & Related papers (2024-02-02T13:00:38Z) - Chain of LoRA: Efficient Fine-tuning of Language Models via Residual
Learning [31.036465632204663]
We introduce Chain of LoRA, an iterative optimization framework inspired by the Frank-Wolfe algorithm.
We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
arXiv Detail & Related papers (2024-01-08T14:26:49Z) - S-LoRA: Serving Thousands of Concurrent LoRA Adapters [59.490751234925206]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks.
We present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
arXiv Detail & Related papers (2023-11-06T17:26:17Z) - The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models.
This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA.
For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(fractextembedding size2)$ LoRA.
arXiv Detail & Related papers (2023-10-26T16:08:33Z) - LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models [104.23434818428062]
We focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model.
We propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework.
Experiments show that our method is highly effective and outperforms existing quantization methods.
arXiv Detail & Related papers (2023-10-12T18:34:08Z) - NOLA: Compressing LoRA using Linear Combination of Random Basis [22.76088132446952]
We introduce NOLA, which overcomes the rank one lower bound present in LoRA.
NOLA performs as well as LoRA models with much fewer number of parameters compared to LoRA with rank one, the best compression LoRA can archive.
arXiv Detail & Related papers (2023-10-04T03:30:24Z) - LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaption (LoRA) has emerged to fine-tune large language models (LLMs)
LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner.
LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z) - DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic
Search-Free Low-Rank Adaptation [18.922066770467914]
Low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and just introduce some learnable truncated SVD modules to the model.
While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training.
We introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together.
arXiv Detail & Related papers (2022-10-14T06:29:22Z) - LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.