Related papers: RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

URL: http://arxiv.org/abs/2407.08044v2
Date: Thu, 26 Sep 2024 23:47:03 GMT
Title: RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
Authors: Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng,
Abstract summary: We propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. We evaluate RoLoRA across LLaMA2-7B/13B, LLaMA3-8B models, achieving up to 29.5% absolute accuracy gain of 4-bit weight-activation quantized LLaMA2- 13B.
Score: 38.23587031169402
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT)method, significantly enhances the training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline is under-explored, and we observe substantial performance degradation primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. RoLoRA utilizes rotation for outlier elimination and proposes rotation-aware fine-tuning to preserve the outlier-free characteristics in rotated LLMs. Experimental results show RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. We evaluate RoLoRA across LLaMA2-7B/13B, LLaMA3-8B models, achieving up to 29.5% absolute accuracy gain of 4-bit weight-activation quantized LLaMA2- 13B on commonsense reasoning tasks compared to LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LLaVA-1.5-7B). Codes are available at https://github.com/HuangOwen/RoLoRA

Related papers

RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models [39.656014609027494]
Low-Rank Adaptation (LoRA) lowers the computational and memory overhead of fine-tuning large models by updating a low-dimensional subspace of the pre-trained weight matrix.<n>This article identifies the optimal low-rank factorization per step that minimizes an upper bound on the loss.<n>The resultanted low-rank adaptation (RefLoRA) method promotes a flatter loss landscape, along with consistent and balanced weight updates.
arXiv Detail & Related papers (2025-05-24T21:33:16Z)
BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods. We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z)
RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts [37.43961020113692]
Low-rank adaptation (LoRA) has emerged as a powerful method for fine-tuning large-scale foundation models. This paper presents a theoretical analysis of LoRA by examining its connection to the Mixture of Experts models.
arXiv Detail & Related papers (2025-02-05T10:03:09Z)
Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA [14.789886179102425]
BERT-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA) optimize federated training by reducing computational and communication costs. We propose RoLoRA, a federated framework using alternating optimization to fine-tune LoRA adapters.
arXiv Detail & Related papers (2025-02-03T19:02:00Z)
RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation [59.34193580856381]
Low-Rank Adaptation (LoRA) is widely used and effective for fine-tuning large language models. We propose RoRA (Rank-adaptive Reliability Optimization), a simple yet effective method for optimizing LoRA's scaling factor. RoRA ensures improved performance as rank size increases and excels in the more challenging task of accuracy recovery when fine-tuning pruned models.
arXiv Detail & Related papers (2025-01-08T07:13:52Z)
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z)
Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models. LoRA often under performs when compared to full- parameter fine-tuning. We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z)
Bone: Block-Affine Adaptation of Large Language Models [0.0]
Low-Rank Adaptation (LoRA) has achieved remarkable training results by freezing the original weights and training only low-rank matrices. This paper introduces a novel PEFT technique distinct from LoRA, called Block-Affine Adaptation (Bone) Bone significantly reduces memory usage and achieves faster computation.
arXiv Detail & Related papers (2024-09-19T10:26:42Z)
ResLoRA: Identity Residual Mapping in Low-Rank Adaption [96.59370314485074]
We propose ResLoRA, an improved framework of low-rank adaptation (LoRA) Our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA. The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-02-28T04:33:20Z)
DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA) DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation [27.123271324468657]
Low-Rank Adaptation (LoRA) is currently the most commonly used. efficient fine-tuning (PEFT) method. It introduces auxiliary parameters for each layer to fine-tune the pre-trained model under limited computing resources. However, it still faces resource consumption challenges when scaling up to larger models.
arXiv Detail & Related papers (2024-02-12T15:34:56Z)
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning [19.08716369943138]
We present LoRA-FA, a memory-efficient fine-tuning method that reduces the activation memory without performance degradation and expensive recomputation. Our results show that LoRA-FA can always achieve close fine-tuning accuracy across different tasks compared to full parameter fine-tuning and LoRA.
arXiv Detail & Related papers (2023-08-07T05:12:27Z)
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaption (LoRA) has emerged to fine-tune large language models (LLMs) LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner. LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.