MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
- URL: http://arxiv.org/abs/2406.09044v3
- Date: Sun, 02 Mar 2025 04:45:56 GMT
- Title: MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
- Authors: Hanqing Wang, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen
- Abstract summary: We propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix. It is observed that the minor matrix corresponds to the noisy or long-tail information, while the principal matrix contains important knowledge. During finetuning, MiLoRA makes the most use of the less-optimized subspace for learning the labeled dataset.
- Score: 16.67302585857681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient finetuning of large language models (LLMs) aims to adapt LLMs at reduced computational and memory cost. Previous LoRA-based approaches initialize the low-rank matrices with a Gaussian distribution and zeros while keeping the original weight matrices frozen. However, trainable parameters optimized in an unguided subspace might interfere with the well-learned subspace of the pretrained weight matrices. In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen. We observe that the minor matrix corresponds to noisy or long-tail information, while the principal matrix contains important knowledge. MiLoRA initializes the low-rank matrices within a subspace orthogonal to the principal matrix, so the pretrained knowledge is expected to be well preserved. During finetuning, MiLoRA makes the most of the less-optimized subspace for learning the labeled dataset. Extensive experiments on commonsense reasoning, math reasoning, instruction following and visual instruction following benchmarks demonstrate the superior performance of our method.
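To make the construction concrete, here is a minimal PyTorch sketch of the SVD-based split described above: the principal singular components are kept frozen, and the low-rank adapter is seeded from the r smallest singular components so that the pretrained weight is reproduced exactly at initialization. The scaling and the wiring into a Transformer layer are assumptions and may differ from the authors' released implementation.

```python
import torch

def milora_init(W: torch.Tensor, r: int):
    """Split a pretrained weight into a frozen principal part and a trainable
    low-rank adapter built from the r smallest singular components (sketch only)."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    # Principal components (largest singular values) stay frozen.
    W_principal = U[:, :-r] @ torch.diag(S[:-r]) @ Vh[:-r, :]
    # Minor components (smallest r singular values) seed the trainable adapter,
    # so W == W_principal + B @ A holds at initialization.
    sqrt_S = torch.diag(S[-r:].sqrt())
    B = U[:, -r:] @ sqrt_S          # (out_features, r), trainable
    A = sqrt_S @ Vh[-r:, :]         # (r, in_features), trainable
    return W_principal, B, A

W = torch.randn(768, 768)           # stand-in for a pretrained weight matrix
W_p, B, A = milora_init(W, r=16)
print(torch.dist(W, W_p + B @ A))   # ~0: the pretrained weight is reproduced at init
```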
Related papers
- ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints [64.35580479051208]
In previous works, low-rank adapters (LoRA) are randomly initialized with a fixed rank across all attachment points. In this paper, we improve the convergence and final performance of LoRA fine-tuning using our proposed data-driven weight initialization method.
arXiv Detail & Related papers (2025-07-09T23:52:31Z)
- DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation [32.369133126167085]
We propose a new PEFT scheme called DiffoRA, which is theoretically grounded and enables module-wise adoption of LoRA.
At the core of our DiffoRA lies a Differential Adaptation Matrix (DAM) to determine which module is the most suitable and essential for fine-tuning.
Our approach achieves the best model accuracy among all state-of-the-art baselines across various benchmarks.
arXiv Detail & Related papers (2025-02-13T02:41:34Z)
- DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models [33.4538652558253]
Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices.
We propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights.
We also introduce QDoTA, a quantized version of DoTA designed for 4-bit quantization.
arXiv Detail & Related papers (2024-12-30T12:00:47Z)
- Expanding Sparse Tuning for Low Memory Usage [103.43560327427647]
We propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage.
To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices.
A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes.
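The sketch below illustrates only the magnitude competition as we read it, not the authors' kernelized LoRA formulation: the low-rank update is materialized, the largest-magnitude entries win, and the mask is recomputed from the values themselves, so no index tensor has to be stored. Function and parameter names are made up for illustration.

```python
import torch

def competition_sparsify(B: torch.Tensor, A: torch.Tensor, density: float = 0.05):
    """Materialize the low-rank update B @ A and keep only the entries that win
    the magnitude competition (illustrative sketch, not the SNELL implementation)."""
    delta = B @ A                                   # dense low-rank update
    k = max(1, int(density * delta.numel()))
    # k-th largest magnitude == (numel - k + 1)-th smallest magnitude
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    return delta * (delta.abs() >= threshold)       # sparse update added to the frozen weight

B, A = torch.randn(768, 16), torch.randn(16, 768)
sparse_delta = competition_sparsify(B, A)
print((sparse_delta != 0).float().mean())           # roughly equal to the target density
```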
arXiv Detail & Related papers (2024-11-04T04:58:20Z)
- Locating Information in Large Language Models via Random Matrix Theory [0.0]
We analyze the weight matrices of the pretrained transformer models BERT and Llama through the lens of random matrix theory.
Deviations from random-matrix predictions emerge after training, allowing us to locate learned structures within the models.
Our findings reveal that, after fine-tuning, small singular values play a crucial role in the models' capabilities.
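As a hedged illustration of this style of analysis, the snippet below compares the singular values of a weight matrix with the Marchenko-Pastur bulk predicted for a random matrix of the same shape and entry variance; singular values outside the bulk are candidates for learned structure. This simple thresholding is a stand-in for the paper's full analysis.

```python
import numpy as np

def mp_bulk_edges(W: np.ndarray):
    """Marchenko-Pastur bulk edges for the singular values of an m x n matrix
    whose entries are i.i.d. with the same variance as W (a crude null model)."""
    m, n = W.shape
    gamma = min(m, n) / max(m, n)
    edge = W.std() * np.sqrt(max(m, n))
    return edge * (1 - np.sqrt(gamma)), edge * (1 + np.sqrt(gamma))

W = 0.02 * np.random.randn(1024, 4096)      # stand-in for a pretrained weight matrix
s = np.linalg.svd(W, compute_uv=False)
lo, hi = mp_bulk_edges(W)
outside = np.mean((s < lo) | (s > hi))
print(f"fraction of singular values outside the random-matrix bulk: {outside:.3f}")
```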
arXiv Detail & Related papers (2024-10-23T11:19:08Z)
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation [13.585425242072173]
The most commonly used fine-tuning method is to update the pre-trained weights via low-rank adaptation (LoRA).
We propose to enhance LoRA by initializing the new weights in a data-driven manner, computing the singular value decomposition of minibatches of activation vectors; we call the resulting scheme Explained Variance Adaptation (EVA).
We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning.
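A minimal sketch of such a data-driven initialization, assuming cached layer inputs are available: the top right-singular vectors of an activation minibatch seed the LoRA down-projection A, while B starts at zero so the adapted model initially matches the pretrained one. EVA's incremental SVD and rank reallocation are omitted, and the helper name is hypothetical.

```python
import torch

def data_driven_lora_init(activations: torch.Tensor, out_features: int, r: int):
    """Seed the LoRA down-projection with the top right-singular vectors of a
    minibatch of layer inputs; B stays zero so B @ A == 0 at initialization."""
    # activations: (num_tokens, in_features), cached from a few forward passes
    _, _, Vh = torch.linalg.svd(activations, full_matrices=False)
    A = Vh[:r, :].clone()                  # (r, in_features): directions of maximal variance
    B = torch.zeros(out_features, r)       # trainable, initialized to zero
    return A, B

acts = torch.randn(2048, 768)              # hypothetical cached activations
A, B = data_driven_lora_init(acts, out_features=768, r=16)
```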
arXiv Detail & Related papers (2024-10-09T17:59:06Z)
- SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values [12.137869917556415]
Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks.
However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments.
We propose SVFit, a novel PEFT approach that leverages singular value decomposition (SVD) to initialize low-rank matrices, using critical singular values as trainable parameters.
arXiv Detail & Related papers (2024-09-09T08:44:53Z)
- Memory-Efficient LLM Training with Online Subspace Descent [8.393403749426097]
We propose Online Subspace Descent, a new family of subspace descent optimizers that does not require singular value decomposition (SVD).
Online Subspace Descent is flexible and introduces only minimum overhead to training.
We show that, for the task of pretraining LLaMA models ranging from 60M to 7B parameters on the C4 dataset, Online Subspace Descent achieves lower perplexity and better downstream task performance.
arXiv Detail & Related papers (2024-08-23T05:54:53Z)
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients [86.40635601953446]
We study the emergence of low-rank structures across the layers of modern large language models.
We present Weight Low-Rank Projection (WeLore), which unifies weight compression and memory-efficient fine-tuning in a single framework.
arXiv Detail & Related papers (2024-07-15T21:05:20Z)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.
We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters.
Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
arXiv Detail & Related papers (2024-05-20T15:48:32Z)
- Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z)
- LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [66.85589263870702]
Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component.
Experiments on finetuning RoBERTa and LLaMA-2 demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines.
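A simplified sketch of the alternating decomposition, with a toy uniform quantizer standing in for the blockwise quantizer used in practice: the low-rank factors are refit on the residual via a truncated SVD, then the remaining residual is re-quantized.

```python
import torch

def toy_quantize(X: torch.Tensor, bits: int = 4):
    """Placeholder uniform quantizer; the paper uses blockwise NF4-style quantization."""
    q = 2 ** (bits - 1) - 1
    scale = X.abs().max() / q
    return torch.round(X / scale).clamp(-q - 1, q) * scale

def lq_decompose(W: torch.Tensor, r: int = 16, iters: int = 10, bits: int = 4):
    """Alternate between refitting the rank-r part of the residual (via SVD)
    and re-quantizing whatever the low-rank part cannot capture."""
    Q = torch.zeros_like(W)
    for _ in range(iters):
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        L1, L2 = U[:, :r] * S[:r], Vh[:r, :]      # W - Q ~ L1 @ L2
        Q = toy_quantize(W - L1 @ L2, bits)
    return Q, L1, L2

W = torch.randn(768, 768)
Q, L1, L2 = lq_decompose(W)
print((W - (Q + L1 @ L2)).norm() / W.norm())      # relative reconstruction error
```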
arXiv Detail & Related papers (2023-11-20T18:57:41Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
arXiv Detail & Related papers (2023-10-10T17:06:41Z) - Weighted Low Rank Matrix Approximation and Acceleration [0.5177947445379687]
Low-rank matrix approximation (LRMA) is one of the central concepts in machine learning.
Low-rank matrix completion (LRMC) solves the LRMA problem when some observations are missing.
We propose an algorithm for solving the weighted problem, as well as two acceleration techniques.
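For intuition about the underlying problem, here is the classic EM-style scheme for weighted low-rank approximation (impute low-weight entries with the current estimate, then truncate the SVD); it is a textbook baseline, not the paper's proposed algorithm or its acceleration techniques.

```python
import numpy as np

def weighted_lrma(M: np.ndarray, W: np.ndarray, r: int, iters: int = 100):
    """EM-style weighted low-rank approximation (Srebro & Jaakkola, 2003):
    minimize ||W * (M - X)||_F^2 over rank-r X, for weights W in [0, 1]."""
    X = np.zeros_like(M)
    for _ in range(iters):
        # Impute low-weight entries with the current estimate, then project
        # back onto the rank-r set with a truncated SVD.
        filled = W * M + (1.0 - W) * X
        U, S, Vt = np.linalg.svd(filled, full_matrices=False)
        X = (U[:, :r] * S[:r]) @ Vt[:r, :]
    return X

M = np.random.randn(100, 80)
W = (np.random.rand(100, 80) > 0.5).astype(float)   # binary weights == matrix completion
X_hat = weighted_lrma(M, W, r=5)
```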
arXiv Detail & Related papers (2021-09-22T22:03:48Z)