Parameter Efficient Fine-tuning via Explained Variance Adaptation
- URL: http://arxiv.org/abs/2410.07170v4
- Date: Wed, 21 May 2025 07:57:40 GMT
- Title: Parameter Efficient Fine-tuning via Explained Variance Adaptation
- Authors: Fabian Paischer, Lukas Hauzenberger, Thomas Schmied, Benedikt Alkin, Marc Peter Deisenroth, Sepp Hochreiter
- Abstract summary: We introduce Explained Variance Adaptation (EVA), an initialization scheme that uses the directions capturing the most activation variance. We apply EVA to a variety of fine-tuning tasks such as language generation and understanding, image classification, and reinforcement learning.
- Score: 13.585425242072173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned for a specific downstream task. The most common fine-tuning method is to update pretrained weights via low-rank adaptation (LoRA). Existing initialization strategies for LoRA often rely on singular value decompositions (SVD) of gradients or weight matrices. However, they do not provably maximize the expected gradient signal, which is critical for fast adaptation. To this end, we introduce Explained Variance Adaptation (EVA), an initialization scheme that uses the directions capturing the most activation variance, provably maximizing the expected gradient signal and accelerating fine-tuning. EVA performs incremental SVD on minibatches of activation vectors and selects the right-singular vectors for initialization once they have converged. Further, by selecting the directions that capture the most activation variance for a given rank budget, EVA accommodates adaptive ranks that reduce the number of trainable parameters while maintaining or improving downstream performance. We apply EVA to a variety of fine-tuning tasks such as language generation and understanding, image classification, and reinforcement learning. EVA exhibits faster convergence than competitors and achieves the highest average score across a multitude of tasks per domain while reducing the number of trainable parameters through rank redistribution.
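The procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (incremental_svd, redistribute_ranks, init_lora), the toy layer names, and the choices to skip mean-centering and the convergence check are all assumptions made for illustration. The sketch streams minibatches of activations through a truncated incremental SVD, ranks directions across layers by their share of activation energy, and starts the LoRA down-projection from the selected right-singular vectors with a zero up-projection, so the adapted model initially matches the pretrained one.

```python
import numpy as np


def incremental_svd(batches, k):
    """Stream minibatches (rows = activation vectors) and track the top-k
    singular values and right-singular vectors of the stacked activations."""
    S, Vt, total = None, None, 0.0
    for X in batches:
        total += float(np.sum(X * X))  # running squared Frobenius norm
        # Fold the current summary diag(S) @ Vt together with the new batch.
        stacked = X if Vt is None else np.vstack([S[:, None] * Vt, X])
        _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
        S, Vt = S[:k], Vt[:k]  # truncation makes this an approximation
    # Share of (uncentered) activation energy captured by each direction.
    ratios = S**2 / total
    return S, Vt, ratios


def redistribute_ranks(ratios_per_layer, budget):
    """Spend a global rank budget on the directions with the largest ratios."""
    scored = [(r, name) for name, rs in ratios_per_layer.items() for r in rs]
    scored.sort(key=lambda t: t[0], reverse=True)
    ranks = {name: 0 for name in ratios_per_layer}
    for _, name in scored[:budget]:
        ranks[name] += 1
    return ranks


def init_lora(Vt, rank, d_out):
    """A starts from the selected directions, B from zeros, so B @ A == 0."""
    A = Vt[:rank].copy()
    B = np.zeros((d_out, rank))
    return A, B


rng = np.random.default_rng(0)
# Toy activations for two hypothetical layers with input width 8.
acts = {
    "layer0": rng.standard_normal((64, 2)) @ rng.standard_normal((2, 8)),
    "layer1": rng.standard_normal((64, 8)),
}
stats = {n: incremental_svd(np.array_split(X, 4), k=4) for n, X in acts.items()}
ranks = redistribute_ranks({n: s[2] for n, s in stats.items()}, budget=4)
adapters = {n: init_lora(stats[n][1], ranks[n], d_out=8) for n in acts}
print(ranks)
```

Layers whose activations concentrate in a few directions receive more of the budget, which is the intuition behind the rank redistribution mentioned in the abstract. A practical version would also need a convergence criterion for the singular vectors, which this sketch omits.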
Related papers
- OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation [9.048461365342204]
We present OSoRA, a novel PEFT method for Large Language Models (LLMs). OSoRA substantially reduces computational resource requirements by minimizing the number of trainable parameters during fine-tuning. Comprehensive evaluations across mathematical reasoning, common sense reasoning, and other benchmarks demonstrate that OSoRA achieves comparable or superior performance to state-of-the-art methods.
arXiv Detail & Related papers (2025-05-20T13:34:06Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [61.79405341803085]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models [33.4538652558253]
Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices.
We propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights.
We also introduce QDoTA, a quantized version of DoTA designed for 4-bit quantization.
arXiv Detail & Related papers (2024-12-30T12:00:47Z) - Revisiting the Initial Steps in Adaptive Gradient Descent Optimization [6.468625143772815]
Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks. These methods often suffer from suboptimal generalization compared to stochastic gradient descent (SGD) and exhibit instability. We introduce a simple yet effective solution: initializing the second-order moment estimation with non-zero values.
arXiv Detail & Related papers (2024-12-03T04:28:14Z) - NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models [26.808251361020066]
Fine-tuning pre-trained models is resource-intensive and laborious.
One widely adopted PEFT technique, Low-Rank Adaptation (LoRA), freezes the pre-trained model weights.
NEAT introduces a lightweight neural network that takes pre-trained weights as input and learns a nonlinear transformation to approximate cumulative weight updates.
arXiv Detail & Related papers (2024-10-02T17:29:23Z) - SARA: Singular-Value Based Adaptive Low-Rank Adaption [4.135688713311511]
LoRA is a widely used parameter-efficient fine-tuning (PEFT) method because it adds no inference overhead.
In this work, we first analyze the relationship between the performance of different layers and their ranks using SVD.
Based on this, we design the Singular-Value Based Adaptive Low-Rank Adaption (SARA).
arXiv Detail & Related papers (2024-08-06T16:39:42Z) - Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates [21.811889512977924]
It is a common practice in natural language processing to pre-train a single model and then fine-tune it for downstream tasks.
B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix.
B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70%.
arXiv Detail & Related papers (2024-06-18T20:26:30Z) - CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters without considering the context of the downstream task to learn or the context of important knowledge to maintain.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z) - AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning [31.975038164401404]
Low-rank adaptation (LoRA) finetunes low-rank incremental update matrices on top of frozen pretrained weights.
We introduce AutoLoRA, a framework for automatically identifying the optimal rank of each LoRA layer.
Our experiments on natural language understanding, generation, and sequence labeling demonstrate the effectiveness of AutoLoRA.
arXiv Detail & Related papers (2024-03-14T05:29:35Z) - ConvLoRA and AdaBN based Domain Adaptation via Self-Training [4.006331916849688]
We propose Convolutional Low-Rank Adaptation (ConvLoRA) for multi-target domain adaptation.
ConvLoRA freezes pre-trained model weights, adds trainable low-rank decomposition matrices to convolutional layers, and backpropagates the gradient.
Our method has fewer trainable parameters and performs better than or on par with large, independently fine-tuned networks.
arXiv Detail & Related papers (2024-02-07T15:43:50Z) - Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models [45.72323731094864]
Low-Rank Adaptation (LoRA) emerges as a popular parameter-efficient fine-tuning (PEFT) method.
In this work, we study the enhancement of LoRA training by introducing an $r \times r$ preconditioner in each gradient step.
arXiv Detail & Related papers (2024-02-04T05:05:43Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm, Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z) - AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning [143.23123791557245]
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP.
We propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score.
We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA.
arXiv Detail & Related papers (2023-03-18T22:36:25Z) - Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport [78.9167477093745]
We propose a novel distribution calibration method by learning the adaptive weight matrix between novel samples and base classes.
Experimental results on standard benchmarks demonstrate that our proposed plug-and-play model outperforms competing approaches.
arXiv Detail & Related papers (2022-10-09T02:32:57Z) - Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [134.83964935755964]
In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials. To relieve this issue and consistently improve the model training speed across deep networks, we propose the ADAptive Nesterov momentum algorithm (Adan).
arXiv Detail & Related papers (2022-08-13T16:04:39Z) - Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z) - Disentangling Generative Factors of Physical Fields Using Variational Autoencoders [0.0]
This work explores the use of variational autoencoders (VAEs) for non-linear dimension reduction.
A disentangled decomposition is interpretable and can be transferred to a variety of tasks including generative modeling.
arXiv Detail & Related papers (2021-09-15T16:02:43Z) - Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents [0.0]
We show that it is possible to train Variational Autoencoders (VAEs) with discrete latents without sampling-based approximation and reparameterization.
In contrast to large supervised networks, the VAEs investigated here can, e.g., denoise a single image without previous training on clean data and/or training on large image datasets.
arXiv Detail & Related papers (2020-11-27T12:42:12Z) - MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients [112.00379151834242]
We propose an adaptive learning rate principle in which the running mean of the squared gradient in Adam is replaced by a weighted mean, with weights chosen to maximize the estimated variance of each coordinate.
This results in faster adaptation, which leads to more desirable empirical convergence behavior.
arXiv Detail & Related papers (2020-06-21T21:47:43Z) - Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST in handling scale variation.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD at every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computational load at the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)