Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
- URL: http://arxiv.org/abs/2405.17484v3
- Date: Fri, 15 Nov 2024 08:02:03 GMT
- Title: Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
- Authors: Shen Yuan, Haotian Liu, Hongteng Xu,
- Abstract summary: Householder reflection adaptation (HRA) is a simple but effective adaptation method based on Householder reflections.
HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators.
- Score: 32.371755315509574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While following different technical routes, both low-rank and orthogonal adaptation techniques can efficiently adapt large-scale pre-training models in specific tasks or domains based on a small piece of trainable parameters. In this study, we bridge the gap between these two techniques, proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix with an orthogonal matrix constructed by a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs impacts the model capacity and regularity. The analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators. The code of the experiments is available at \url{https://github.com/DaShenZi721/HRA}, and the method has been merged into the \href{https://github.com/huggingface/peft}{PEFT} package.
Related papers
- ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning.<n>We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE)<n>We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs [27.978175136002005]
Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs)<n>We first align the models' parameter spaces, leveraging the inherent permutation, rotation, and scaling symmetries of Transformer architectures.<n>We successfully transfer advanced reasoning skills to a non-reasoning model.
arXiv Detail & Related papers (2025-11-13T23:20:57Z) - DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks [47.58150560549918]
Weight-Decomposed Low-Rank Adaptation (DoRA) has been shown to improve both the learning capacity and training stability of the vanilla Low-Rank Adaptation (LoRA) method.<n>We propose DoRAN, a new variant of DoRA designed to further stabilize training and boost the sample efficiency of DoRA.
arXiv Detail & Related papers (2025-10-05T19:27:48Z) - Aligning Distributionally Robust Optimization with Practical Deep Learning Needs [70.87757502315293]
While traditional Learning (DL) methods treat all samples equally, a significant gap exists between DRO and current DL practices.<n>This paper aims to bridge the gap by introducing an adaptive algorithm for a modified DRO objective that can handle weight assignment groups.
arXiv Detail & Related papers (2025-08-22T18:17:44Z) - HOFT: Householder Orthogonal Fine-tuning [45.8130844084218]
Householder Orthogonal Fine-tuning (HOFT) and Scaled Householder Orthogonal Fine-tuning (SHOFT) are evaluated.<n>Compared with state-of-the-art adaptation methods, HOFT and SHOFT show comparable or better results.
arXiv Detail & Related papers (2025-05-22T11:20:35Z) - Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball.
We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
arXiv Detail & Related papers (2025-02-11T13:10:34Z) - Data-Parallel Neural Network Training via Nonlinearly Preconditioned Trust-Region Method [0.0]
We propose a variant of the Additively Preconditioned Trust-Region Strategy (APTS) for training deep neural networks (DNNs)
The proposed APTS method utilizes a data-parallel approach to construct a nonlinear preconditioner employed in the nonlinear optimization strategy.
We demonstrate the performance of the proposed APTS variant using the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2025-02-07T18:11:33Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both singular values and their basis vectors of pretrained weights.
We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
arXiv Detail & Related papers (2024-05-31T17:43:35Z) - Efficient Adaptation of Large Vision Transformer via Adapter
Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z) - Interpretable Ensembles of Hyper-Rectangles as Base Models [5.482532589225552]
A new ensemble-based model with uniformly generated axis-parallel hyper-rectangles as base models (HRBM) is proposed.
It is proposed to incorporate HRBMs into the gradient boosting machine (GBM)
arXiv Detail & Related papers (2023-03-15T13:50:36Z) - Orthogonal SVD Covariance Conditioning and Latent Disentanglement [65.67315418971688]
Inserting an SVD meta-layer into neural networks is prone to make the covariance ill-conditioned.
We propose Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR)
Experiments on visual recognition demonstrate that our methods can simultaneously improve covariance conditioning and generalization.
arXiv Detail & Related papers (2022-12-11T20:31:31Z) - Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality [65.67315418971688]
Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR) are proposed.
Experiments on visual recognition demonstrate that our methods can simultaneously improve the covariance conditioning and generalization.
arXiv Detail & Related papers (2022-07-05T15:39:29Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Parameter-efficient Model Adaptation for Vision Transformers [45.3460867776953]
We study parameter-efficient model adaptation strategies for vision transformers on the image classification task.
We propose a parameter-efficient model adaptation framework, which first selects submodules by measuring local intrinsic dimensions.
Our method performs the best in terms of the tradeoff between accuracy and parameter efficiency across 20 image classification datasets.
arXiv Detail & Related papers (2022-03-29T05:30:09Z) - Adapting by Pruning: A Case Study on BERT [9.963251767416967]
We propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task.
We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model.
Results suggest that our method can prune up to 50% weights in BERT while yielding similar performance compared to the fine-tuned full model.
arXiv Detail & Related papers (2021-05-07T15:51:08Z) - AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show a faster convergence than SARAH can be achieved if local geometry permits.
This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether the direction of one parameter changes in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.