Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
- URL: http://arxiv.org/abs/2405.17484v3
- Date: Fri, 15 Nov 2024 08:02:03 GMT
- Title: Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
- Authors: Shen Yuan, Haotian Liu, Hongteng Xu
- Abstract summary: Householder reflection adaptation (HRA) is a simple but effective adaptation method based on Householder reflections.
HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators.
- Score: 32.371755315509574
- Abstract: Although they follow different technical routes, both low-rank and orthogonal adaptation techniques can efficiently adapt large-scale pre-trained models to specific tasks or domains using a small set of trainable parameters. In this study, we bridge the gap between these two techniques, proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix with an orthogonal matrix constructed from a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs affects the model's capacity and regularity. This analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators. The code for the experiments is available at https://github.com/DaShenZi721/HRA, and the method has been merged into the PEFT package (https://github.com/huggingface/peft).
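To make the construction concrete, below is a minimal NumPy sketch of the idea in the abstract: an orthogonal factor built from a chain of Householder reflections multiplies a frozen weight, and the near-orthogonality of the reflection vectors is regularized. The variable names, the side on which the orthogonal factor is applied, and the penalty form are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def householder_chain(U):
    """Build an orthogonal matrix as a product of Householder reflections.

    U: (d, r) array whose columns are the learnable reflection vectors.
    Returns Q = H_1 @ ... @ H_r with H_i = I - 2 u_i u_i^T / ||u_i||^2.
    """
    d, r = U.shape
    Q = np.eye(d)
    for i in range(r):
        u = U[:, i:i + 1]                      # (d, 1) reflection vector
        H = np.eye(d) - 2.0 * (u @ u.T) / (u.T @ u)
        Q = Q @ H
    return Q

def hra_forward(x, W, U):
    """Adapted forward pass: the frozen weight W is multiplied by the
    orthogonal matrix built from the r learnable reflections in U."""
    return x @ (W @ householder_chain(U))

def orthogonality_penalty(U, lam=1e-4):
    """Regularize the reflection planes: penalize off-diagonal entries
    of U^T U so the reflection vectors stay close to orthogonal."""
    G = U.T @ U
    off_diag = G - np.diag(np.diag(G))
    return lam * np.sum(off_diag ** 2)

# Toy usage: adapt a frozen (d x d) weight with r = 4 reflections.
rng = np.random.default_rng(0)
d, r = 64, 4
W = rng.standard_normal((d, d))   # frozen pre-trained weight
U = rng.standard_normal((d, r))   # learnable parameters (d * r in total)
y = hra_forward(rng.standard_normal((1, d)), W, U)
# A product of r reflections differs from the identity by a matrix of
# rank at most r, so W @ Q is effectively a low-rank edit of W -- the
# bridge to low-rank adaptation described in the abstract.
```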
Related papers
- Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball.
We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
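For readers unfamiliar with the oracle: an LMO over a norm ball returns the ball element that minimizes an inner product with a given direction. A minimal sketch for three standard norms (these are textbook closed forms, not code from the paper):

```python
import numpy as np

def lmo_l2(g, radius=1.0):
    """LMO over the L2 ball: argmin_{||x||_2 <= radius} <g, x>."""
    return -radius * g / np.linalg.norm(g)

def lmo_linf(g, radius=1.0):
    """LMO over the L-infinity ball: a vertex of the hypercube."""
    return -radius * np.sign(g)

def lmo_l1(g, radius=1.0):
    """LMO over the L1 ball: a single signed, scaled coordinate."""
    x = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    x[i] = -radius * np.sign(g[i])
    return x

g = np.array([0.5, -2.0, 1.0])
print(lmo_l2(g), lmo_linf(g), lmo_l1(g), sep="\n")
```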
arXiv Detail & Related papers (2025-02-11T13:10:34Z)
- Data-Parallel Neural Network Training via Nonlinearly Preconditioned Trust-Region Method [0.0]
We propose a variant of the Additively Preconditioned Trust-Region Strategy (APTS) for training deep neural networks (DNNs).
The proposed APTS method utilizes a data-parallel approach to construct a nonlinear preconditioner employed in the nonlinear optimization strategy.
We demonstrate the performance of the proposed APTS variant using the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2025-02-07T18:11:33Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses a hypercomplex parameterized space, constructed via the Kronecker product, to Aggregate Low Rank Experts.
Thanks to this design, ALoRE introduces negligible extra parameters and can be effortlessly merged into the frozen backbone.
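One plausible reading of such a parameterization, in the spirit of Kronecker-product (hypercomplex/PHM-style) adapters; the specific decomposition, names, and shapes below are our assumptions rather than ALoRE's exact formulation:

```python
import numpy as np

def kron_lowrank_update(S_list, A_list, B_list):
    """Aggregate low-rank 'experts' in a Kronecker-parameterized space:
    delta_W = sum_k kron(S_k, A_k @ B_k), where each A_k @ B_k is low rank.
    """
    return sum(np.kron(S, A @ B) for S, A, B in zip(S_list, A_list, B_list))

rng = np.random.default_rng(0)
n, d, r, n_experts = 4, 16, 2, 3       # kron(4x4, 16x16) -> 64x64 update
S_list = [rng.standard_normal((n, n)) for _ in range(n_experts)]
A_list = [rng.standard_normal((d, r)) for _ in range(n_experts)]
B_list = [rng.standard_normal((r, d)) for _ in range(n_experts)]

W = rng.standard_normal((n * d, n * d))                 # frozen weight
W_adapted = W + kron_lowrank_update(S_list, A_list, B_list)
# The update can be merged into W after training, so inference pays no
# extra cost -- consistent with "merged into the frozen backbone" above.
```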
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both the singular values of pretrained weights and their corresponding basis vectors.
We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
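A hedged sketch of what "adjusting both singular values and their basis vectors" could look like; the Cayley parameterization of the basis rotations and all names here are our assumptions, not SODA's published formulation:

```python
import numpy as np

def cayley(A):
    """Map a skew-symmetric matrix A to an orthogonal matrix via the
    Cayley transform (I + A)^{-1} (I - A)."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A)

def spectral_adapt(W, delta_s, Gu, Gv, eps=1e-2):
    """Adjust both the singular values and the singular bases of a
    frozen weight: delta_s shifts the spectrum, while Gu and Gv
    generate small orthogonal rotations of the left/right bases."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Ru = cayley(eps * (Gu - Gu.T))     # near-identity orthogonal factor
    Rv = cayley(eps * (Gv - Gv.T))
    return (U @ Ru) @ np.diag(s + delta_s) @ (Rv @ Vt)

rng = np.random.default_rng(0)
W = rng.standard_normal((12, 8))
k = min(W.shape)
W_new = spectral_adapt(W, 0.01 * rng.standard_normal(k),
                       rng.standard_normal((k, k)), rng.standard_normal((k, k)))
```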
arXiv Detail & Related papers (2024-05-31T17:43:35Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
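A minimal sketch of a parameter-sharing bottleneck adapter in this spirit; sharing a single projection pair across layers with per-layer scales is an illustrative assumption, not necessarily ARC's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_layers = 768, 8, 12

# One down/up projection pair shared by *every* layer ...
A_shared = 0.02 * rng.standard_normal((d, r))
B_shared = np.zeros((r, d))            # zero-init: adapters start as identity

# ... plus a cheap layer-specific re-composition (here: a scaling vector).
scales = [np.ones(r) for _ in range(n_layers)]

def adapter(x, layer):
    """Bottleneck adapter whose projections are reused across layers;
    only the per-layer scale is new, so each extra layer adds r
    parameters instead of 2 * d * r."""
    return x + (x @ A_shared) * scales[layer] @ B_shared

y = adapter(rng.standard_normal((2, d)), layer=0)   # == x at initialization
```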
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- Orthogonal SVD Covariance Conditioning and Latent Disentanglement [65.67315418971688]
Inserting an SVD meta-layer into neural networks tends to make the covariance matrices it produces ill-conditioned.
We propose Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR).
Experiments on visual recognition demonstrate that our methods can simultaneously improve covariance conditioning and generalization.
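Assuming NOG projects a gradient matrix onto its nearest orthogonal matrix (the standard Frobenius-norm projection via SVD), a minimal sketch looks like this; the reading of "nearest orthogonal" is ours:

```python
import numpy as np

def nearest_orthogonal_gradient(G):
    """Project a gradient matrix onto the nearest orthogonal matrix in
    Frobenius norm: if G = U S V^T, the projection is U V^T (the
    classical orthogonal Procrustes solution)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

G = np.random.default_rng(0).standard_normal((8, 8))
Q = nearest_orthogonal_gradient(G)
assert np.allclose(Q.T @ Q, np.eye(8))   # the projected gradient is orthogonal
```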
arXiv Detail & Related papers (2022-12-11T20:31:31Z)
- Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality [65.67315418971688]
Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR) are proposed.
Experiments on visual recognition demonstrate that our methods can simultaneously improve the covariance conditioning and generalization.
arXiv Detail & Related papers (2022-07-05T15:39:29Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
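A generic template for DRO with a parametric adversary that reweights per-example losses; the exponential-tilt form and the names below are illustrative assumptions, not the paper's three specific ideas:

```python
import numpy as np

def parametric_dro_loss(losses, adversary_scores, tau=1.0):
    """Reweight per-example losses with a parametric likelihood ratio:
    w_i proportional to exp(score_i / tau), normalized to mean 1.
    The adversary raises scores on hard subpopulations to maximize
    this objective; the model is trained to minimize it."""
    w = np.exp(adversary_scores / tau)
    w = w / w.mean()
    return float(np.mean(w * losses))

losses = np.array([0.2, 1.5, 0.3, 2.0])      # per-example losses
scores = np.array([-1.0, 2.0, -0.5, 3.0])    # adversary favors hard examples
print(parametric_dro_loss(losses, scores))   # > plain mean of the losses
```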
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Parameter-efficient Model Adaptation for Vision Transformers [45.3460867776953]
We study parameter-efficient model adaptation strategies for vision transformers on the image classification task.
We propose a parameter-efficient model adaptation framework, which first selects submodules by measuring local intrinsic dimensions.
Our method achieves the best tradeoff between accuracy and parameter efficiency across 20 image classification datasets.
arXiv Detail & Related papers (2022-03-29T05:30:09Z)
- Adapting by Pruning: A Case Study on BERT [9.963251767416967]
We propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task.
We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model.
Results suggest that our method can prune up to 50% of the weights in BERT while performing on par with the fully fine-tuned model.
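A minimal sketch of a differentiable pruning mask of the kind this summary describes; the sigmoid relaxation and the penalty form are our assumptions, not the paper's exact loss:

```python
import numpy as np

def masked_weight(W, mask_logits, temperature=0.5, hard=False):
    """Differentiable pruning: a sigmoid over learnable logits yields
    soft keep-probabilities during training; thresholding at eval time
    yields the hard 0/1 pruned connections."""
    soft = 1.0 / (1.0 + np.exp(-mask_logits / temperature))
    keep = (soft > 0.5).astype(W.dtype) if hard else soft
    return W * keep

def sparsity_penalty(mask_logits, target_keep=0.5, lam=1.0):
    """Push the expected fraction of surviving connections toward
    `target_keep` (e.g. 0.5 to prune about half the weights)."""
    keep_prob = 1.0 / (1.0 + np.exp(-mask_logits))
    return lam * (keep_prob.mean() - target_keep) ** 2

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
logits = rng.standard_normal((4, 4))
W_train = masked_weight(W, logits)              # soft mask, differentiable
W_pruned = masked_weight(W, logits, hard=True)  # hard mask for deployment
```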
arXiv Detail & Related papers (2021-05-07T15:51:08Z)