HOFT: Householder Orthogonal Fine-tuning
- URL: http://arxiv.org/abs/2505.16531v1
- Date: Thu, 22 May 2025 11:20:35 GMT
- Title: HOFT: Householder Orthogonal Fine-tuning
- Authors: Alejandro Moreno Arcas, Albert Sanchis, Jorge Civera, Alfons Juan
- Abstract summary: Householder Orthogonal Fine-tuning (HOFT) and Scaled Householder Orthogonal Fine-tuning (SHOFT) are proposed and evaluated on downstream tasks. Compared with state-of-the-art adaptation methods, HOFT and SHOFT show comparable or better results.
- Score: 45.8130844084218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptation of foundation models using low-rank methods is a widespread approach. Another way to adapt these models is to employ orthogonal fine-tuning methods, which offer good generalization properties but are less time- and memory-efficient. In this work, we propose Householder Orthogonal Fine-tuning (HOFT), a novel orthogonal fine-tuning method that aims to reduce time and space complexity. Moreover, some theoretical properties of the orthogonal fine-tuning paradigm are explored. From this exploration, Scaled Householder Orthogonal Fine-tuning (SHOFT) is proposed. Both HOFT and SHOFT are evaluated on downstream tasks, namely commonsense reasoning, machine translation, subject-driven generation and mathematical reasoning. Compared with state-of-the-art adaptation methods, HOFT and SHOFT show comparable or better results.
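For a concrete picture of the core idea, below is a minimal PyTorch sketch of an orthogonal adapter built from a product of trainable Householder reflections applied to a frozen weight matrix. The class name, initialization, and number of reflections are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HouseholderOrthogonalAdapter(nn.Module):
    """Sketch: adapt a frozen weight W with an orthogonal factor Q built as a
    product of k Householder reflections, W' = Q @ W. Names and details are
    assumptions for illustration, not the authors' code."""

    def __init__(self, frozen_weight: torch.Tensor, num_reflections: int = 4):
        super().__init__()
        d = frozen_weight.shape[0]
        self.register_buffer("weight", frozen_weight)        # pretrained, frozen
        # One trainable Householder vector per reflection.
        self.v = nn.Parameter(torch.randn(num_reflections, d) * 0.01)

    def orthogonal_factor(self) -> torch.Tensor:
        d = self.weight.shape[0]
        Q = torch.eye(d, device=self.weight.device)
        for v in self.v:                                      # Q = H_k ... H_1
            v = v / (v.norm() + 1e-8)
            Q = Q - 2.0 * torch.outer(v, v @ Q)               # apply (I - 2 v v^T) from the left
        return Q

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W_adapted = self.orthogonal_factor() @ self.weight    # orthogonal update of W
        return x @ W_adapted.T
```

Because each Householder factor is exactly orthogonal, the adapted weight keeps the angular structure of the pretrained matrix while only k d-dimensional vectors are trained.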
Related papers
- Divergence Minimization Preference Optimization for Diffusion Model Alignment [58.651951388346525]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. Our results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques. DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
arXiv Detail & Related papers (2025-07-10T07:57:30Z) - Guiding Time-Varying Generative Models with Natural Gradients on Exponential Family Manifold [5.000311680307273]
We show that the evolution of time-varying generative models can be projected onto an exponential family manifold. We then train the generative model by moving its projection on the manifold according to the natural gradient descent scheme. We propose particle versions of the algorithm, which feature closed-form update rules for any parametric model within the exponential family.
arXiv Detail & Related papers (2025-02-11T15:39:47Z) - Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
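As a generic illustration of what a linear minimization oracle over a norm-ball computes, the sketch below gives the closed-form LMO for the L2 ball and a Frank-Wolfe-style step built from it. This is a textbook example, not the algorithm family proposed in the paper; the function names are hypothetical.

```python
import numpy as np

def lmo_l2_ball(grad: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """LMO over the L2 norm-ball: argmin_{||s||_2 <= radius} <grad, s>
    has the closed form -radius * grad / ||grad||_2."""
    norm = np.linalg.norm(grad)
    return -radius * grad / norm if norm > 0 else np.zeros_like(grad)

def lmo_step(params: np.ndarray, grad: np.ndarray,
             lr: float = 0.1, radius: float = 1.0) -> np.ndarray:
    """One norm-constrained update that moves toward the LMO output
    (a Frank-Wolfe-style step; the paper's exact update rule may differ)."""
    return params + lr * lmo_l2_ball(grad, radius)
```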
arXiv Detail & Related papers (2025-02-11T13:10:34Z) - Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment. We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation [32.371755315509574]
Householder reflection adaptation (HRA) is a simple but effective adaptation method based on Householder reflections.
HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators.
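To illustrate the idea behind Householder reflection adaptation, the sketch below applies a chain of trainable reflections directly to activations without ever materializing the full orthogonal matrix, which keeps the per-token cost linear in the number of reflections. The function name and details are assumptions, not the HRA authors' code.

```python
import torch

def apply_householder_chain(x: torch.Tensor, vs: torch.Tensor) -> torch.Tensor:
    """Apply the product H_r ... H_1 of Householder reflections to activations x
    without building the d x d orthogonal matrix.
    x: (..., d) activations; vs: (r, d) trainable reflection vectors."""
    for v in vs:
        v = v / (v.norm() + 1e-8)
        # (I - 2 v v^T) x computed as x - 2 (x . v) v, O(d) per reflection
        x = x - 2.0 * (x @ v).unsqueeze(-1) * v
    return x
```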
arXiv Detail & Related papers (2024-05-24T16:18:16Z) - Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation [20.47507483613317]
One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT).
OFT rigorously preserves angular distances within the parameter space in order to retain the pretrained knowledge.
We propose quasi-Givens Orthogonal Fine-Tuning (qGOFT) to address the problems.
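For context, a Givens rotation is a sparse orthogonal matrix that rotates only two coordinates; products of such rotations can parameterize (quasi-)orthogonal adaptations. The sketch below constructs this building block only, not the qGOFT method itself, and the function name is illustrative.

```python
import numpy as np

def givens_rotation(d: int, i: int, j: int, theta: float) -> np.ndarray:
    """Build a d x d Givens rotation acting on coordinates (i, j) by angle theta."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

# Example: rotate coordinates 0 and 2 of a 4-dimensional space by 30 degrees.
G = givens_rotation(4, 0, 2, np.pi / 6)
assert np.allclose(G @ G.T, np.eye(4))   # Givens rotations are orthogonal
```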
arXiv Detail & Related papers (2024-04-05T15:28:44Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
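To make the fixed-point formulation concrete, the snippet below sketches the plain fixed-point iteration at the heart of a Deep Equilibrium layer. The paper additionally learns the regularizer and differentiates through the equilibrium via the implicit function theorem; those parts are omitted here and the function is only an assumption-laden sketch.

```python
import numpy as np

def fixed_point_solve(f, z0: np.ndarray, max_iter: int = 100,
                      tol: float = 1e-6) -> np.ndarray:
    """Solve z* = f(z*) by fixed-point iteration, the forward computation
    of a Deep Equilibrium layer."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy contraction: f(z) = 0.5 * z + 1 has the fixed point z* = 2.
z_star = fixed_point_solve(lambda z: 0.5 * z + 1.0, np.zeros(1))
```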
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel Model-Agnostic Counterfactual Explanation (MACE) framework.
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Recent advances in Bayesian optimization with applications to parameter reconstruction in optical nano-metrology [0.0]
Parameter reconstruction is a common problem in optical nano-metrology.
We present a Bayesian Target Vector Optimization scheme which combines two approaches.
We find that the presented method generally uses fewer calls of the model function than any of the competing schemes to achieve similar reconstruction performance.
arXiv Detail & Related papers (2021-07-12T15:32:15Z) - Deep Contrastive Graph Representation via Adaptive Homotopy Learning [76.22904270821778]
The homotopy model is an excellent tool exploited by diverse research works in the field of machine learning.
We propose a novel adaptive homotopy framework (AH) in which the Maclaurin duality is employed.
AH can be widely utilized to enhance homotopy-based algorithms.
arXiv Detail & Related papers (2021-06-17T04:46:04Z) - Adapting by Pruning: A Case Study on BERT [9.963251767416967]
We propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task.
We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model.
Results suggest that our method can prune up to 50% of the weights in BERT while yielding performance similar to that of the fully fine-tuned model.
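For a simplified picture of pruning-based adaptation, the sketch below performs magnitude pruning on a single linear layer. The paper instead learns which connections to prune through a differentiable loss, so this is only an illustrative stand-in with hypothetical names.

```python
import torch
import torch.nn as nn

def magnitude_prune_(layer: nn.Linear, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights of a linear layer in place and
    return the binary keep-mask (generic magnitude pruning, not the paper's
    learned pruning criterion)."""
    w = layer.weight.data
    k = int(sparsity * w.numel())
    if k == 0:
        return torch.ones_like(w)
    threshold = w.abs().flatten().kthvalue(k).values
    mask = (w.abs() > threshold).float()
    w.mul_(mask)          # prune: keep only weights above the threshold
    return mask           # reuse the mask to keep pruned weights at zero during training

# Example: prune roughly half of the weights in a small layer.
layer = nn.Linear(768, 768)
mask = magnitude_prune_(layer, sparsity=0.5)
```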
arXiv Detail & Related papers (2021-05-07T15:51:08Z) - Scaling Hidden Markov Language Models [118.55908381553056]
This work revisits the challenge of scaling HMMs to language modeling datasets.
We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization.
arXiv Detail & Related papers (2020-11-09T18:51:55Z)