Memory-Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
- URL: http://arxiv.org/abs/2505.11235v1
- Date: Fri, 16 May 2025 13:26:48 GMT
- Title: Memory-Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
- Authors: Fei Wu, Jia Hu, Geyong Min, Shiqiang Wang
- Abstract summary: We propose Memory-efficient Orthogonal Fine-Tuning (MOFT) with principal subspace adaptation. We show that MOFT consistently outperforms key baselines while significantly reducing the memory footprint of orthogonal fine-tuning.
- Score: 40.69348434971122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driven by the relentless growth in model parameters, which renders full fine-tuning prohibitively expensive for large-scale deployment, parameter-efficient fine-tuning (PEFT) has emerged as a crucial approach for rapidly adapting large models to a wide range of downstream tasks. Among the PEFT family, orthogonal fine-tuning and its variants have demonstrated remarkable performance by preserving hyperspherical energy, which encodes pairwise angular similarity between neurons. However, these methods are inherently memory-inefficient due to the need to store intermediate activations from multiple full-dimensional sparse matrices. To address this limitation, we propose Memory-efficient Orthogonal Fine-Tuning (MOFT) with principal subspace adaptation. Specifically, we first establish a theoretical condition under which orthogonal transformations within a low-rank subspace preserve hyperspherical energy. Based on this insight, we constrain orthogonal fine-tuning to the principal subspace defined by the top-r components obtained through singular value decomposition and impose an additional constraint on the projection matrix to satisfy the preservation condition. To enhance MOFT's flexibility across tasks, we relax strict orthogonality by introducing two learnable scaling vectors. Extensive experiments on 37 diverse tasks and four models across NLP and CV demonstrate that MOFT consistently outperforms key baselines while significantly reducing the memory footprint of orthogonal fine-tuning.
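The core idea in the abstract, applying an orthogonal transformation only inside the top-r SVD subspace, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the Cayley parameterization of the rotation, the matrix sizes, and all names (`cayley`, `hyperspherical_energy`, `U_r`, `C`, `R`) are assumptions made for illustration, and the two learnable scaling vectors are omitted. The sketch only checks that rotating the principal component leaves the pairwise angles between its columns (treated here as neurons) unchanged, which follows from `U_r` having orthonormal columns.

```python
import numpy as np


def cayley(skew):
    """Map a skew-symmetric matrix to an orthogonal matrix (Cayley transform)."""
    eye = np.eye(skew.shape[0])
    # (I + Q)^{-1} (I - Q) is orthogonal whenever Q is skew-symmetric.
    return np.linalg.solve(eye + skew, eye - skew)


def hyperspherical_energy(weights):
    """Sum of inverse pairwise distances between unit-normalized columns."""
    unit = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    diffs = unit[:, :, None] - unit[:, None, :]
    dists = np.linalg.norm(diffs, axis=0)
    off_diag = ~np.eye(dists.shape[0], dtype=bool)
    return np.sum(1.0 / dists[off_diag])


rng = np.random.default_rng(0)
d, n, r = 16, 8, 4  # hypothetical sizes: d inputs, n neurons, rank r

# Stand-in for a pretrained weight matrix; columns play the role of neurons.
W = rng.standard_normal((d, n))

# Split W into its top-r principal component and a frozen residual.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
U_r = U[:, :r]               # d x r, orthonormal columns
C = S[:r, None] * Vt[:r]     # r x n coefficients in the principal subspace
W_pri = U_r @ C
W_res = W - W_pri

# Only an r x r orthogonal matrix would be trained (assumption: Cayley form).
A = rng.standard_normal((r, r))
R = cayley(0.5 * (A - A.T))

# Rotate inside the principal subspace and add the residual back.
W_pri_new = U_r @ R @ C
W_new = W_res + W_pri_new

print("R orthogonal:", np.allclose(R.T @ R, np.eye(r)))
print("energy before:", hyperspherical_energy(W_pri))
print("energy after: ", hyperspherical_energy(W_pri_new))
```

In this sketch the trainable object is an r x r matrix rather than a d x d one, which is where the memory saving in the abstract would come from. The check covers only the principal component; the abstract states that MOFT imposes an additional constraint on the projection matrix, which is not modeled here.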
Related papers
- Adaptive Linear Embedding for Nonstationary High-Dimensional Optimization [0.0]
Self-Adaptive embedding REMBO (SA-REMBO) is a novel framework that generalizes Random EMbedding Bayesian Optimization (REMBO) to support multiple random Gaussian embeddings. An index variable governs the embedding choice and is jointly modeled with the latent variable via a product kernel in a surrogate. We empirically demonstrate the advantage of our method across synthetic and real-world high-dimensional benchmarks, where traditional REMBO and other low-rank BO methods fail.
arXiv Detail & Related papers (2025-05-16T14:18:19Z) - Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
arXiv Detail & Related papers (2025-04-01T14:36:45Z) - Variationally optimizing infinite projected entangled-pair states at large bond dimensions: A split corner transfer matrix renormalization group approach [0.2796197251957244]
We introduce an alternative "split-CTMRG" algorithm, which maintains separate PEPS layers and leverages new environment tensors, reducing computational complexity while preserving accuracy. Benchmarks on quantum lattice models demonstrate substantial speedups for variational energy optimization, rendering this method valuable for large-scale PEPS simulations.
arXiv Detail & Related papers (2025-02-14T16:59:33Z) - Sparse Gradient Compression for Fine-Tuning Large Language Models [58.44973963468691]
Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. High memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size. We propose Sparse Gradient Compression (SGC) to address these limitations.
arXiv Detail & Related papers (2025-02-01T04:18:28Z) - Efficient Adaptive Optimization via Subset-Norm and Subspace-Momentum: Fast, Memory-Reduced Training with Convergence Guarantees [5.399838579600896]
We introduce two complementary techniques for memory optimization.
One technique, Subset-Norm, reduces the memory footprint of the adaptive step-size state by sharing step sizes across subsets of parameters.
The other technique, Subspace-Momentum, reduces the momentum state's memory footprint by a low-dimensional subspace.
arXiv Detail & Related papers (2024-11-11T16:48:07Z) - Lineshape Optimization in Inhomogeneous $Λ$-type Quantum Memory [0.0]
Photonic quantum memory is a crucial elementary operation in photonic quantum information processing.
We focus on inhomogeneously broadened ensembles of $\Lambda$-type quantum emitters, which have long coherence lifetimes and broad bandwidth compatibility.
We investigate the properties of electromagnetically induced transparency (EIT) for a survey of inhomogeneous lineshapes that are straightforward to realize experimentally.
We compare the optimal EIT efficiency to the well-known atomic frequency comb (AFC) protocol, which also relies on spectral shaping of the inhomogeneous broadening.
arXiv Detail & Related papers (2024-05-22T21:43:15Z) - Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios.
We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer.
Our proposed approach consistently improves the segmentation performance significantly on novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z) - Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization [102.92240148504774]
We study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation.
Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters.
We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT).
arXiv Detail & Related papers (2023-11-10T18:59:54Z) - Parametric Level-sets Enhanced To Improve Reconstruction (PaLEnTIR) [0.0]
We introduce PaLEnTIR, a significantly enhanced parametric level-set (PaLS) method addressing the restoration and reconstruction of piecewise constant objects.
Our key contribution involves a unique PaLS formulation utilizing a single level-set function to restore scenes containing multi-contrast piecewise-constant objects.
arXiv Detail & Related papers (2022-04-21T00:03:44Z)