Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
- URL: http://arxiv.org/abs/2505.11235v2
- Date: Fri, 26 Sep 2025 16:35:01 GMT
- Title: Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
- Authors: Fei Wu, Jia Hu, Geyong Min, Shiqiang Wang
- Abstract summary: We propose efficient Orthogonal Fine-Tuning with Principal Subspace adaptation (PSOFT) for parameter-efficient fine-tuning. Experiments on 35 NLP and CV tasks demonstrate that PSOFT offers a practical and scalable solution to simultaneously achieve semantic preservation, expressiveness, and multi-dimensional efficiency in PEFT.
- Score: 43.719298075378425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driven by the rapid growth of model parameters, parameter-efficient fine-tuning (PEFT) has become essential for adapting large models to diverse downstream tasks under constrained computational resources. Within this paradigm, orthogonal fine-tuning and its variants preserve semantic representations of pre-trained models, but struggle to achieve both expressiveness and efficiency in terms of parameter counts, memory, and computation. To overcome this limitation, we propose efficient Orthogonal Fine-Tuning with Principal Subspace adaptation (PSOFT), which confines orthogonal transformations to the principal subspace of pre-trained weights. Specifically, PSOFT constructs this subspace via matrix decomposition to enable compatible transformations with higher effective rank, establishes a theoretical condition that strictly maintains the geometry of this subspace for essential semantic preservation, and introduces efficient tunable vectors that gradually relax orthogonality during training to enhance adaptability. Extensive experiments on 35 NLP and CV tasks across four representative models demonstrate that PSOFT offers a practical and scalable solution to simultaneously achieve semantic preservation, expressiveness, and multi-dimensional efficiency in PEFT. The code is publicly available at https://github.com/fei407/PSOFT.
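To make the construction concrete, below is a minimal PyTorch sketch of the core idea, assuming an SVD-derived principal subspace, a Cayley-parameterized orthogonal factor, and a per-direction scaling vector as the tunable relaxation; the names and exact parameterization are illustrative, not the reference implementation from the repository above.

```python
import torch

def psoft_like_update(W, r, theta, s):
    """Sketch: orthogonal transform confined to the top-r principal
    subspace of a pretrained weight W of shape (d_out, d_in).

    theta: (r, r) unconstrained parameters; their skew-symmetric part
           is mapped to an orthogonal factor via the Cayley transform.
    s:     (r,) tunable vector that gradually relaxes strict
           orthogonality during training (s = 1 keeps it exact).
    """
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], S[:r], Vh[:r, :]        # principal subspace
    A = theta - theta.T                            # skew-symmetric
    I = torch.eye(r, dtype=W.dtype)
    R = torch.linalg.solve(I + A, I - A)           # orthogonal (Cayley map)
    W_principal = Ur @ torch.diag(Sr) @ Vr
    W_residual = W - W_principal                   # left untouched
    # rotate (and, once s deviates from 1, mildly rescale) only the
    # principal directions; the residual subspace is never transformed
    return Ur @ (R * s) @ torch.diag(Sr) @ Vr + W_residual
```

Because R is exactly orthogonal, the geometry of the principal subspace is preserved whenever s = 1; letting s train adds expressiveness at the cost of strict orthogonality, mirroring the gradual relaxation described in the abstract.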
Related papers
- Adaptive Linear Embedding for Nonstationary High-Dimensional Optimization [0.0]
Self-Adaptive embedding REMBO (SA-REMBO) is a novel framework that generalizes Random EMbedding Bayesian Optimization (REMBO) to support multiple random Gaussian embeddings. An index variable governs the embedding choice and is jointly modeled with the latent variables via a product kernel in the surrogate. We empirically demonstrate the advantage of our method across synthetic and real-world high-dimensional benchmarks, where traditional REMBO and other low-rank BO methods fail.
arXiv Detail & Related papers (2025-05-16T14:18:19Z)
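The multi-embedding idea behind SA-REMBO can be sketched in a few lines; the dimensions, clipping box, and kernel factorization below are illustrative assumptions rather than the authors' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, m = 100, 4, 3      # ambient dim, latent dim, number of embeddings

# several random Gaussian embeddings instead of REMBO's single one
A = [rng.standard_normal((D, d)) for _ in range(m)]

def to_ambient(i, z):
    """Lift latent point z under embedding i into the ambient box [-1, 1]^D."""
    return np.clip(A[i] @ z, -1.0, 1.0)

# The surrogate models f(i, z) jointly over the embedding index and the
# latent point via a product kernel:
#   k((i, z), (j, z')) = k_index(i, j) * k_latent(z, z')
```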
- Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. Because full fine-tuning at scale is prohibitively expensive, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
arXiv Detail & Related papers (2025-04-01T14:36:45Z)
- Variationally optimizing infinite projected entangled-pair states at large bond dimensions: A split corner transfer matrix renormalization group approach [0.2796197251957244]
We introduce an alternative "split-CTMRG" algorithm, which maintains separate PEPS layers and leverages new environment tensors, reducing computational complexity while preserving accuracy. Benchmarks on quantum lattice models demonstrate substantial speedups for variational energy optimization, rendering this method valuable for large-scale PEPS simulations.
arXiv Detail & Related papers (2025-02-14T16:59:33Z)
- Sparse Gradient Compression for Fine-Tuning Large Language Models [58.44973963468691]
Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models. The high memory cost of fine-tuning remains a significant challenge, especially as models increase in size. We propose sparse gradient compression (SGC) to address these limitations.
arXiv Detail & Related papers (2025-02-01T04:18:28Z)
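A generic top-k compressor illustrates the kind of gradient sparsification involved; SGC's actual selection rule and error handling may differ.

```python
import torch

def topk_compress(grad, k):
    """Keep only the k largest-magnitude gradient entries."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx]          # store O(k) values instead of O(n)

def topk_decompress(idx, vals, shape):
    """Scatter the compressed entries back into a dense zero tensor."""
    out = torch.zeros(shape, dtype=vals.dtype).flatten()
    out[idx] = vals
    return out.view(shape)
```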
- Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models [32.68721299475496]
Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness. We propose a new PEFT method that combines two classes of adaptations, namely, transform and residual adaptations. Experiments are conducted on fine-tuning Stable Diffusion models in subject-driven and controllable generation.
arXiv Detail & Related papers (2025-01-15T11:10:37Z)
- tCURLoRA: Tensor CUR Decomposition Based Low-Rank Parameter Adaptation and Its Application in Medical Image Segmentation [1.3037269687250654]
Transfer learning, by leveraging knowledge from pre-trained models, has significantly enhanced the performance of target tasks. As deep neural networks scale up, full fine-tuning introduces substantial computational and storage challenges. We propose tCURLoRA, a novel fine-tuning method based on tensor CUR decomposition.
arXiv Detail & Related papers (2025-01-04T08:25:32Z)
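For intuition, here is a toy matrix CUR decomposition; the paper works with tensor CUR, and its column/row selection strategy may differ.

```python
import numpy as np

def cur_decompose(W, c, r, rng):
    """Toy CUR: sample c columns and r rows of W, then fit the small
    core U so that W ≈ C @ U @ R."""
    cols = rng.choice(W.shape[1], size=c, replace=False)
    rows = rng.choice(W.shape[0], size=r, replace=False)
    C, R = W[:, cols], W[rows, :]
    U = np.linalg.pinv(C) @ W @ np.linalg.pinv(R)
    return C, U, R
```

One natural LoRA-style use, and the spirit of tCURLoRA, is to freeze the sampled factors and fine-tune only the small core.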
- Efficient Adaptive Optimization via Subset-Norm and Subspace-Momentum: Fast, Memory-Reduced Training with Convergence Guarantees [5.399838579600896]
We introduce two complementary techniques for memory-efficient adaptive optimization.
One technique, Subset-Norm, compresses the adaptive step-size (second-moment) state by sharing a single step size across each subset of parameters.
The other, Subspace-Momentum, reduces the momentum state's memory footprint by tracking momentum in a low-dimensional subspace.
arXiv Detail & Related papers (2024-11-11T16:48:07Z)
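A minimal sketch of momentum kept in a low-dimensional subspace; the projection choice and update rule here are simplifying assumptions.

```python
import torch

def subspace_momentum_step(param, grad, m, P, lr=1e-3, beta=0.9):
    """param, grad: (d,) tensors; P: (d, r) orthonormal projection with
    r << d; m: (r,) momentum state, so the optimizer stores r floats
    instead of d for this parameter."""
    g_low = P.T @ grad            # project the gradient into the subspace
    m.mul_(beta).add_(g_low)      # momentum update lives in r dimensions
    param.add_(P @ m, alpha=-lr)  # map the update back to parameter space
```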
- TOAST: Transformer Optimization using Adaptive and Simple Transformations [40.311292704886235]
We introduce TOAST, a framework that exploits redundancies to approximate entire transformer blocks with lightweight closed-form mappings. Results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.
arXiv Detail & Related papers (2024-10-07T11:35:24Z)
- Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both the singular values and the singular basis vectors of pretrained weights.
We introduce Spectral Orthogonal Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
arXiv Detail & Related papers (2024-05-31T17:43:35Z)
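A sketch of spectrum-aware adaptation: tune the singular values and rotate the singular basis with a small orthogonal factor; the Cayley parameterization is an assumption here, not necessarily SODA's construction.

```python
import torch

def spectral_adapt(W, delta_s, theta):
    """Adjust singular values by delta_s and rotate the right singular
    basis with an orthogonal factor built from theta via the Cayley map."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = theta - theta.T                      # skew-symmetric, (k, k)
    I = torch.eye(A.shape[0], dtype=W.dtype)
    Q = torch.linalg.solve(I + A, I - A)     # orthogonal rotation
    return U @ torch.diag(S + delta_s) @ Q @ Vh
```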
- Lineshape Optimization in Inhomogeneous $\Lambda$-type Quantum Memory [0.0]
Photonic quantum memory is a crucial elementary operation in photonic quantum information processing.
We focus on inhomogeneously broadened ensembles of $\Lambda$-type quantum emitters, which have long coherence lifetimes and broad bandwidth compatibility.
We investigate the properties of electromagnetically induced transparency (EIT) for a survey of inhomogeneous lineshapes that are straightforward to realize experimentally.
We compare the optimal EIT efficiency to the well-known atomic frequency comb (AFC) protocol, which also relies on spectral shaping of the inhomogeneous broadening.
arXiv Detail & Related papers (2024-05-22T21:43:15Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP poses several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
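In standard notation (assumed here rather than quoted from the paper), the problem is a convex program over visitation measures:

```latex
\min_{\mu \in \mathcal{M}} \; f(\mu)
\quad \text{s.t.} \quad g_i(\mu) \le 0, \qquad i = 1, \dots, m
```

where $\mu$ is the state-action visitation measure induced by a policy, $f$ and the $g_i$ are convex functionals, and primal-dual methods work with the Lagrangian $L(\mu, \lambda) = f(\mu) + \sum_i \lambda_i g_i(\mu)$ over primal variables $\mu$ and dual variables $\lambda \ge 0$.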
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
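A generic gradient-magnitude selection rule conveys the flavor of sparse fine-tuning; SIFT's actual criterion and update scheme may differ.

```python
import torch

def sparse_update_masks(model, density=0.01):
    """After loss.backward(), mark the top `density` fraction of each
    parameter's entries (by gradient magnitude) as trainable."""
    masks = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        k = max(1, int(density * p.numel()))
        threshold = p.grad.abs().flatten().topk(k).values.min()
        masks[name] = p.grad.abs() >= threshold
    return masks

# before optimizer.step(): p.grad.mul_(masks[name].to(p.grad.dtype))
```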
- Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios.
We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer.
Our proposed approach consistently and significantly improves segmentation performance on novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z)
- Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization [102.92240148504774]
We study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation.
Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters.
We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT).
arXiv Detail & Related papers (2023-11-10T18:59:54Z)
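To see where the parameter savings come from, here is a toy construction of an n x n orthogonal matrix from log2(n) sparse butterfly factors of 2x2 rotations, using (n/2)*log2(n) angles instead of the n(n-1)/2 parameters of a dense orthogonal matrix; BOFT's exact factorization differs in detail.

```python
import torch

def butterfly_orthogonal(thetas):
    """thetas: list of log2(n) angle vectors, each of length n // 2,
    where n = 2 * thetas[0].numel() is a power of two.
    Returns an n x n orthogonal matrix (a product of butterfly factors)."""
    n = 2 * thetas[0].numel()
    Q = torch.eye(n)
    for level, th in enumerate(thetas):
        stride = 2 ** level
        B = torch.zeros(n, n)
        pair, paired = 0, set()
        for i in range(n):
            if i in paired:
                continue
            j = i + stride                 # butterfly partner at this level
            c, s = torch.cos(th[pair]), torch.sin(th[pair])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            paired.update((i, j))
            pair += 1
        Q = B @ Q                          # each sparse factor is orthogonal
    return Q
```

For example, butterfly_orthogonal([torch.randn(4) for _ in range(3)]) yields an 8 x 8 orthogonal matrix from 12 angles rather than 28 free parameters.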
- Parametric Level-sets Enhanced To Improve Reconstruction (PaLEnTIR) [0.0]
We introduce PaLEnTIR, a significantly enhanced parametric level-set (PaLS) method addressing the restoration and reconstruction of piecewise constant objects.
Our key contribution involves a unique PaLS formulation utilizing a single level-set function to restore scenes containing multi-contrast piecewise-constant objects.
arXiv Detail & Related papers (2022-04-21T00:03:44Z)
- Efficient Semantic Image Synthesis via Class-Adaptive Normalization [116.63715955932174]
Class-adaptive normalization (CLADE) is a lightweight variant of spatially-adaptive normalization (SPADE) that is equally effective while being adaptive only to the semantic class.
We introduce intra-class positional map encoding calculated from semantic layouts to modulate the normalization parameters of CLADE.
The proposed CLADE can be generalized to different SPADE-based methods while achieving generation quality comparable to SPADE.
arXiv Detail & Related papers (2020-12-08T18:59:32Z)
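A minimal sketch of class-adaptive modulation: per-class scale and bias are looked up from the label map instead of being predicted by SPADE's convolutions; the module layout is illustrative, and the positional-map variant mentioned above would further modulate these parameters.

```python
import torch
import torch.nn as nn

class CLADELike(nn.Module):
    """Normalize, then modulate each pixel with the scale/bias of its
    semantic class: one (gamma, beta) pair per class, not per pixel."""
    def __init__(self, num_classes, channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels, affine=False)
        self.gamma = nn.Embedding(num_classes, channels)
        self.beta = nn.Embedding(num_classes, channels)

    def forward(self, x, seg):
        # x: (N, C, H, W) features; seg: (N, H, W) integer class map
        g = self.gamma(seg).permute(0, 3, 1, 2)   # (N, C, H, W)
        b = self.beta(seg).permute(0, 3, 1, 2)
        return self.norm(x) * g + b
```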