PoLAR: Polar-Decomposed Low-Rank Adapter Representation
- URL: http://arxiv.org/abs/2506.03133v2
- Date: Fri, 31 Oct 2025 14:49:07 GMT
- Title: PoLAR: Polar-Decomposed Low-Rank Adapter Representation
- Authors: Kai Lion, Liang Zhang, Bingcong Li, Niao He
- Abstract summary: We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition. Our theory shows that PoLAR yields an exponentially faster convergence rate on a canonical low-rank adaptation problem.
- Score: 33.33809836042973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stiefel manifolds and an unconstrained scale matrix. Our theory shows that PoLAR yields an exponentially faster convergence rate on a canonical low-rank adaptation problem. Pairing the parameterization with Riemannian optimization leads to consistent gains on three different benchmarks testing general language understanding, commonsense reasoning, and mathematical problem solving with base model sizes ranging from 350M to 27B.
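For concreteness, here is a minimal numpy sketch of the parameterization described above; the shapes, the QR-based Stiefel projection, and all variable names are illustrative assumptions rather than the authors' implementation. It also computes the stable rank, the quantity the abstract identifies as underutilized in vanilla low-rank adaptation.

```python
import numpy as np

# Sketch of the PoLAR-style parameterization: the low-rank update
# dW = X @ S @ Y.T uses direction matrices X, Y with orthonormal columns
# (points on Stiefel manifolds) and an unconstrained r x r scale matrix S.

def stiefel_project(A):
    """Map a matrix to the Stiefel manifold via QR (illustrative retraction)."""
    Q, R = np.linalg.qr(A)
    return Q * np.sign(np.diag(R))  # sign fix keeps Q's columns well-defined

m, n, r = 64, 32, 4
rng = np.random.default_rng(0)
X = stiefel_project(rng.normal(size=(m, r)))  # X in St(m, r): X.T @ X = I
Y = stiefel_project(rng.normal(size=(n, r)))  # Y in St(n, r)
S = rng.normal(size=(r, r))                   # unconstrained scale matrix

dW = X @ S @ Y.T                              # the low-rank adapter update
assert np.allclose(X.T @ X, np.eye(r), atol=1e-10)

# Stable rank ||dW||_F^2 / ||dW||_2^2: always at most the algebraic rank r;
# the abstract argues plain LoRA drives it far below r.
svals = np.linalg.svd(dW, compute_uv=False)
stable_rank = (svals ** 2).sum() / svals[0] ** 2
```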
Related papers
- ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
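A toy forward-Euler sketch of the "training as an ODE" viewpoint, assuming plain gradient-flow dynamics for the LoRA factors on a linear least-squares loss; the actual ODELoRA dynamics are more refined than this.

```python
import numpy as np

# Toy gradient-flow ODE for LoRA factors A, B on
# L = 0.5 * ||(W0 + B @ A) @ x - y||^2, integrated with forward Euler.
rng = np.random.default_rng(1)
d, r = 16, 2
W0 = rng.normal(size=(d, d))
A = 0.1 * rng.normal(size=(r, d))
B = np.zeros((d, r))                      # common LoRA init: B = 0
x, y = rng.normal(size=d), rng.normal(size=d)

dt = 1e-2                                 # Euler step size
for _ in range(500):
    resid = (W0 + B @ A) @ x - y
    gB = np.outer(resid, A @ x)           # dL/dB
    gA = np.outer(B.T @ resid, x)         # dL/dA
    B, A = B - dt * gB, A - dt * gA       # one Euler step of the ODE
```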
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - Manifold constrained steepest descent [0.0]
We propose Manifold Constrained Steepest Descent (MCSD), a single-loop framework for optimization over manifolds. We also introduce SPEL, the spectral specialization of MCSD on the Stiefel manifold.
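For reference, a generic sketch of one Riemannian gradient step on the Stiefel manifold, the basic project-then-retract pattern such methods build on; MCSD and SPEL themselves are more elaborate, and this code is not taken from the paper.

```python
import numpy as np

# One Riemannian gradient step on St(n, p) = {X : X.T @ X = I}.

def riem_grad(X, G):
    """Project a Euclidean gradient G onto the tangent space at X."""
    sym = (X.T @ G + G.T @ X) / 2
    return G - X @ sym

def qr_retract(A):
    """Retract back onto the manifold via QR."""
    Q, R = np.linalg.qr(A)
    return Q * np.sign(np.diag(R))

n, p = 10, 3
rng = np.random.default_rng(2)
X = qr_retract(rng.normal(size=(n, p)))
G = rng.normal(size=(n, p))               # stand-in Euclidean gradient
X_new = qr_retract(X - 0.1 * riem_grad(X, G))
assert np.allclose(X_new.T @ X_new, np.eye(p), atol=1e-10)
```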
arXiv Detail & Related papers (2026-01-29T10:08:37Z) - LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters [43.04933165005961]
We present a novel framework for Low-Rank Adaptation (LoRA) that treats low-rank adapters geometrically by optimizing them directly on the fixed-rank manifold. Our framework integrates three key components to achieve this.
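The parametrization-independent view can be pictured as working with the product W = B @ A directly on the manifold of fixed-rank matrices; the sketch below shows only the basic truncated-SVD retraction onto that manifold, not the paper's Muon-based optimizer.

```python
import numpy as np

# Retract an ambient update back onto the manifold of rank-r matrices.
def fixed_rank_retract(W, r):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

rng = np.random.default_rng(3)
W = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 8))  # a rank-2 point
step = 0.05 * rng.normal(size=(8, 8))                  # ambient update
W_new = fixed_rank_retract(W + step, r=2)              # back on the manifold
assert np.linalg.matrix_rank(W_new) == 2
```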
arXiv Detail & Related papers (2025-07-16T11:17:12Z) - LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation [3.9426000822656224]
Low-rank adaptation has become increasingly challenging due to the sheer scale of modern large language models. We propose an effective kernelization to further reduce the number of parameters required for adaptation tasks. We achieve state-of-the-art performance with higher accuracy while using almost half the parameters of conventional low-rank methods.
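One common way to read "low separation rank" is as a short sum of Kronecker products; the sketch below illustrates the parameter savings under that assumption, which is my reading of the abstract rather than the paper's exact kernelization.

```python
import numpy as np

# Approximate a large update as a sum of k Kronecker products:
# W ~ sum_i kron(A_i, B_i). Shapes and names are illustrative.
k, p, q = 2, 16, 16
rng = np.random.default_rng(4)
As = [rng.normal(size=(p, p)) for _ in range(k)]
Bs = [rng.normal(size=(q, q)) for _ in range(k)]

W = sum(np.kron(A, B) for A, B in zip(As, Bs))  # a (p*q) x (p*q) matrix
dense_params = (p * q) ** 2                     # 65536 for a dense update
lsr_params = k * (p * p + q * q)                # 1024: ~64x fewer
```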
arXiv Detail & Related papers (2025-02-19T09:20:47Z) - Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, memory demands for backpropagation increase.
Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.
We show that the proposed SubZero method enhances fine-tuning and converges faster than standard ZO approaches.
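A minimal sketch of the general idea, two-point zeroth-order gradient estimation restricted to a random low-dimensional subspace, follows; the loss, dimensions, and update rule are illustrative assumptions, not SubZero's exact algorithm.

```python
import numpy as np

def loss(theta):                               # stand-in black-box loss
    return np.sum((theta - 1.0) ** 2)

d, k, mu, lr = 100, 8, 1e-3, 2e-3              # dim, subspace dim, smoothing, step
rng = np.random.default_rng(5)
theta = np.zeros(d)
for _ in range(1000):
    P = rng.normal(size=(d, k)) / np.sqrt(k)   # fresh random subspace basis
    u = P @ rng.normal(size=k)                 # perturbation inside the subspace
    # two-point finite-difference estimate of the directional derivative
    g = (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu)
    theta -= lr * g * u                        # no backpropagation needed
# loss(theta) is now close to 0
```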
arXiv Detail & Related papers (2024-10-11T17:01:43Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method. We propose a higher-order CANDECOMP/PARAFAC (CP) decomposition, enabling a more compact and flexible representation. Our method reduces the number of parameters while maintaining comparable performance.
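A sketch of the underlying structure: a rank-R CP (CANDECOMP/PARAFAC) factorization builds an order-3 tensor from three small factor matrices. The mode interpretation in the comments is an illustrative assumption.

```python
import numpy as np

# T[i, j, k] = sum_r U[i, r] * V[j, r] * Wf[k, r]
I, J, K, R = 8, 8, 4, 2                  # e.g. (d_out, d_in, n_layers), rank R
rng = np.random.default_rng(6)
U = rng.normal(size=(I, R))
V = rng.normal(size=(J, R))
Wf = rng.normal(size=(K, R))

T = np.einsum('ir,jr,kr->ijk', U, V, Wf)
cp_params = R * (I + J + K)              # 40, vs I*J*K = 256 dense entries
```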
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - Convergence of the majorized PAM method with subspace correction for low-rank composite factorization model [0.44241702149260353]
This paper focuses on the convergence certificates of the majorized proximal alternating minimization (PAM) method with subspace correction. We establish the full convergence of the iterate sequence and the column subspace sequences of factor pairs generated by PAM. A numerical comparison with the popular proximal alternating linearized minimization (PALM) method is conducted on one-bit matrix completion problems.
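For orientation, here is a bare-bones PALM iteration (the baseline named above) on a plain low-rank factorization; the paper's majorized PAM with subspace correction is considerably more involved, and no regularizer is included here, so the proximal maps reduce to the identity.

```python
import numpy as np

# PALM for min 0.5 * ||M - U @ V.T||_F^2, alternating linearized steps
# with block-wise Lipschitz step sizes.
rng = np.random.default_rng(9)
M = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15))  # rank-3 target
U, V = rng.normal(size=(20, 3)), rng.normal(size=(15, 3))
for _ in range(200):
    lu = np.linalg.norm(V.T @ V, 2) + 1e-8   # Lipschitz constant of U-block
    U -= ((U @ V.T - M) @ V) / lu            # linearized (gradient) step in U
    lv = np.linalg.norm(U.T @ U, 2) + 1e-8
    V -= ((U @ V.T - M).T @ U) / lv          # linearized step in V
```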
arXiv Detail & Related papers (2024-06-07T02:33:22Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
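A toy sketch of the interpolation update analyzed here, theta <- theta + alpha * (T(theta) - theta), where T is one step of an inner optimizer; the example problem is my own and only illustrates how averaging can tame a divergent inner step.

```python
# Inner step: gradient descent on f(x) = 0.5 * x**2 with an overly large
# step size, so T(x) = -1.5 * x diverges on its own; the alpha-averaged
# iteration x -> x + alpha * (T(x) - x) = -0.25 * x contracts.

def inner_step(x, lr=2.5):
    return x - lr * x          # gradient of 0.5 * x**2 is x

x, alpha = 1.0, 0.5
for _ in range(20):
    x = x + alpha * (inner_step(x) - x)   # linear interpolation
print(abs(x))                  # ~0.25**20: essentially zero
```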
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Low-Rank Mirror-Prox for Nonsmooth and Low-Rank Matrix Optimization Problems [17.384717824118255]
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning. In this paper we consider standard convex relaxations for such problems. We prove that, under a strict complementarity condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, approximated variants of two popular mirror-prox methods converge while requiring only low-rank SVD computations.
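As a reference point, mirror-prox with a Euclidean mirror map is the extragradient method; the toy bilinear saddle problem below shows the two-step update pattern, while the paper's contribution is making such steps cheap via low-rank SVDs (not reproduced here).

```python
import numpy as np

# Extragradient on min_x max_y x.T @ A @ y; converges for eta < 1/||A||.
rng = np.random.default_rng(7)
A = rng.normal(size=(5, 5))
x, y, eta = rng.normal(size=5), rng.normal(size=5), 0.05
for _ in range(500):
    x_half = x - eta * (A @ y)        # extrapolation step
    y_half = y + eta * (A.T @ x)
    x = x - eta * (A @ y_half)        # update using extrapolated gradients
    y = y + eta * (A.T @ x_half)
# (x, y) approaches the saddle point at the origin
```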
arXiv Detail & Related papers (2022-06-23T08:10:54Z) - Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z) - Intermediate Layer Optimization for Inverse Problems using Deep Generative Models [86.29330440222199]
ILO is a novel optimization algorithm for solving inverse problems with deep generative models.
We empirically show that our approach outperforms state-of-the-art methods introduced in StyleGAN-2 and PULSE for a wide range of inverse problems.
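A toy linear rendition of the intermediate-layer idea: rather than searching only over the input latent z, optimize the intermediate activation h of a generator g2(g1(z)) to fit an observation. The random linear "generator" is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
W1, W2 = rng.normal(size=(32, 8)), rng.normal(size=(64, 32))  # g1, g2
z = rng.normal(size=8)
y = rng.normal(size=64)                   # observation to invert

h = W1 @ z                                # initialize at g1(z)
for _ in range(500):
    resid = W2 @ h - y                    # inverse-problem residual
    h -= 1e-3 * (W2.T @ resid)            # gradient step on the h-layer
```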
arXiv Detail & Related papers (2021-02-15T06:52:22Z) - Mixed-Projection Conic Optimization: A New Paradigm for Modeling Rank Constraints [3.179831861897336]
We provide a framework for solving low-rank optimization problems to certifiable optimality.
Our framework also provides near-optimal solutions through rounding and local search techniques.
arXiv Detail & Related papers (2020-09-22T08:59:06Z) - Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation [105.33409035876691]
This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling.
We design a novel structured tensor low-rank norm tailored to MVSC.
We show that the proposed method outperforms state-of-the-art methods to a significant extent.
arXiv Detail & Related papers (2020-04-30T11:52:12Z) - Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
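For concreteness, the canonical pivotal estimator in this line of work is the square-root Lasso; its standard formulation (not quoted from the paper) is

```latex
\hat{\beta} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \frac{\lVert y - X\beta \rVert_2}{\sqrt{n}} + \lambda \lVert \beta \rVert_1,
```

where the optimal regularization parameter \lambda can be chosen proportional to \sqrt{\log(p)/n}, independently of the noise level.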
arXiv Detail & Related papers (2020-01-15T16:11:04Z)