Zero Sum SVD: Balancing Loss Sensitivity for Low Rank LLM Compression
- URL: http://arxiv.org/abs/2602.02848v1
- Date: Mon, 02 Feb 2026 21:51:01 GMT
- Title: Zero Sum SVD: Balancing Loss Sensitivity for Low Rank LLM Compression
- Authors: Ali Abbasi, Chayne Thrash, Haoran Qin, Shansita Sharma, Sepehr Seifi, Soheil Kolouri,
- Abstract summary: We propose Zero Sum SVD (ZS-SVD), a post-training method that performs singular component selection in whitened coordinates. ZS-SVD prunes components across the whole model with a zero-sum rule that keeps the cumulative predicted loss change near zero. Experiments show consistent gains across diverse benchmarks and compression ratios.
- Score: 11.908793753919745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in large language models have driven strong performance across many tasks, but their memory and compute costs still hinder deployment. SVD-based compression reduces storage and can speed up inference via low-rank factors, yet performance depends on how rank is allocated under a global compression ratio. Prior methods often use homogeneous ranks for similarly sized matrices, despite large differences in loss sensitivity, or rely on expensive iterative pre-truncation optimization to determine per-matrix ranks. We propose Zero Sum SVD (ZS-SVD), a post-training method that performs global singular component selection using activation whitening and first-order calibration loss estimates in whitened coordinates. ZS-SVD prunes components across the whole model with a zero-sum rule that keeps the cumulative predicted loss change near zero, automatically yielding heterogeneous ranks without solving a rank allocation optimization. Motivated by evidence that gradients near pretrained solutions exhibit low-rank structure, we also introduce an optional lightweight correction that applies a single projected gradient update after truncation, followed by re-truncation. Extensive experiments across multiple LLM architectures show consistent gains across diverse benchmarks and compression ratios. Code is available at https://github.com/mint-vu/Zero-Sum-SVD
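To make the selection procedure above concrete, the sketch below illustrates one plausible per-layer implementation: an SVD in whitened input coordinates, a signed first-order estimate of the loss change from dropping each rank-1 component, and a greedy pruning pass that keeps the running sum of predicted changes near zero. The function names, the greedy form of the zero-sum rule, and the whitening details are illustrative assumptions, not the released implementation; the optional projected gradient correction is omitted.

```python
# Illustrative sketch only (assumed names and simplifications, not the paper's code).
import torch

def whitened_svd(W, X, eps=1e-6):
    """SVD of a weight matrix in whitened input coordinates.
    W: (out, in) layer weight; X: (n, in) calibration activations."""
    cov = X.T @ X / X.shape[0]
    S = torch.linalg.cholesky(cov + eps * torch.eye(cov.shape[0]))  # cov = S S^T
    U, sigma, Vt = torch.linalg.svd(W @ S, full_matrices=False)     # W ~ U diag(sigma) Vt S^{-1}
    return U, sigma, Vt, torch.linalg.inv(S)

def predicted_loss_changes(grad_W, U, sigma, Vt, S_inv):
    """Signed first-order estimate of the calibration-loss change from dropping
    each rank-1 component: <dL/dW, dW_i> with dW_i = -sigma_i u_i v_i^T S^{-1}."""
    deltas = []
    for i in range(sigma.numel()):
        dW = -sigma[i] * torch.outer(U[:, i], Vt[i]) @ S_inv
        deltas.append((grad_W * dW).sum())
    return torch.stack(deltas)

def zero_sum_prune(deltas, n_prune):
    """Greedy reading of the zero-sum rule: repeatedly prune the component whose
    predicted change keeps the cumulative signed loss change closest to zero."""
    remaining = list(range(deltas.numel()))
    pruned, running = [], 0.0
    for _ in range(n_prune):
        best = min(remaining, key=lambda i: abs(running + deltas[i].item()))
        pruned.append(best)
        running += deltas[best].item()
        remaining.remove(best)
    return pruned
```

In practice the per-component estimates would be gathered from every linear layer, pooled model-wide, and pruned jointly until the global compression budget is met, which is what yields heterogeneous per-matrix ranks without a separate rank-allocation solve.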
Related papers
- Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold [7.162701793686856]
StiefAttention is a KV-cache compression method that learns orthonormal projection bases by directly minimizing output reconstruction error. It outperforms EigenAttention by 11.9 points on C4 perplexity and 5.4% on 0-shot MMLU accuracy at iso-compression, with lower relative error and higher cosine similarity with respect to the original decoder-layer outputs.
arXiv Detail & Related papers (2026-01-29T13:19:24Z) - Low-Rank Compression of Language Models via Differentiable Rank Selection [22.99526059495007]
We propose Learning to Low-Rank Compress (LLRC), a gradient-based approach that directly learns mask weights selecting which singular values to keep, without any fine-tuning. Our approach outperforms competing rank selection methods that similarly require no post-compression fine-tuning across various compression rates on common-sense reasoning and open-domain question-answering tasks.
arXiv Detail & Related papers (2025-12-14T07:20:57Z) - ARA: Adaptive Rank Allocation for Efficient Large Language Model SVD Compression [23.58843227762227]
In large language model (LLM) compression, singular value decomposition (SVD) is a widely studied and adopted low-rank decomposition technique. Under a global compression ratio constraint, determining the appropriate rank for different linear modules becomes a critical problem. We propose an Adaptive Rank Allocation (ARA) method to address this problem.
arXiv Detail & Related papers (2025-10-22T09:05:47Z) - Logits Replay + MoClip: Stabilized, Low-Cost Post-Training with Minimal Forgetting [6.653834890554154]
We introduce Logits Replay + MoClip, a framework that compresses supervision in the logit space and stabilizes optimization at the update level. Empirically, our method improves domain performance on Communication Technology tasks while mitigating forgetting on general benchmarks.
arXiv Detail & Related papers (2025-10-10T08:55:32Z) - PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training [21.695928776150808]
Accelerator memory and networking constraints have emerged as dominant bottlenecks when training large language models. We propose PLUMAGE: Probabilistic Low-rank Unbiased Minimum vAriance Gradient Estimator. We empirically demonstrate that PLUMAGE shrinks the gap to full-rank optimization in pre-training evaluation loss by 33% on average across models, and the average training loss gap across the GLUE benchmark by 28%, within a computational and memory footprint similar to GaLore.
arXiv Detail & Related papers (2025-05-23T19:17:55Z) - FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models [49.397861654088636]
We propose a two-step procedure to approximate SVD/QR-based gradient projections into lower-dimensional spaces. We show that our strategy achieves faster runtime and reduces memory usage by up to 25% across different model sizes.
arXiv Detail & Related papers (2025-05-23T14:37:00Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [68.44043212834204]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - AdaSVD: Adaptive Singular Value Decomposition for Large Language Models [75.1196637934987]
Singular Value Decomposition (SVD) has emerged as a promising compression technique for large language models (LLMs). Existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation. We propose AdaSVD, an adaptive SVD-based LLM compression approach.
arXiv Detail & Related papers (2025-02-03T14:34:37Z) - Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [63.10833446782114]
As language models grow in size, memory demands for backpropagation increase. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative. In this paper, we propose Subspace Zero-order optimization to address the challenges posed by high-dimensional perturbations.
arXiv Detail & Related papers (2024-10-11T17:01:43Z) - Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling? [59.820507600960745]
We propose a new GCP meta-layer that uses SVD in the forward pass, and Padé approximants in the backward propagation to compute the gradients.
The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performance on both large-scale and fine-grained datasets.
arXiv Detail & Related papers (2021-05-06T08:03:45Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computation load at the same accuracy (a minimal sketch of this factorized parametrization appears after this list).
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
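As a rough illustration of the factorized-training idea in the last entry, the sketch below parametrizes a linear layer as U diag(s) V^T and adds an orthogonality penalty on the factors plus an L1 penalty on s, so singular values can be driven toward zero during training and pruned afterwards. The class name, penalty weights, and initialization are assumptions for illustration, not the paper's recipe.

```python
# Illustrative sketch only (assumed class name, penalties, and initialization).
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Linear layer stored as U diag(s) V^T so its effective rank can shrink in training."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank ** 0.5)

    def forward(self, x):
        # x: (..., in_features) -> (..., out_features)
        return ((x @ self.V) * self.s) @ self.U.T

    def regularizer(self, ortho_weight=1e-2, l1_weight=1e-3):
        # Orthogonality penalty keeps U and V close to orthonormal;
        # the L1 term sparsifies the singular values s.
        eye = torch.eye(self.s.numel(), device=self.s.device)
        ortho = ((self.U.T @ self.U - eye) ** 2).sum() + ((self.V.T @ self.V - eye) ** 2).sum()
        return ortho_weight * ortho + l1_weight * self.s.abs().sum()
```

After training, columns of U and V whose corresponding s values are near zero can be dropped, leaving a genuinely low-rank layer without a separate post-hoc SVD at every step.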
This list is automatically generated from the titles and abstracts of the papers on this site.