Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
- URL: http://arxiv.org/abs/2512.03062v1
- Date: Wed, 26 Nov 2025 10:54:01 GMT
- Title: Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
- Authors: Roman Rausch, David Jansen, Sukhbinder Singh, Román Orús
- Abstract summary: Large Language Models (LLMs) are very demanding in terms of their computational resources.<n>We present two physics-inspired improvements to SVD compression: \textbf{FermiGrad}, a gradient-descent algorithm that determines globally optimal layer-wise ranks, and \textbf{PivGa}, an additional \textit{lossless} compression of the low-rank factors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are very demanding in terms of their computational resources. Low-rank decomposition of LLM weights, e.g. via Singular Value Decomposition (SVD), is a promising approach for LLM compression, but presents several practical hurdles, e.g. selecting appropriate layer-wise ranks and removing the parameter redundancy of the low-rank factors. In this work, we present two physics-inspired improvements to SVD LLM compression: (1) \textbf{FermiGrad}, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) \textbf{PivGa}, an additional \textit{lossless} compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
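The Fermi-function relaxation in (1) can be illustrated with a short NumPy sketch. This is only an assumption-laden illustration of the idea, not the paper's FermiGrad implementation: the singular-value index plays the role of an "energy level", and the hypothetical parameters `mu` (a continuous rank threshold) and `T` (a temperature) control where and how sharply singular values are switched off.

```python
import numpy as np

def fermi_soft_truncation(W, mu, T=1.0):
    """Soften the hard rank cutoff of SVD truncation with a Fermi factor.

    Instead of keeping exactly the top-k singular values, each singular
    value s_i is weighted by f_i = 1 / (1 + exp((i - mu) / T)): index i
    acts as an energy level, mu as a continuous rank threshold, and T as
    a temperature controlling the sharpness of the cutoff. As T -> 0
    this recovers hard truncation at rank ~mu, but for T > 0 the rank
    choice becomes differentiable and amenable to gradient descent.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    idx = np.arange(len(s))
    gate = 1.0 / (1.0 + np.exp((idx - mu) / T))  # Fermi occupation factors
    return (U * (s * gate)) @ Vt, gate

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
W_soft, gate = fermi_soft_truncation(W, mu=3.0, T=0.5)
```

Because `gate` is a smooth function of `mu`, a loss evaluated on `W_soft` yields gradients with respect to the (continuous) rank, which is the property the discrete top-k truncation lacks.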
Related papers
- Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image quality degradation under a low-dimensional budget. We propose the Ensemble Parallel Direction solver (dubbed EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z)
- 1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models [15.798945727818753]
We introduce the Synergistic Sparse and Low-Rank Compression (SSLC) method for Large Language Models (LLMs). Low-rank approximation compresses the model by retaining its essential structure with minimal information loss, whereas sparse optimization eliminates non-essential weights, preserving those crucial for generalization. Experiments on LLaMA and Qwen2.5 models (7B-70B) show that SSLC, without any additional training steps, consistently surpasses standalone methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2025-10-30T12:50:30Z)
- FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference [7.784124271824854]
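The "sparse plus low-rank" split summarized above can be sketched in a few lines of NumPy. This is a hedged toy illustration of the general W ≈ L + S idea, not the SSLC method itself; the function name and the magnitude-based rule for choosing sparse entries are assumptions made for the example.

```python
import numpy as np

def sparse_plus_lowrank(W, rank, sparsity):
    """Illustrative W ~ L + S split: the low-rank part L comes from a
    truncated SVD, and the sparse part S keeps only the largest-magnitude
    entries of the residual W - L (a fraction `sparsity` of all entries)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]           # low-rank component
    R = W - L                                          # residual
    k = int(sparsity * R.size)                         # entries to keep
    thresh = np.sort(np.abs(R).ravel())[-k] if k > 0 else np.inf
    S = np.where(np.abs(R) >= thresh, R, 0.0)          # sparse component
    return L, S

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 16))
L, S = sparse_plus_lowrank(W, rank=4, sparsity=0.1)
```

Storing `L` as two thin factors plus `S` in a sparse format costs far fewer parameters than `W`, while `L + S` approximates `W` better than either component alone.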
Large language models (LLMs) have enormous parameter counts that hinder deployment on resource-constrained hardware. Low-rank compression can reduce both memory usage and computational demand, but applying a uniform compression ratio across all layers often leads to significant performance degradation. To address these issues, we propose the Fine-grained Low-Rank Compressor (FLRC), which efficiently determines an optimal rank allocation for each layer.
arXiv Detail & Related papers (2025-10-10T12:35:09Z)
- MGAA: Multi-Granular Adaptive Allocation for Low-Rank Compression of LLMs [9.244526043014098]
The Multi-Granular Adaptive Allocation (MGAA) method can adaptively allocate parameters between and within sublayers without task-specific evaluations in the compression process. Comprehensive evaluations of MGAA across multiple LLM backbone models and benchmark datasets demonstrate its superior performance.
arXiv Detail & Related papers (2025-07-04T04:54:01Z)
- Alignment of large language models with constrained learning [93.2264691508005]
We study the problem of computing an optimal large language model (LLM) policy for a constrained alignment problem. We employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the policy via the Lagrangian and updating a dual variable via dual descent.
arXiv Detail & Related papers (2025-05-26T01:04:56Z)
- Large Language Model Compression with Global Rank and Sparsity Optimization [12.078838412963083]
Low-rank and sparse composite approximation is a natural idea for compressing Large Language Models. We propose a novel two-stage compression method with the capability of global rank and sparsity optimization. Our method significantly surpasses state-of-the-art techniques for sparsification and composite approximation.
arXiv Detail & Related papers (2025-05-02T08:00:48Z)
- Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives [59.46211685419206]
We argue that the optimal use of SVD lies in truncating activations, rather than merely using activations as an optimization distance. We propose Dobi-SVD, which establishes a new, principled approach to SVD-based LLM compression.
arXiv Detail & Related papers (2025-02-04T21:17:51Z)
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [63.10833446782114]
As language models grow in size, memory demands for backpropagation increase. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative. In this paper, we propose Subspace Zeroth-Order optimization to address the challenges posed by high-dimensional perturbations.
arXiv Detail & Related papers (2024-10-11T17:01:43Z)
- Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation [33.05581803204543]
Adapting pre-trained large language models (LLMs) is crucial but challenging due to their enormous size. We introduce SketchTune, a compressive adaptation strategy that compresses weights into compact fine-tunable sketches. SketchTune is supported by mathematical insights into matrix classes that are better approximated using sketching rather than low-rank methods.
arXiv Detail & Related papers (2024-10-08T20:58:24Z)
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks. Experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models demonstrate the promising performance of our method in efficiency and effectiveness.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
- Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization [42.53133823994923]
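The "learn a Bernoulli distribution over binary masks without backpropagating through the loss" idea can be sketched with a toy REINFORCE loop. Everything here is illustrative (the function names, the running-baseline variance reduction, and the tiny separable loss), not the paper's actual pruning algorithm:

```python
import numpy as np

def learn_pruning_mask(loss_fn, n, steps=500, lr=0.3, seed=0):
    """Learn per-weight keep-probabilities without backpropagating through
    loss_fn: sample binary masks from Bernoulli(p), score the pruned model,
    and use REINFORCE with a running baseline to push p toward masks that
    achieve lower loss."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(n)               # p = sigmoid(logits), starts at 0.5
    baseline = 0.0                     # running average reward (variance reduction)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-logits))
        m = (rng.random(n) < p).astype(float)   # sampled binary mask
        reward = -loss_fn(m)                    # lower loss -> higher reward
        # With the sigmoid parametrization, the gradient of the Bernoulli
        # log-likelihood with respect to the logits is simply (m - p):
        logits += lr * (reward - baseline) * (m - p)
        baseline = 0.9 * baseline + 0.1 * reward
    return 1.0 / (1.0 + np.exp(-logits))

# Toy objective: the "model" only needs the first 3 of 6 weights.
target = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
probs = learn_pruning_mask(lambda m: np.sum((m - target) ** 2), n=6)
```

Because only forward evaluations of `loss_fn` are needed, the same pattern scales to losses that are expensive or impossible to differentiate, which is the point of bypassing back-propagation.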
Low-rank compression is a promising technique to reduce non-essential parameters in large language models. We conduct empirical research on the low-rank characteristics of large models, and propose a low-rank compression method suitable for large language models.
arXiv Detail & Related papers (2024-05-17T08:27:12Z)
- Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning.
Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
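The zeroth-order optimization surveyed above replaces backpropagated gradients with forward-only estimates. A minimal two-point estimator along random Gaussian directions can be sketched as follows; the names and parameters (`mu` as the probe step, `n_samples` as the number of directions) are illustrative, not the benchmark's API:

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, n_samples=20, seed=0):
    """Two-point zeroth-order gradient estimate: probe f along random
    Gaussian directions u and average the finite-difference slopes times u.
    Only forward evaluations of f are needed, so the memory cost is
    independent of the depth of the model inside f."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_samples

# Sanity check on a quadratic, whose true gradient is 2x.
x = np.array([1.0, -2.0, 3.0])
g = zo_gradient(lambda v: np.sum(v ** 2), x, n_samples=500)
```

The estimator is unbiased (E[u uᵀ] = I), but its variance grows with the dimension of `x`, which is why the subspace and sketching variants listed above restrict the perturbations to lower-dimensional structures.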
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.