DipSVD: Dual-importance Protected SVD for Efficient LLM Compression
- URL: http://arxiv.org/abs/2506.20353v1
- Date: Wed, 25 Jun 2025 12:04:53 GMT
- Title: DipSVD: Dual-importance Protected SVD for Efficient LLM Compression
- Authors: Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, Yao Zhu
- Abstract summary: This paper proposes a dual-level importance protection mechanism to enhance SVD-based compression methods; DipSVD outperforms existing SVD-based compression approaches across multiple benchmarks.
- Score: 12.997409692786848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ever-increasing computational demands and deployment costs of large language models (LLMs) have spurred numerous compression methods. Compared to quantization and unstructured pruning, SVD compression offers superior hardware compatibility and theoretical guarantees. However, existing SVD-based methods focus on the overall discrepancy between the original and compressed matrices while overlooking the protection of critical components within the matrix, which leads to inferior performance in the compressed models. This paper proposes a dual-level importance protection mechanism to enhance SVD-based compression methods: (1) local importance protection: preserving the most critical singular vectors within each weight matrix through channel-weighted data whitening; and (2) global importance protection: enabling less important layers to bear a greater portion of the compression burden through either a heuristic or optimization-based approach, thereby minimizing the impact of compression on critical layers. Extensive experiments demonstrate that DipSVD outperforms existing SVD-based compression approaches across multiple benchmarks, achieving superior model performance, particularly at high compression ratios.
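The two-level recipe described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical rendering, not the paper's implementation: `whiten_and_truncate` stands in for the channel-weighted data whitening plus truncated SVD of the local step, and `allocate_ranks_to_remove` stands in for the heuristic layer-wise allocation of the global step. The per-channel activation-energy weights and the inverse-importance heuristic are assumptions made for illustration only.

```python
import torch

def whiten_and_truncate(W: torch.Tensor, X: torch.Tensor, rank: int):
    """Local step (sketch): channel-weighted whitening + truncated SVD.

    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration activations.
    The per-channel activation-energy weighting below is an assumption.
    """
    # Channel weights from calibration data: RMS activation per input channel.
    scale = X.pow(2).mean(dim=0).sqrt().clamp_min(1e-6)   # (in,)
    S = torch.diag(scale)
    S_inv = torch.diag(1.0 / scale)

    # Decompose the channel-weighted matrix so that singular directions
    # aligned with high-energy input channels are ranked (and kept) first.
    U, sigma, Vt = torch.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]   # (out, rank)
    B = Vt[:rank, :] @ S_inv         # (rank, in), whitening undone
    return A, B                      # W ≈ A @ B

def allocate_ranks_to_remove(layer_scores, total_ranks_to_remove: int):
    """Global step (sketch): less important layers absorb more compression.

    'layer_scores' (higher = more important) and the inverse-score
    heuristic are illustrative assumptions, not the paper's criterion.
    """
    inv = torch.tensor([1.0 / (s + 1e-6) for s in layer_scores])
    share = inv / inv.sum()
    return [int(round(total_ranks_to_remove * p.item())) for p in share]
```

Replacing W with the factors A and B turns one (out x in) matrix multiply into two smaller ones, which is where the parameter and compute savings come from when the retained rank is small.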
Related papers
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration [81.81027217759433]
Large language models (LLMs) are often constrained by the excessive memory required to store the Key-Value (KV) cache. Recent methods have explored reducing the hidden dimensions of the KV cache, but many introduce additional computation through projection layers. We propose ReCalKV, a post-training KV cache compression method that reduces the hidden dimensions of the KV cache.
arXiv Detail & Related papers (2025-05-30T08:49:27Z) - ResSVD: Residual Compensated SVD for Large Language Model Compression [12.539815070352116]
Large language models (LLMs) have demonstrated impressive capabilities in a wide range of downstream natural language processing tasks. We propose ResSVD, a new post-training SVD-based LLM compression method. We leverage the residual matrix generated during the truncation process to reduce truncation loss.
arXiv Detail & Related papers (2025-05-26T15:14:54Z) - Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression [53.08742231761896]
UltraDelta is a data-free delta compression pipeline that achieves both ultra-high compression and strong performance. UltraDelta is designed to minimize redundancy, maximize information, and stabilize performance across inter-layer, intra-layer, and global dimensions.
arXiv Detail & Related papers (2025-05-19T10:37:22Z) - SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression [10.991519727445231]
Singular Value Decomposition (SVD) is a promising compression technique for Large Language Models (LLMs). Existing SVD-based compression methods fall short in reducing truncation losses, leading to less competitive performance in compressed models. We introduce SVD-LLM V2, an SVD-based LLM compression method that optimizes singular value truncation in SVD compression with two techniques.
arXiv Detail & Related papers (2025-03-16T03:27:12Z) - Optimizing Singular Spectrum for Large Language Model Compression [95.7621116637755]
We introduce SoCo, a novel compression framework that learns to rescale the decomposed components of SVD in a data-driven manner. Thanks to the learnable singular spectrum, SoCo adaptively prunes components according to the sparsified importance scores. Experimental evaluations across multiple LLMs and benchmarks demonstrate that SoCo surpasses the state-of-the-art methods in model compression.
arXiv Detail & Related papers (2025-02-20T23:18:39Z) - Can LLMs Maintain Fundamental Abilities under KV Cache Compression? [29.510433427184385]
We present a benchmark, KVFundaBench, to evaluate the effects of KV cache compression across diverse fundamental language models. We propose ShotKV, a novel compression approach that handles prefill and decoding phases while maintaining shot-level semantic coherence.
arXiv Detail & Related papers (2025-02-04T02:23:06Z) - AdaSVD: Adaptive Singular Value Decomposition for Large Language Models [84.60646883395454]
Singular Value Decomposition (SVD) has emerged as a promising compression technique for large language models (LLMs). Existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation. We propose AdaSVD, an adaptive SVD-based LLM compression approach.
arXiv Detail & Related papers (2025-02-03T14:34:37Z) - LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs).
Existing approaches to mitigate this issue include efficient attention variants integrated in upcycling stages and KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining.
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
arXiv Detail & Related papers (2024-10-04T03:10:53Z) - MoDeGPT: Modular Decomposition for Large Language Model Compression [59.361006801465344]
This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework. MoDeGPT partitions the Transformer block into modules composed of matrix pairs and reduces the hidden dimensions. Our experiments show that MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods.
arXiv Detail & Related papers (2024-08-19T01:30:14Z) - SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression [14.818355326032538]
Singular Value Decomposition (SVD) offers a promising solution for Large Language Model (LLM) compression. However, truncating smaller singular values may lead to higher compression loss, and the compressed weights are not updated after SVD truncation. We propose SVD-LLM, an SVD-based post-training LLM compression method that addresses the limitations of existing methods.
arXiv Detail & Related papers (2024-03-12T07:31:18Z)