Related papers: AdaSVD: Adaptive Singular Value Decomposition for Large Language Models

AdaSVD: Adaptive Singular Value Decomposition for Large Language Models

URL: http://arxiv.org/abs/2502.01403v2
Date: Tue, 04 Feb 2025 03:51:28 GMT
Title: AdaSVD: Adaptive Singular Value Decomposition for Large Language Models
Authors: Zhiteng Li, Mingyuan Xia, Jingyuan Zhang, Zheng Hui, Linghe Kong, Yulun Zhang, Xiaokang Yang,
Abstract summary: Singular Value Decomposition (SVD) has emerged as a promising compression technique for large language models (LLMs)<n>Existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation.<n>We propose AdaSVD, an adaptive SVD-based LLM compression approach.
Score: 84.60646883395454
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have achieved remarkable success in natural language processing (NLP) tasks, yet their substantial memory requirements present significant challenges for deployment on resource-constrained devices. Singular Value Decomposition (SVD) has emerged as a promising compression technique for LLMs, offering considerable reductions in memory overhead. However, existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation, leading to a noticeable performance gap when compared to the original models. Furthermore, applying a uniform compression ratio across all transformer layers fails to account for the varying importance of different layers. To address these challenges, we propose AdaSVD, an adaptive SVD-based LLM compression approach. Specifically, AdaSVD introduces adaComp, which adaptively compensates for SVD truncation errors by alternately updating the singular matrices U and V^T. Additionally, AdaSVD introduces adaCR, which adaptively assigns layer-specific compression ratios based on the relative importance of each layer. Extensive experiments across multiple LLM families and evaluation metrics demonstrate that AdaSVD consistently outperforms state-of-the-art (SOTA) SVD-based methods, achieving superior performance with significantly reduced memory requirements. The code and models will be available at https://github.com/ZHITENGLI/AdaSVD.

Related papers

SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression [10.991519727445231]
Singular Value Decomposition (SVD) is a promising compression technique for Large Language Models (LLMs) Existing SVD-based compression methods fall short in reducing truncation losses, leading to less competitive performance in compressed models. We introduce SVD-LLM V2, a SVD-based LLM compression method that optimize singular value truncation in SVD compression with two techniques.
arXiv Detail & Related papers (2025-03-16T03:27:12Z)
Optimizing Singular Spectrum for Large Language Model Compression [95.7621116637755]
We introduce SoCo, a novel compression framework that learns to rescale the decomposed components of SVD in a data-driven manner. Thanks to the learnable singular spectrum, SoCo adaptively prunes components according to the sparsified importance scores. Experimental evaluations across multiple LLMs and benchmarks demonstrate that SoCo surpasses the state-of-the-art methods in model compression.
arXiv Detail & Related papers (2025-02-20T23:18:39Z)
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives [59.46211685419206]
We argue that the optimal use of SVD lies in truncating activations, rather than merely using activations as an optimization distance. We propose Dobi-SVD, which establishes a new, principled approach to SVD-based LLM compression.
arXiv Detail & Related papers (2025-02-04T21:17:51Z)
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models [63.27511432647797]
We propose VLsI: Verbalized Layers-to-Interactions, a new VLM family in 2B and 7B model sizes.<n>We validate VLsI across ten challenging vision-language benchmarks, achieving notable performance gains (11.0% for 2B and 17.4% for 7B) over GPT-4V.
arXiv Detail & Related papers (2024-12-02T18:58:25Z)
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
Key-Value ( KV) cache is crucial component in serving transformer-based autoregressive large language models (LLMs) Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages; (2) KV cache compression at test time; and (3) KV cache compression at test time. We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining. Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
arXiv Detail & Related papers (2024-10-04T03:10:53Z)
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression [5.206085750261924]
Large Language Models (LLMs) require significant amount of memory storage in inference. In this paper, we take a step further to explore parameter sharing across different layers with singular value decomposition. Comprehensive experiments demonstrate that Basis Sharing outperforms state-of-the-art SVD-based compression approaches.
arXiv Detail & Related papers (2024-10-02T14:30:02Z)
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression [14.818355326032538]
We propose SVD-LLM, a new SVD-based compression method for Large Language Models (LLMs) SVD-LLM incorporates a truncation-aware data whitening strategy to ensure a direct mapping between singular values and compression loss. Our results demonstrate the superiority of SVD-LLM over state-of-the-arts, especially at high model compression ratios.
arXiv Detail & Related papers (2024-03-12T07:31:18Z)
Numerical Optimizations for Weighted Low-rank Estimation on Language Model [73.12941276331316]
Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices. Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption. We show that our method can perform better than current SOTA methods in neural-based language models.
arXiv Detail & Related papers (2022-11-02T00:58:02Z)
Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction. We find that our resulting task accuracy is much closer to the original model's performance. Our method can directly compress a task-specific model while achieving better performance than other compact model strategies.
arXiv Detail & Related papers (2022-06-30T21:57:07Z)
Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.