Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
- URL: http://arxiv.org/abs/2502.02723v1
- Date: Tue, 04 Feb 2025 21:17:51 GMT
- Title: Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
- Authors: Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka, Yiran Chen, Kurt Keutzer, Chenfeng Xu
- Abstract summary: We argue that the optimal use of SVD lies in truncating activations, rather than merely using activations as an optimization distance.
We propose Dobi-SVD, which establishes a new, principled approach to SVD-based LLM compression.
- Score: 59.46211685419206
- Abstract: We provide a new LLM-compression solution via SVD, unlocking new possibilities for LLM compression beyond quantization and pruning. We point out that the optimal use of SVD lies in truncating activations, rather than merely using activations as an optimization distance. Building on this principle, we address three critical challenges in SVD-based LLM compression: (1) How can we determine the optimal activation truncation position for each weight matrix in LLMs? (2) How can we efficiently reconstruct the weight matrices based on truncated activations? (3) How can we address the inherent "injection" nature of SVD that results in information loss? We propose Dobi-SVD, which establishes a new, principled approach to SVD-based LLM compression.
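All of the SVD-based methods listed below share one mechanical core: a dense weight matrix is replaced by two low-rank factors obtained from a truncated SVD, so the layer's matrix multiply becomes two smaller ones. A minimal numpy sketch of that factorization (shapes and rank are arbitrary placeholders, not values from the paper):

```python
import numpy as np

def truncated_svd_compress(W: np.ndarray, rank: int):
    """Factor a weight matrix W (out_dim x in_dim) into A (out_dim x rank)
    and B (rank x in_dim) via truncated SVD, so that W ~= A @ B."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]     # absorb the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Toy example: a 1024x1024 projection compressed to rank 256.
W = np.random.randn(1024, 1024).astype(np.float32)
A, B = truncated_svd_compress(W, rank=256)

# The layer y = W @ x becomes y = A @ (B @ x); the parameter count drops
# from 1024*1024 to 2*1024*256, i.e. a 2x reduction at this rank.
x = np.random.randn(1024).astype(np.float32)
y_approx = A @ (B @ x)
```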
Related papers
- AdaSVD: Adaptive Singular Value Decomposition for Large Language Models [84.60646883395454]
Singular Value Decomposition (SVD) has emerged as a promising compression technique for large language models (LLMs).
Existing SVD-based methods often struggle to effectively mitigate the errors introduced by SVD truncation.
We propose AdaSVD, an adaptive SVD-based LLM compression approach.
arXiv Detail & Related papers (2025-02-03T14:34:37Z)
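The AdaSVD entry above does not spell out its adaptive criterion; purely as an illustration of what adapting the truncation rank per matrix can look like, the sketch below picks the smallest rank that retains a fixed fraction of singular-value energy (the energy threshold is an assumption, not AdaSVD's actual rule):

```python
import numpy as np

def adaptive_rank(W: np.ndarray, energy: float = 0.95) -> int:
    """Smallest rank whose singular values retain `energy` of the total
    squared spectral energy of W (an illustrative criterion only)."""
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

# Matrices with different spectra get different ranks under the same criterion.
flat = np.random.randn(512, 512)                                # slowly decaying spectrum
lowrank = np.random.randn(512, 16) @ np.random.randn(16, 512)   # fast decay
print(adaptive_rank(flat), adaptive_rank(lowrank))              # the second needs far fewer singular values
```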
- AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models [94.82766517752418]
We propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a more theoretically principled manner.
Our results show that AlphaPruning prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity, marking a first in the literature on LLMs.
arXiv Detail & Related papers (2024-10-14T03:35:11Z)
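AlphaPruning's shape metrics come from heavy-tailed self-regularization theory and are not reproduced here; the sketch below only shows the generic pattern the entry describes, turning per-layer scores into layerwise sparsity ratios around a global target and then magnitude-pruning each layer (the scoring function is a placeholder, not AlphaPruning's metric):

```python
import numpy as np

def allocate_sparsity(scores, target=0.8, spread=0.15):
    """Map per-layer scores to sparsity ratios averaging roughly `target`:
    layers with lower scores (deemed more redundant) are pruned harder."""
    s = np.asarray(scores, dtype=np.float64)
    s = (s - s.mean()) / (s.std() + 1e-8)
    return np.clip(target - spread * s, 0.0, 0.99)

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude entries of W at the given sparsity."""
    k = int(sparsity * W.size)
    thresh = np.partition(np.abs(W).ravel(), k)[k] if k > 0 else -np.inf
    return np.where(np.abs(W) >= thresh, W, 0.0)

layers = [np.random.randn(256, 256) for _ in range(4)]
scores = [np.linalg.norm(W) for W in layers]          # placeholder per-layer score
ratios = allocate_sparsity(scores, target=0.8)
pruned = [magnitude_prune(W, r) for W, r in zip(layers, ratios)]
```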
- Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression [5.206085750261924]
Large Language Models (LLMs) require a significant amount of memory during inference.
In this paper, we take a step further to explore parameter sharing across different layers with singular value decomposition.
Comprehensive experiments demonstrate that Basis Sharing outperforms state-of-the-art SVD-based compression approaches.
arXiv Detail & Related papers (2024-10-02T14:30:02Z)
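The Basis Sharing entry above states that parameters are shared across layers via SVD, but not how; one plausible realization, sketched here as an assumption rather than the paper's exact procedure, is to compute a single right basis from several stacked weight matrices and keep only small per-layer coefficient matrices:

```python
import numpy as np

def shared_basis_compress(weights, rank):
    """Given weight matrices of identical shape (out x in), compute one
    shared right basis V (rank x in) and per-layer coefficients C_i
    (out x rank) such that W_i ~= C_i @ V."""
    stacked = np.concatenate(weights, axis=0)          # (n_layers*out) x in
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    V = Vt[:rank, :]                                   # basis shared by all layers
    coeffs = [W @ V.T for W in weights]                # per-layer projections
    return V, coeffs

layers = [np.random.randn(256, 256) for _ in range(4)]
V, coeffs = shared_basis_compress(layers, rank=64)
recon_err = np.mean([np.linalg.norm(W - C @ V) / np.linalg.norm(W)
                     for W, C in zip(layers, coeffs)])
print(f"mean relative reconstruction error: {recon_err:.3f}")
```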
- STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs [28.70239743254508]
We present the first structural binarization method for LLM compression to less than 1-bit precision.
We observe that some weights in binarized LLMs can be randomly flipped without significant performance degradation.
Our approach outperforms other binarization-based compression methods while significantly reducing memory requirements.
arXiv Detail & Related papers (2024-08-03T15:07:44Z)
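STBLLM's structural binarization is not detailed in the summary above; the sketch below only combines the two ingredients the entry alludes to, sign binarization with a per-row scale plus an N:M structured mask that discards some binarized weights, which is one way an average bit-width below 1 can arise. It is a generic illustration, not STBLLM's algorithm:

```python
import numpy as np

def binarize_rows(W):
    """1-bit quantization: per-row scale times the sign pattern of W."""
    alpha = np.mean(np.abs(W), axis=1, keepdims=True)   # per-row scale
    return alpha, np.sign(W)

def nm_mask(W, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m (N:M sparsity)."""
    rows, cols = W.shape
    groups = np.abs(W).reshape(rows, cols // m, m)
    drop = np.argsort(groups, axis=-1)[..., : m - n]    # positions of the smallest entries
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=-1)
    return mask.reshape(rows, cols)

W = np.random.randn(128, 256)
alpha, sign = binarize_rows(W)
mask = nm_mask(W, n=2, m=4)            # half of the sign bits are dropped entirely
W_hat = alpha * sign * mask            # average storage below 1 bit per original weight
```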
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method runs for 2.7 hours and uses around 35GB of memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
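The entry above learns pruning masks in a probabilistic space by optimizing the pruned model's loss with policy gradients instead of back-propagating through the model. A toy REINFORCE sketch on a stand-in loss (the loss, unit count, and hyperparameters are placeholders, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                  # number of prunable units (placeholder)
logits = np.zeros(n)                    # parameters of the Bernoulli keep-probabilities
target_keep, lr, penalty = 0.5, 0.5, 5.0
baseline = 0.0                          # running baseline to reduce gradient variance

def loss(mask):
    """Stand-in for the pruned model's loss: the first half of the units are
    'useful', and exceeding the keep budget is penalized."""
    return -mask[: n // 2].mean() + penalty * max(0.0, mask.mean() - target_keep) ** 2

for step in range(500):
    p = 1.0 / (1.0 + np.exp(-logits))               # keep probabilities
    mask = (rng.random(n) < p).astype(np.float64)   # sample a binary mask
    L = loss(mask)
    baseline = 0.9 * baseline + 0.1 * L
    # REINFORCE: d/d_logits E[L] = E[(L - baseline) * d log P(mask) / d_logits]
    logits -= lr * (L - baseline) * (mask - p)

p = 1.0 / (1.0 + np.exp(-logits))
print("mean keep-probability, useful vs. other units:",
      round(p[: n // 2].mean(), 2), round(p[n // 2 :].mean(), 2))
```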
- SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression [14.818355326032538]
We propose SVD-LLM, a new SVD-based compression method for Large Language Models (LLMs).
SVD-LLM incorporates a truncation-aware data whitening strategy to ensure a direct mapping between singular values and compression loss.
Our results demonstrate the superiority of SVD-LLM over state-of-the-art methods, especially at high model compression ratios.
arXiv Detail & Related papers (2024-03-12T07:31:18Z)
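One reading of the truncation-aware whitening described in the SVD-LLM entry, sketched here as a simplified interpretation rather than the paper's exact algorithm: transform the weight by a Cholesky factor of the calibration activations' Gram matrix so that discarding small singular values directly bounds the error on those activations, then fold the transform back into the low-rank factors:

```python
import numpy as np

def whitened_svd_compress(W, X, rank, eps=1e-6):
    """Whiten with a Cholesky factor of the activation Gram matrix so that
    truncating singular values of W @ S controls the error on the
    calibration activations X (a simplified sketch)."""
    gram = X @ X.T + eps * np.eye(X.shape[0])      # in_dim x in_dim
    S = np.linalg.cholesky(gram)
    U, sig, Vt = np.linalg.svd(W @ S, full_matrices=False)
    A = U[:, :rank] * sig[:rank]                   # out_dim x rank
    B = Vt[:rank, :] @ np.linalg.inv(S)            # rank x in_dim
    return A, B                                    # W @ x ~= A @ (B @ x)

# Toy check on random data (shapes are placeholders).
W = np.random.randn(256, 128)
X = np.random.randn(128, 1024)                     # calibration activations
A, B = whitened_svd_compress(W, X, rank=32)
err = np.linalg.norm(W @ X - A @ (B @ X)) / np.linalg.norm(W @ X)
print(f"relative activation error at rank 32: {err:.3f}")
```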
- ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z)
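Defining neuron activation by output magnitude, as in the entry above, makes sparsity measurable for any activation function, not just ReLU. A small sketch of that definition with squared ReLU (the threshold value and tensor shapes are arbitrary):

```python
import numpy as np

def relu2(x):
    """Squared ReLU, one of the activation functions compared in the entry."""
    return np.maximum(x, 0.0) ** 2

def activation_sparsity(outputs, threshold):
    """Fraction of neuron outputs whose magnitude falls below the threshold;
    these neurons can be skipped in sparse computation."""
    return float(np.mean(np.abs(outputs) < threshold))

x = np.random.randn(1024, 4096)          # a batch of hidden states (placeholder)
h = relu2(x)
print("sparsity at |h| < 0.01:", activation_sparsity(h, 0.01))
```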
- ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models [28.231997641388343]
We introduce a new post-training compression paradigm for Large Language Models (LLMs).
We find that the challenges of this task stem from the distribution variance in the LLM activations and the sensitivity difference among various kinds of layers.
We propose a training-free approach called Activation-aware Singular Value Decomposition (ASVD).
arXiv Detail & Related papers (2023-12-10T08:41:24Z)
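ASVD accounts for the activation distribution when factorizing weights; a common way to approximate this idea, shown here as an illustration rather than ASVD's exact recipe, is to scale each input channel by a statistic of its activations before the SVD and fold the scaling back into the right factor:

```python
import numpy as np

def activation_aware_svd(W, X, rank, eps=1e-6):
    """Scale input channels by their mean activation magnitude so the SVD
    spends its rank budget on channels that carry large activations."""
    s = np.mean(np.abs(X), axis=1) + eps           # per-input-channel scale
    U, sig, Vt = np.linalg.svd(W * s[None, :], full_matrices=False)
    A = U[:, :rank] * sig[:rank]
    B = Vt[:rank, :] / s[None, :]                  # undo the scaling
    return A, B                                    # W @ x ~= A @ (B @ x)

W = np.random.randn(256, 128)
X = np.random.randn(128, 1024) * np.linspace(0.1, 5.0, 128)[:, None]  # uneven channels
A, B = activation_aware_svd(W, X, rank=32)
U0, S0, Vt0 = np.linalg.svd(W, full_matrices=False)
W_plain = (U0[:, :32] * S0[:32]) @ Vt0[:32, :]
# The activation-aware factorization typically gives a lower error on W @ X.
print("activation-aware error:", np.linalg.norm(W @ X - A @ (B @ X)))
print("plain SVD error:       ", np.linalg.norm(W @ X - W_plain @ X))
```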
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
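Structural pruning removes whole coupled structures rather than individual weights: dropping an intermediate neuron deletes both the row that produces it and the column that consumes it, leaving a smaller but still dense model. A minimal sketch on a two-layer MLP block (the importance score is a simple placeholder, not LLM-Pruner's criterion):

```python
import numpy as np

def prune_mlp_neurons(W_up, W_down, keep_ratio=0.5):
    """Remove intermediate neurons of an MLP block: row i of W_up and
    column i of W_down form one coupled structure and are removed together."""
    importance = np.abs(W_up).sum(axis=1) * np.abs(W_down).sum(axis=0)  # placeholder score
    k = int(keep_ratio * W_up.shape[0])
    keep = np.argsort(importance)[-k:]
    return W_up[keep, :], W_down[:, keep]

W_up = np.random.randn(512, 128)      # hidden -> intermediate
W_down = np.random.randn(128, 512)    # intermediate -> hidden
W_up_p, W_down_p = prune_mlp_neurons(W_up, W_down, keep_ratio=0.5)
print(W_up_p.shape, W_down_p.shape)   # (256, 128), (128, 256)
```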
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.