Rethinking Compression: Reduced Order Modelling of Latent Features in
Large Language Models
- URL: http://arxiv.org/abs/2312.07046v1
- Date: Tue, 12 Dec 2023 07:56:57 GMT
- Title: Rethinking Compression: Reduced Order Modelling of Latent Features in
Large Language Models
- Authors: Arnav Chavan, Nahush Lele and Deepak Gupta
- Abstract summary: This paper introduces an innovative approach for the parametric and practical compression of Large Language Models (LLMs) based on reduced order modelling.
Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.
- Score: 9.91972450276408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the substantial scale of Large Language Models (LLMs), the direct
application of conventional compression methodologies proves impractical. The
computational demands associated with even minimal gradient updates present
challenges, particularly on consumer-grade hardware. This paper introduces an
innovative approach for the parametric and practical compression of LLMs based
on reduced order modelling, which entails low-rank decomposition within the
feature space and re-parameterization in the weight space. Notably, this
compression technique operates in a layer-wise manner, obviating the need for a
GPU device and enabling the compression of billion-scale models within
stringent constraints of both memory and time. Our method represents a
significant advancement in model compression by leveraging matrix
decomposition, demonstrating superior efficacy compared to the prevailing
state-of-the-art structured pruning method.
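Concretely, the described pipeline can be read as a layer-by-layer loop: cache the latent input features of each linear layer on a small calibration set, extract a low-rank basis of that feature space, and re-parameterize the layer's weights as a product of two thinner matrices. The following is a minimal sketch of that idea in PyTorch-style Python; the names (compress_linear_layer, calib_inputs, rank) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of layer-wise feature-space low-rank compression.
# Illustrative only; names and details are assumptions, not the paper's code.
import torch
import torch.nn as nn

@torch.no_grad()
def compress_linear_layer(layer: nn.Linear, calib_inputs: torch.Tensor, rank: int) -> nn.Sequential:
    """Replace `layer` with two thinner linear layers built from a low-rank
    basis of its input features, estimated on a small calibration batch."""
    # calib_inputs: (num_samples, in_features) activations that feed this layer.
    # The top-`rank` right singular vectors span the dominant feature subspace.
    _, _, vh = torch.linalg.svd(calib_inputs.float(), full_matrices=False)
    v = vh[:rank].T                                    # (in_features, rank)

    w, b = layer.weight.data.float(), layer.bias       # w: (out_features, in_features)
    down = nn.Linear(layer.in_features, rank, bias=False)
    up = nn.Linear(rank, layer.out_features, bias=b is not None)
    down.weight.copy_(v.T)                             # x -> x @ v
    up.weight.copy_(w @ v)                             # x @ v -> (x @ v) @ (w v)^T ≈ x @ w^T
    if b is not None:
        up.bias.copy_(b.data)
    # rank * (in_features + out_features) parameters instead of in_features * out_features.
    return nn.Sequential(down, up)
```

Because each layer is fitted independently from cached activations, a loop of this kind never needs backpropagation, which is what makes compressing billion-scale models feasible under the memory and time limits described in the abstract, even without a GPU.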
Related papers
- CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation [60.712165339762116]
CompGS++ is a novel framework that leverages compact Gaussian primitives to achieve accurate 3D modeling.
Our design is based on the principle of eliminating redundancy both between and within primitives.
Our implementation will be made publicly available on GitHub to facilitate further research.
arXiv Detail & Related papers (2025-04-17T15:33:01Z)
- MambaIC: State Space Models for High-Performance Learned Image Compression [53.991726013454695]
A high-performance image compression algorithm is crucial for real-time information transmission across numerous fields.
Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage SSMs to address computational inefficiency in existing methods.
We propose an enhanced image compression approach through refined context modeling, which we term MambaIC.
arXiv Detail & Related papers (2025-03-16T11:32:34Z)
- Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP)
ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run.
We show that ACIP seamlessly complements common quantization-based compression techniques.
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
- SEE: Sememe Entanglement Encoding for Transformer-bases Models Compression [20.824040486029354]
Transformer-based large language models exhibit groundbreaking capabilities, but their storage and computational costs are high, limiting their application in resource-constrained scenarios.
An effective approach is to eliminate redundant model parameters and computational costs while incorporating efficient expert-derived knowledge structures to achieve a balance between compression and performance.
arXiv Detail & Related papers (2024-12-15T12:01:43Z)
- Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "linearity theorem" establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels. (A rough sketch of the rotation-plus-grid idea appears after this list.)
arXiv Detail & Related papers (2024-11-26T15:35:44Z)
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- MoDeGPT: Modular Decomposition for Large Language Model Compression [59.361006801465344]
This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework.
MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions.
Our experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods.
arXiv Detail & Related papers (2024-08-19T01:30:14Z)
- MCNC: Manifold Constrained Network Compression [21.70510507535041]
We present MCNC, a novel model compression method that constrains the parameter space to a low-dimensional, pre-defined, and frozen nonlinear manifold.
We show that our method, MCNC, significantly outperforms state-of-the-art baselines in terms of compression, accuracy, and/or model reconstruction time.
arXiv Detail & Related papers (2024-06-27T16:17:26Z)
- Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization [40.15915011575071]
Low-rank compression is a promising technique to reduce non-essential parameters in large language models.
We conduct empirical research on the low-rank characteristics of large models.
We propose a low-rank compression method suitable for large language models.
arXiv Detail & Related papers (2024-05-17T08:27:12Z)
- Data-free Weight Compress and Denoise for Large Language Models [101.53420111286952]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
- Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics [10.673414267895355]
We present a novel approach for compressing overparameterized models.
Our algorithm improves the training efficiency by more than 2x, without compromising generalization.
arXiv Detail & Related papers (2023-11-08T23:57:03Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- Just CHOP: Embarrassingly Simple LLM Compression [27.64461490974072]
Large language models (LLMs) enable unparalleled few- and zero-shot reasoning capabilities but at a high computational footprint.
We show that simple layer pruning coupled with an extended language model pretraining produces state-of-the-art results against structured and even semi-structured compression of models at a 7B scale.
We also show that distillation, which has been highly effective for task-agnostic compression of smaller BERT-style models, becomes inefficient against our simple pruning technique.
arXiv Detail & Related papers (2023-05-24T08:18:35Z)
- What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques: knowledge distillation and pruning.
We show that compressed models are significantly less robust than their pre-trained language model (PLM) counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z)
- A Model Compression Method with Matrix Product Operators for Speech Enhancement [15.066942043773267]
We propose a model compression method based on matrix product operators (MPO) to substantially reduce the number of parameters in neural network models for speech enhancement.
Our proposal provides an effective model compression method for speech enhancement, especially for cloud-free applications.
arXiv Detail & Related papers (2020-10-10T08:53:25Z)
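By way of contrast with the low-rank route above, the linearity-theorem entry in the list describes a data-free quantization recipe: rotate the weights (Hadamard rotations in that paper) and then snap them to an MSE-optimal grid. The sketch below shows the general shape of such a pipeline; a random orthogonal rotation stands in for the structured Hadamard transform, the scale scan is a naive MSE search, and the names (quantize_with_rotation, bits, n_scales) are illustrative assumptions rather than the HIGGS implementation.

```python
# Illustrative rotation-then-grid quantization sketch; not the HIGGS code.
# A random orthogonal matrix replaces the structured Hadamard transform.
import torch

def quantize_with_rotation(w: torch.Tensor, bits: int = 4, n_scales: int = 64) -> torch.Tensor:
    """Quantize a weight matrix to a uniform grid after an orthogonal rotation
    of its input dimension; returns the dequantized approximation of `w`."""
    d = w.shape[1]
    q, _ = torch.linalg.qr(torch.randn(d, d))    # random rotation (Hadamard stand-in)
    w_rot = w.float() @ q                        # rotation spreads outliers across coordinates

    levels = 2 ** bits
    max_abs = w_rot.abs().max()
    best_err, best_wq = float("inf"), None
    for s in torch.linspace(0.1, 1.0, n_scales): # naive scan for the MSE-optimal grid scale
        step = 2 * s * max_abs / (levels - 1)
        wq = torch.clamp(torch.round(w_rot / step), -(levels // 2), levels // 2 - 1) * step
        err = (wq - w_rot).pow(2).mean()
        if err < best_err:
            best_err, best_wq = err.item(), wq
    return best_wq @ q.T                         # undo the rotation
```

The per-layer squared error minimized in the scan is the layer-wise $\ell_2$ reconstruction error that the linearity theorem ties to the end-to-end perplexity increase.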