Data-free Weight Compress and Denoise for Large Language Models
- URL: http://arxiv.org/abs/2402.16319v1
- Date: Mon, 26 Feb 2024 05:51:47 GMT
- Title: Data-free Weight Compress and Denoise for Large Language Models
- Authors: Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu,
Dahua Lin
- Abstract summary: We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We prune 80% of the parameters while retaining 93.43% of the original performance without any calibration data.
- Score: 101.53420111286952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are reshaping the research landscape in
artificial intelligence, particularly as model parameters scale up
significantly, unlocking remarkable capabilities across various domains.
Nevertheless, the scalability of model parameters faces constraints due to
limitations in GPU memory and computational speed. To address these
constraints, various weight compression methods have emerged, such as Pruning
and Quantization. Given the low-rank nature of weight matrices in language
models, the reduction of weights through matrix decomposition undoubtedly holds
significant potential and promise. In this paper, drawing upon the intrinsic
structure of LLMs, we propose a novel approach termed Data-free Joint Rank-k
Approximation for compressing the parameter matrices. Significantly, our method
requires no additional corpus and remains orthogonal to, and compatible with, pruning
and quantization methods. We prune 80% of the parameters while
retaining 93.43% of the original performance without any calibration data.
Additionally, we explore the fundamental properties of the weight matrix of
LLMs after Rank-k Approximation and conduct comprehensive experiments to
elucidate our hypothesis.
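For intuition, the sketch below applies a plain truncated-SVD rank-k approximation to a single weight matrix in PyTorch. It is a minimal illustration only: it does not reproduce the joint, data-free aspects of the proposed method, and the matrix size and rank are hypothetical.

```python
import torch

def rank_k_approx(W: torch.Tensor, k: int):
    """Return factors (A, B) with A @ B approximating W at rank k.

    Storing A (m x k) and B (k x n) instead of W (m x n) saves memory
    whenever k * (m + n) < m * n.
    """
    # Truncated SVD: keep only the k largest singular values/vectors.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]   # (m, k), singular values folded into the left factor
    B = Vh[:k, :]          # (k, n)
    return A, B

# Hypothetical example: compress a 4096 x 4096 projection weight to rank 512.
W = torch.randn(4096, 4096)
A, B = rank_k_approx(W, k=512)
rel_err = torch.linalg.norm(W - A @ B) / torch.linalg.norm(W)
# Note: random matrices are far from low rank, so this error is large;
# real LLM weight matrices are reported to be much closer to low rank.
print(f"relative Frobenius error: {rel_err:.3f}")
```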
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Uncertainty Quantification in Large Language Models Through Convex Hull Analysis [0.36832029288386137]
This study proposes a novel geometric approach to uncertainty quantification using convex hull analysis.
The proposed method leverages the spatial properties of response embeddings to measure the dispersion and variability of model outputs.
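As a rough illustration of the idea in the entry above, the sketch below scores the dispersion of sampled response embeddings by the area of their convex hull after a PCA projection to two dimensions. The projection step, function names, and shapes are assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_area_uncertainty(embeddings: np.ndarray) -> float:
    """Use the convex-hull area of response embeddings as a dispersion score.

    embeddings: (n_responses, d) array, e.g. sentence embeddings of several
    sampled answers to the same prompt. A larger hull suggests more variable
    (less certain) outputs. ConvexHull needs more points than dimensions,
    so we first project onto the top-2 principal directions.
    """
    X = embeddings - embeddings.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    X2 = X @ Vt[:2].T                 # (n_responses, 2) PCA projection
    return ConvexHull(X2).volume      # in 2-D, "volume" is the hull area

# Hypothetical usage: 10 sampled responses embedded in 384 dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 384))
print(hull_area_uncertainty(emb))
```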
arXiv Detail & Related papers (2024-06-28T07:47:34Z)
- Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation [12.07880147193174]
We show that by leveraging the inherent low-dimensional structures of data and compressible dynamics within the model parameters, we can reap the benefits of overparameterization without the computational burdens.
We demonstrate the effectiveness of this approach for deep low-rank matrix completion as well as fine-tuning language models.
arXiv Detail & Related papers (2024-06-06T14:29:49Z)
- FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs [9.072821427818557]
Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment.
We propose an efficient weight-only quantization method that reduces memory consumption and accelerates inference for LLMs.
We evaluate our approach on large-scale open source models such as OPT-175B and internal MoE models, showcasing minimal accuracy loss while achieving up to 3.65 times higher throughput.
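The entry above does not spell out FineQuant's grouping scheme or kernels; as a generic illustration of fine-grained weight-only quantization, the sketch below quantizes each weight row in groups of 128 values to 4 bits with per-group scales and zero points. The group size and bit width are assumptions.

```python
import torch

def quantize_groupwise_int4(W: torch.Tensor, group_size: int = 128):
    """Asymmetric 4-bit weight-only quantization with per-group parameters.

    Each row of W is split into groups of `group_size` values; every group
    gets its own scale and zero point, which keeps quantization error local.
    Returns integer codes plus the per-group scales/zero points needed to
    dequantize on the fly at inference time.
    """
    out_f, in_f = W.shape
    G = W.reshape(out_f, in_f // group_size, group_size)
    w_min = G.amin(dim=-1, keepdim=True)
    w_max = G.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0      # 4 bits -> 16 levels
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(G / scale) + zero, 0, 15).to(torch.uint8)
    return q, scale, zero

def dequantize(q, scale, zero, shape):
    return ((q.float() - zero) * scale).reshape(shape)

# Hypothetical check on a random 4096 x 4096 weight.
W = torch.randn(4096, 4096)
q, s, z = quantize_groupwise_int4(W)
err = (W - dequantize(q, s, z, W.shape)).abs().mean()
print(f"mean abs error: {err:.4f}")
```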
arXiv Detail & Related papers (2023-08-16T23:57:41Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
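To make the "factorize into two parts" idea concrete, the sketch below splits a weight matrix into an auxiliary core and a central core by regrouping indices and applying a truncated SVD. This is a simplified two-core stand-in, not the paper's MPO construction, and the shapes and rank are hypothetical.

```python
import numpy as np

def two_core_factorize(W, row_dims, col_dims, rank):
    """Split a weight matrix into an auxiliary core and a central core.

    W has shape (m1*m2, n1*n2). Its indices are regrouped as (m1, n1) vs.
    (m2, n2), and a truncated SVD yields two smaller cores, loosely in the
    spirit of a two-site matrix product operator: the central core could be
    shared across layers, while the auxiliary core stays layer-specific.
    """
    (m1, m2), (n1, n2) = row_dims, col_dims
    T = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3)   # group (m1, n1) | (m2, n2)
    M = T.reshape(m1 * n1, m2 * n2)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    aux = (U[:, :rank] * S[:rank]).reshape(m1, n1, rank)  # layer-specific core
    central = Vt[:rank].reshape(rank, m2, n2)             # candidate shared core
    return aux, central

# Hypothetical shapes: a 1024 x 1024 weight split as (32*32) x (32*32), rank 64.
W = np.random.randn(1024, 1024)
aux, central = two_core_factorize(W, (32, 32), (32, 32), rank=64)
print(aux.shape, central.shape)   # (32, 32, 64) (64, 32, 32)
```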
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Bayesian multiscale deep generative model for the solution of high-dimensional inverse problems [0.0]
A novel multiscale Bayesian inference approach is introduced based on deep probabilistic generative models.
The method allows high-dimensional parameter estimation while exhibiting stability, efficiency and accuracy.
arXiv Detail & Related papers (2021-02-04T11:47:21Z)
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [52.624194343095304]
We argue that analyzing fine-tuning through the lens of intrinsic dimension provides us with empirical and theoretical intuitions.
We empirically show that common pre-trained models have a very low intrinsic dimension.
arXiv Detail & Related papers (2020-12-22T07:42:30Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the mechanisms behind its success remain unclear.
We show that heavy tails commonly arise in parameter distributions as a consequence of multiplicative noise.
A detailed analysis examines key factors, including step size and data, with consistent behavior observed across state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)