Everything You Always Wanted to Know About Storage Compressibility of
Pre-Trained ML Models but Were Afraid to Ask
- URL: http://arxiv.org/abs/2402.13429v1
- Date: Tue, 20 Feb 2024 23:45:37 GMT
- Title: Everything You Always Wanted to Know About Storage Compressibility of
Pre-Trained ML Models but Were Afraid to Ask
- Authors: Zhaoyuan Su, Ammar Ahmed, Zirui Wang, Ali Anwar, Yue Cheng
- Abstract summary: Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files.
This paper presents the first exhaustive analysis to date of the storage compressibility of PTM datasets.
We develop Elves, a compression framework that integrates ELF along with several other data reduction methods.
- Score: 19.612260423937744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the number of pre-trained machine learning (ML) models grows
exponentially, data reduction tools have not kept pace. Existing data
reduction techniques are not specifically designed for pre-trained model (PTM)
dataset files. This is largely due to a lack of understanding of the patterns
and characteristics of these datasets, especially those relevant to data
reduction and compressibility.
This paper presents the first exhaustive analysis to date of the storage
compressibility of PTM datasets. Our analysis spans different types of data
reduction and compression techniques, from hash-based data deduplication and
data-similarity detection to dictionary-coding compression, and explores these
techniques at three data granularity levels: model layers, model chunks, and
model parameters. We draw new observations that indicate that modern data
reduction tools are not effective when handling PTM datasets. There is a
pressing need for new compression methods that take into account PTMs' data
characteristics for effective storage reduction.
Motivated by our findings, we design ELF, a simple yet effective,
error-bounded, lossy floating-point compression method. ELF transforms
floating-point parameters in such a way that the common exponent field of the
transformed parameters can be completely eliminated to save storage space. We
develop Elves, a compression framework that integrates ELF along with several
other data reduction methods. Elves uses the most effective method to compress
PTMs that exhibit different patterns. Evaluation shows that Elves achieves an
overall compression ratio of $1.52\times$, which is $1.31\times$, $1.32\times$
and $1.29\times$ higher than a general-purpose compressor (zstd), an
error-bounded lossy compressor (SZ3), and the uniform model quantization,
respectively, with negligible model accuracy loss.
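To make the granularity levels in the analysis above concrete, here is a minimal, hypothetical sketch of chunk-level, hash-based deduplication across two model files. It assumes fixed-size 4 KiB chunks, SHA-256 fingerprints, and synthetic float32 arrays standing in for real PTM weight files; the paper's actual chunking scheme and hash function are not specified here.

```python
# Hypothetical sketch of chunk-level, hash-based deduplication.
# Assumptions (not from the paper): fixed-size 4 KiB chunks, SHA-256
# fingerprints, and random float32 arrays standing in for PTM weights.
import hashlib
import numpy as np

CHUNK_BYTES = 4096  # assumed chunking granularity


def chunk_fingerprints(weights: np.ndarray) -> list[str]:
    """Split a flat float32 parameter array into fixed-size byte chunks
    and return one SHA-256 digest per chunk."""
    raw = weights.astype(np.float32).tobytes()
    return [
        hashlib.sha256(raw[i:i + CHUNK_BYTES]).hexdigest()
        for i in range(0, len(raw), CHUNK_BYTES)
    ]


def dedup_ratio(models: list[np.ndarray]) -> float:
    """Total number of chunks divided by number of unique chunks."""
    seen, total = set(), 0
    for w in models:
        digests = chunk_fingerprints(w)
        total += len(digests)
        seen.update(digests)
    return total / len(seen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal(1_000_000).astype(np.float32)
    variant = base.copy()          # a fine-tuned variant sharing most weights
    variant[:50_000] += 1e-3       # only the first 50k parameters differ
    print(f"dedup ratio: {dedup_ratio([base, variant]):.2f}x")
```

Layer-level and parameter-level variants of the same idea would hash whole layers or individual parameter values instead of fixed-size chunks.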
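The exponent-elimination idea behind ELF can also be sketched. The sketch below is one plausible reading of the abstract, not the authors' exact encoding: parameters with magnitude below 1 are shifted into [1, 2), where every IEEE-754 float32 carries the same exponent field (127), so only the sign bit and the 23-bit mantissa need to be stored. The shift itself introduces a bounded absolute error of roughly $2^{-24}$, which is where the error-bounded, lossy character comes in.

```python
# Hypothetical sketch of exponent elimination for float32 parameters.
# Assumption (not the paper's exact encoding): |w| < 1 is shifted into
# [1, 2), where the exponent field is always 127 and can be dropped,
# leaving 1 sign bit + 23 mantissa bits (24 bits instead of 32).
import numpy as np


def elf_like_encode(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (signs, mantissas) for float32 parameters with |w| < 1."""
    assert np.all(np.abs(w) < 1.0), "sketch only handles |w| < 1"
    signs = (w < 0).astype(np.uint8)                # 1 bit of information each
    shifted = (np.abs(w) + 1.0).astype(np.float32)  # now in [1, 2)
    bits = shifted.view(np.uint32)
    # All shifted values share exponent 127, so those 8 bits carry no
    # information.  (Values rounding up to exactly 2.0 would break this;
    # the sketch simply asserts that they do not occur.)
    assert np.all(((bits >> 23) & 0xFF) == 127)
    mantissas = bits & 0x007FFFFF                   # low 23 bits
    # A real encoder would pack sign + mantissa into 3 bytes per value;
    # the sketch keeps them in separate arrays for clarity.
    return signs, mantissas


def elf_like_decode(signs: np.ndarray, mantissas: np.ndarray) -> np.ndarray:
    """Rebuild float32 parameters from the stored sign and mantissa bits."""
    bits = (np.uint32(127) << np.uint32(23)) | mantissas.astype(np.uint32)
    shifted = bits.view(np.float32)
    w = shifted - np.float32(1.0)                   # exact subtraction
    return np.where(signs == 1, -w, w).astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = (rng.standard_normal(8) * 0.05).astype(np.float32)
    s, m = elf_like_encode(w)
    w_hat = elf_like_decode(s, m)
    print("max abs error:", np.max(np.abs(w - w_hat)))  # at most about 2**-24
```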
Related papers
- Variable Rate Neural Compression for Sparse Detector Data [9.331686712558144]
We propose a novel approach for TPC data compression via key-point identification facilitated by sparse convolution.
BCAE-VS achieves a $75\%$ improvement in reconstruction accuracy with a $10\%$ increase in compression ratio over the previous state-of-the-art model.
arXiv Detail & Related papers (2024-11-18T17:15:35Z)
- EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search [33.86918407429272]
We propose a new and general approach for dynamic compression that is provably optimal in a given input range.
We show that these theoretical guarantees lead to highly competitive practical performance for dynamic compression of Llama, Mistral and Phi models.
arXiv Detail & Related papers (2024-10-18T17:46:37Z)
- Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data [8.475091996107741]
This paper investigates whether there is a sweet spot where competitive compression ratios with pre-trained vanilla transformers are possible.
We train families of models on 165GB of raw byte sequences of either text, image, or audio data.
We find that relatively small models (i.e., millions of parameters) can outperform standard general-purpose compression algorithms.
arXiv Detail & Related papers (2024-10-07T14:32:03Z)
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- MoDeGPT: Modular Decomposition for Large Language Model Compression [59.361006801465344]
This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework.
MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions.
Our experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods.
arXiv Detail & Related papers (2024-08-19T01:30:14Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train Decomposition [19.897367559948336]
We propose a training-free model compression approach based on Tensor-Train Decomposition (TTD).
We then investigate the low-rank structures extracted by this approach in terms of compression ratio, language-task performance, and latency on a typical low-end device (i.e., a Raspberry Pi).
arXiv Detail & Related papers (2023-07-02T09:33:09Z)
- Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction.
We find that our resulting task accuracy is much closer to the original model's performance.
Our method can directly compress a task-specific model while achieving better performance than other compact model strategies.
arXiv Detail & Related papers (2022-06-30T21:57:07Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate that PST performs on par with or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z)