Everything You Always Wanted to Know About Storage Compressibility of
Pre-Trained ML Models but Were Afraid to Ask
- URL: http://arxiv.org/abs/2402.13429v1
- Date: Tue, 20 Feb 2024 23:45:37 GMT
- Title: Everything You Always Wanted to Know About Storage Compressibility of
Pre-Trained ML Models but Were Afraid to Ask
- Authors: Zhaoyuan Su, Ammar Ahmed, Zirui Wang, Ali Anwar, Yue Cheng
- Abstract summary: Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files.
This paper presents the first exhaustive analysis to date of the storage compressibility of PTM datasets.
We develop Elves, a compression framework that integrates ELF along with several other data reduction methods.
- Score: 19.612260423937744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the number of pre-trained machine learning (ML) models grows
exponentially, data reduction tools have not kept pace. Existing data
reduction techniques are not specifically designed for pre-trained model (PTM)
dataset files. This is largely due to a lack of understanding of the patterns
and characteristics of these datasets, especially those relevant to data
reduction and compressibility.
This paper presents the first exhaustive analysis to date of the storage
compressibility of PTM datasets. Our analysis spans different types of data
reduction and compression techniques, from hash-based data deduplication and
data-similarity detection to dictionary-coding compression, and explores these
techniques at three data granularity levels: model layers, model chunks, and
model parameters. We draw new observations that indicate that modern data
reduction tools are not effective when handling PTM datasets. There is a
pressing need for new compression methods that take into account PTMs' data
characteristics for effective storage reduction.
Motivated by our findings, we design ELF, a simple yet effective,
error-bounded, lossy floating-point compression method. ELF transforms
floating-point parameters in such a way that the common exponent field of the
transformed parameters can be completely eliminated to save storage space. We
develop Elves, a compression framework that integrates ELF along with several
other data reduction methods. Elves uses the most effective method to compress
PTMs that exhibit different patterns. Evaluation shows that Elves achieves an
overall compression ratio of $1.52\times$, which is $1.31\times$, $1.32\times$
and $1.29\times$ higher than a general-purpose compressor (zstd), an
error-bounded lossy compressor (SZ3), and the uniform model quantization,
respectively, with negligible model accuracy loss.
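To make the granularity levels in the analysis above concrete, here is a minimal, hypothetical sketch of chunk-level, hash-based deduplication across two model files. It assumes fixed-size 4 KiB chunks, SHA-256 fingerprints, and synthetic float32 arrays standing in for real PTM weight files; the paper's actual chunking scheme and hash function are not specified here.

```python
# Hypothetical sketch of chunk-level, hash-based deduplication.
# Assumptions (not from the paper): fixed-size 4 KiB chunks, SHA-256
# fingerprints, and random float32 arrays standing in for PTM weights.
import hashlib
import numpy as np

CHUNK_BYTES = 4096  # assumed chunking granularity


def chunk_fingerprints(weights: np.ndarray) -> list[str]:
    """Split a flat float32 parameter array into fixed-size byte chunks
    and return one SHA-256 digest per chunk."""
    raw = weights.astype(np.float32).tobytes()
    return [
        hashlib.sha256(raw[i:i + CHUNK_BYTES]).hexdigest()
        for i in range(0, len(raw), CHUNK_BYTES)
    ]


def dedup_ratio(models: list[np.ndarray]) -> float:
    """Total number of chunks divided by number of unique chunks."""
    seen, total = set(), 0
    for w in models:
        digests = chunk_fingerprints(w)
        total += len(digests)
        seen.update(digests)
    return total / len(seen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal(1_000_000).astype(np.float32)
    variant = base.copy()          # a fine-tuned variant sharing most weights
    variant[:50_000] += 1e-3       # only the first 50k parameters differ
    print(f"dedup ratio: {dedup_ratio([base, variant]):.2f}x")
```

Layer-level and parameter-level variants of the same idea would hash whole layers or individual parameter values instead of fixed-size chunks.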
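The exponent-elimination idea behind ELF can also be sketched. The sketch below is one plausible reading of the abstract, not the authors' exact encoding: parameters with magnitude below 1 are shifted into [1, 2), where every IEEE-754 float32 carries the same exponent field (127), so only the sign bit and the 23-bit mantissa need to be stored. The shift itself introduces a bounded absolute error of roughly $2^{-24}$, which is where the error-bounded, lossy character comes in.

```python
# Hypothetical sketch of exponent elimination for float32 parameters.
# Assumption (not the paper's exact encoding): |w| < 1 is shifted into
# [1, 2), where the exponent field is always 127 and can be dropped,
# leaving 1 sign bit + 23 mantissa bits (24 bits instead of 32).
import numpy as np


def elf_like_encode(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (signs, mantissas) for float32 parameters with |w| < 1."""
    assert np.all(np.abs(w) < 1.0), "sketch only handles |w| < 1"
    signs = (w < 0).astype(np.uint8)                # 1 bit of information each
    shifted = (np.abs(w) + 1.0).astype(np.float32)  # now in [1, 2)
    bits = shifted.view(np.uint32)
    # All shifted values share exponent 127, so those 8 bits carry no
    # information.  (Values rounding up to exactly 2.0 would break this;
    # the sketch simply asserts that they do not occur.)
    assert np.all(((bits >> 23) & 0xFF) == 127)
    mantissas = bits & 0x007FFFFF                   # low 23 bits
    # A real encoder would pack sign + mantissa into 3 bytes per value;
    # the sketch keeps them in separate arrays for clarity.
    return signs, mantissas


def elf_like_decode(signs: np.ndarray, mantissas: np.ndarray) -> np.ndarray:
    """Rebuild float32 parameters from the stored sign and mantissa bits."""
    bits = (np.uint32(127) << np.uint32(23)) | mantissas.astype(np.uint32)
    shifted = bits.view(np.float32)
    w = shifted - np.float32(1.0)                   # exact subtraction
    return np.where(signs == 1, -w, w).astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = (rng.standard_normal(8) * 0.05).astype(np.float32)
    s, m = elf_like_encode(w)
    w_hat = elf_like_decode(s, m)
    print("max abs error:", np.max(np.abs(w - w_hat)))  # at most about 2**-24
```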
Related papers
- Variable Rate Neural Compression for Sparse Detector Data [9.331686712558144]
We propose a novel approach for TPC data compression via key-point identification facilitated by sparse convolution.
BCAE-VS achieves a $75\%$ improvement in reconstruction accuracy with a $10\%$ increase in compression ratio over the previous state-of-the-art model.
arXiv Detail & Related papers (2024-11-18T17:15:35Z)
- EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search [33.86918407429272]
We propose a new and general approach for dynamic compression that is provably optimal in a given input range.
We show that these theoretical guarantees lead to highly competitive practical performance for dynamic compression of Llama, Mistral and Phi models.
arXiv Detail & Related papers (2024-10-18T17:46:37Z)
- Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data [8.475091996107741]
This paper investigates whether there is a sweet spot where competitive compression ratios with pre-trained vanilla transformers are possible.
We train families of models on 165GB of raw byte sequences of either text, image, or audio data.
We find that relatively small models (i.e., millions of parameters) can outperform standard general-purpose compression algorithms.
arXiv Detail & Related papers (2024-10-07T14:32:03Z)
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- MoDeGPT: Modular Decomposition for Large Language Model Compression [59.361006801465344]
This paper introduces Modular Decomposition (MoDeGPT), a novel structured compression framework.
MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions.
Our experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods.
arXiv Detail & Related papers (2024-08-19T01:30:14Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train Decomposition [19.897367559948336]
We propose a training-free model compression approach based on Tensor-Train Decomposition (TTD).
We then investigate the low-rank structures extracted by this approach in terms of compression ratio, language-task performance, and latency on a typical low-end device (i.e., a Raspberry Pi).
arXiv Detail & Related papers (2023-07-02T09:33:09Z)
- Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction.
We find that our resulting task accuracy is much closer to the original model's performance.
Our method can directly compress a task-specific model while achieving better performance than other compact model strategies.
arXiv Detail & Related papers (2022-06-30T21:57:07Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate that PST performs on par with or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z)