A flexible, extensible software framework for model compression based on the LC algorithm
- URL: http://arxiv.org/abs/2005.07786v1
- Date: Fri, 15 May 2020 21:14:48 GMT
- Title: A flexible, extensible software framework for model compression based on the LC algorithm
- Authors: Yerlan Idelbayev and Miguel Á. Carreira-Perpiñán
- Abstract summary: We propose a software framework that allows a user to compress a neural network or other machine learning model with minimal effort.
The library is written in Python and PyTorch and is available on GitHub.
- Score: 10.787390511207683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a software framework based on the ideas of the
Learning-Compression (LC) algorithm, which allows a user to compress a neural
network or other machine learning model using different compression schemes
with minimal effort. Currently, the supported compressions include pruning,
quantization, low-rank methods (including automatically learning the layer
ranks), and combinations of those; the user can choose different compression
types for different parts of a neural network.
The LC algorithm alternates two types of steps until convergence: a learning
(L) step, which trains a model on a dataset (using an algorithm such as SGD);
and a compression (C) step, which compresses the model parameters (using a
compression scheme such as low-rank or quantization). This decoupling of the
"machine learning" aspect from the "signal compression" aspect means that
changing the model or the compression type amounts to calling the corresponding
subroutine in the L or C step, respectively. The library fully supports this by
design, which makes it flexible and extensible. This does not come at the
expense of performance: the runtime needed to compress a model is comparable to
that of training the model in the first place; and the compressed model is
competitive in terms of prediction accuracy and compression ratio with other
algorithms (which are often specialized for specific models or compression
schemes). The library is written in Python and PyTorch and is available on GitHub.
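To make the alternation concrete, the following is a minimal sketch in PyTorch of compressing a single layer with a fixed-rank low-rank scheme. It is not the library's actual API: the names l_step, c_step_low_rank and lc_compress, the penalty weight mu and its geometric schedule, and the fixed rank are illustrative assumptions, and the multiplier estimates and per-layer scheme selection of the full framework are omitted for brevity.

import torch

def c_step_low_rank(W, rank):
    # C step: project the weight matrix onto the set of rank-`rank` matrices
    # via truncated SVD (pure signal compression, independent of the data).
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

def l_step(model, layer, W_compressed, loader, loss_fn, mu, lr=1e-2, epochs=1):
    # L step: ordinary SGD on the task loss, plus a quadratic penalty that pulls
    # the chosen layer's weights toward their current compressed values.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss = loss + 0.5 * mu * (layer.weight - W_compressed).pow(2).sum()
            loss.backward()
            opt.step()

def lc_compress(model, layer, loader, loss_fn, rank=8, mu=1e-3, steps=20):
    # Alternate L and C steps; the full LC algorithm drives mu to infinity on a schedule.
    W_compressed = c_step_low_rank(layer.weight.detach(), rank)
    for _ in range(steps):
        l_step(model, layer, W_compressed, loader, loss_fn, mu)
        W_compressed = c_step_low_rank(layer.weight.detach(), rank)
        mu *= 1.5  # simple geometric penalty schedule (an assumption)
    with torch.no_grad():
        layer.weight.copy_(W_compressed)  # deploy the compressed weights
    return model

Swapping the compression scheme (e.g. quantization or pruning instead of low-rank) only changes the C-step subroutine, and changing the model or dataset only changes the L step, which is exactly the decoupling the abstract describes.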
Related papers
- LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
The key-value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs).
Existing approaches to mitigate its memory overhead include efficient attention variants integrated in upcycling stages and KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining.
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
arXiv Detail & Related papers (2024-10-04T03:10:53Z)
- A Survey on Transformer Compression [84.18094368700379]
Transformers play a vital role in natural language processing (NLP) and computer vision (CV).
Model compression methods reduce the memory and computational cost of Transformer models.
This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer-based models.
arXiv Detail & Related papers (2024-02-05T12:16:28Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Does compressing activations help model parallel training? [64.59298055364336]
We present the first empirical study on the effectiveness of compression methods for model parallelism.
We implement and evaluate three common classes of compression algorithms.
We evaluate these methods across more than 160 settings and 8 popular datasets.
arXiv Detail & Related papers (2023-01-06T18:58:09Z)
- Deep learning model compression using network sensitivity and gradients [3.52359746858894]
We present model compression algorithms for both non-retraining and retraining conditions.
In the first case, we propose the Bin & Quant algorithm for compression of deep learning models using the sensitivity of the network parameters.
In the second case, we propose a novel gradient-weighted k-means clustering algorithm (GWK).
arXiv Detail & Related papers (2022-10-11T03:02:40Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- Self-Supervised GAN Compression [32.21713098893454]
We show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods.
We then develop a self-supervised compression technique which uses the trained discriminator to supervise the training of a compressed generator.
We show that this framework maintains compelling performance at high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities.
arXiv Detail & Related papers (2020-07-03T04:18:54Z)
- Neural Network Compression Framework for fast model inference [59.65531492759006]
We present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF).
It leverages recent advances of various network compression methods and implements some of them, such as sparsity, quantization, and binarization.
The framework can be used within the training samples supplied with it, or as a standalone package that can be seamlessly integrated into existing training code.
arXiv Detail & Related papers (2020-02-20T11:24:01Z)