Successive Pruning for Model Compression via Rate Distortion Theory
- URL: http://arxiv.org/abs/2102.08329v1
- Date: Tue, 16 Feb 2021 18:17:57 GMT
- Title: Successive Pruning for Model Compression via Rate Distortion Theory
- Authors: Berivan Isik, Albert No, Tsachy Weissman
- Abstract summary: We study NN compression from an information-theoretic approach and show that rate distortion theory suggests pruning to achieve the theoretical limits of NN compression.
Our derivation also provides an end-to-end compression pipeline involving a novel pruning strategy.
Our method consistently outperforms the existing pruning strategies and reduces the pruned model's size by 2.5 times.
- Score: 15.598364403631528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network (NN) compression has become essential to enable deploying
over-parameterized NN models on resource-constrained devices. As a simple and
easy-to-implement method, pruning is one of the most established NN compression
techniques. Although it is a mature method with more than 30 years of history,
there is still a lack of good understanding and systematic analysis of why
pruning works well even with aggressive compression ratios. In this work, we
answer this question by studying NN compression from an information-theoretic
approach and show that rate distortion theory suggests pruning to achieve the
theoretical limits of NN compression. Our derivation also provides an
end-to-end compression pipeline involving a novel pruning strategy. That is, in
addition to pruning the model, we also find a minimum-length binary
representation of it via entropy coding. Our method consistently outperforms
the existing pruning strategies and reduces the pruned model's size by 2.5
times. We evaluate the efficacy of our strategy on MNIST, CIFAR-10 and ImageNet
datasets using 5 distinct architectures.
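The abstract above describes an end-to-end pipeline: prune the model, then entropy-code the surviving weights into a minimum-length binary representation. The sketch below is a minimal illustration of that general idea, assuming simple magnitude pruning, uniform quantization, and Huffman coding as a stand-in entropy coder; the helper names, threshold, and bin count are illustrative choices, not the paper's successive-pruning algorithm.

```python
import heapq
from collections import Counter

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (illustrative, not the paper's scheme)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize(weights: np.ndarray, num_bins: int = 64) -> np.ndarray:
    """Uniformly quantize weights so the entropy coder can operate on discrete symbols."""
    w_min, w_max = weights.min(), weights.max()
    step = (w_max - w_min) / (num_bins - 1) or 1.0
    return np.round((weights - w_min) / step).astype(np.int64)

def huffman_code_lengths(symbols: np.ndarray) -> dict:
    """Per-symbol Huffman code lengths for the empirical symbol distribution."""
    counts = Counter(symbols.ravel().tolist())
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Each heap item: (count, tie_breaker, {symbol: code_length_so_far}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Toy end-to-end run on random "weights".
w = np.random.randn(1000).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.9)
symbols = quantize(pruned)
lengths = huffman_code_lengths(symbols)
coded_bits = sum(lengths[s] for s in symbols.ravel().tolist())
print(f"original: {w.size * 32} bits, coded: {coded_bits} bits")
```

In the paper, the pruning step itself is motivated by rate distortion theory; this sketch only shows why combining sparsity with entropy coding shrinks the resulting bitstream.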
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Approximating Continuous Convolutions for Deep Network Compression [11.566258236184964]
We present ApproxConv, a novel method for compressing the layers of a convolutional neural network.
We show that our method is able to compress existing deep network models by half whilst losing only 1.86% accuracy.
arXiv Detail & Related papers (2022-10-17T11:41:26Z)
- A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining comparable performance.
We use the sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between the soft sparsity of the weights in the network and the degree of compression (a rough sketch of such a sparsity measure follows this entry).
We also develop adaptive algorithms, informed by our theory, for pruning each neuron in the network.
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
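As a rough illustration of the sparsity-sensitive $\ell_q$-norm mentioned in the entry above, the hypothetical helper below compares an $\ell_q$ quasi-norm (with $q < 1$) against the $\ell_2$ norm; this ratio is one common soft-sparsity proxy and is not claimed to be the paper's exact definition.

```python
import numpy as np

def lq_quasi_norm(w: np.ndarray, q: float) -> float:
    """(sum_i |w_i|^q)^(1/q) for 0 < q <= 1; a hypothetical helper, not the paper's definition."""
    return float(np.sum(np.abs(w) ** q) ** (1.0 / q))

def soft_sparsity_ratio(w: np.ndarray, q: float = 0.5) -> float:
    """Ratio of the l_q quasi-norm to the l_2 norm; smaller values mean the weight
    energy is concentrated in fewer entries, i.e. the vector is more compressible."""
    return lq_quasi_norm(w, q) / float(np.linalg.norm(w))

# A vector with 10 large entries scores far lower than a dense vector of equal energy.
sparse_w = np.zeros(1000)
sparse_w[:10] = np.random.randn(10)
dense_w = np.random.randn(1000) / np.sqrt(1000)
print(soft_sparsity_ratio(sparse_w), soft_sparsity_ratio(dense_w))
```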
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs, including quantization, quantization-aware retraining, and entropy coding (a minimal sketch of the INR-fitting step follows this entry).
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
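The INR entry above compresses an image by overfitting a small network to it and then quantizing and entropy-coding the network's weights. The sketch below covers only the fitting step, assuming a plain coordinate-to-intensity MLP in PyTorch; the architecture, hyperparameters, and omission of quantization-aware retraining and entropy coding are simplifications, not the paper's pipeline.

```python
import torch
import torch.nn as nn

def fit_inr(image: torch.Tensor, steps: int = 2000) -> nn.Module:
    """Overfit a small MLP mapping pixel coordinates (x, y) to intensity,
    so the image is represented by the network's weights."""
    h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # (h*w, 2)
    targets = image.reshape(-1, 1)                          # (h*w, 1)

    model = nn.Sequential(
        nn.Linear(2, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(coords), targets)
        loss.backward()
        opt.step()
    return model  # quantizing + entropy coding these weights would give the bitstream

# Toy usage on a random 32x32 "image".
inr = fit_inr(torch.rand(32, 32))
```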
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method that compresses pre-trained networks using low-rank tensor decomposition (a simplified sketch follows this entry).
A new regularization method, called the funnel function, is proposed to suppress unimportant factors during compression.
For ResNet18 on ImageNet 2012, our reduced model achieves a more than two-fold speed-up in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
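The entry above compresses pre-trained networks with low-rank tensor decomposition and a funnel regularizer. As a simplified stand-in, the sketch below applies a truncated SVD to a flattened convolution kernel to show how a rank constraint reduces the parameter count; it does not implement the paper's tensor decomposition or funnel function.

```python
import numpy as np

def low_rank_factorize(kernel: np.ndarray, rank: int):
    """Approximate a conv kernel (out_ch, in_ch, kh, kw) by two low-rank factors.
    A generic truncated-SVD illustration, not the paper's funnel-regularized scheme."""
    out_ch, in_ch, kh, kw = kernel.shape
    mat = kernel.reshape(out_ch, in_ch * kh * kw)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    a = u[:, :rank] * s[:rank]          # (out_ch, rank)
    b = vt[:rank, :]                    # (rank, in_ch * kh * kw)
    return a, b

kernel = np.random.randn(64, 64, 3, 3)
a, b = low_rank_factorize(kernel, rank=16)
error = np.linalg.norm(kernel.reshape(64, -1) - a @ b) / np.linalg.norm(kernel)
print(f"params: {kernel.size} -> {a.size + b.size}, relative error: {error:.3f}")
```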
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding, leveraging both weight pruning and quantization (a toy sketch of such a sparse, quantized layout follows this entry).
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
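The storage-format entry above combines weight pruning and quantization with source coding. The toy layout below (indices of nonzero weights plus small integer codes into a shared codebook) is a hypothetical illustration of that general idea, not the authors' actual format.

```python
import numpy as np

def to_sparse_quantized(weights: np.ndarray, num_levels: int = 16):
    """Store a pruned weight matrix as (indices, codes, codebook).
    A hypothetical compact layout, not the format proposed in the paper."""
    flat = weights.ravel()
    nz_idx = np.flatnonzero(flat).astype(np.uint32)
    nz_val = flat[nz_idx]
    # Uniform codebook over the nonzero range; codes are small integers.
    codebook = np.linspace(nz_val.min(), nz_val.max(), num_levels)
    codes = np.abs(nz_val[:, None] - codebook[None, :]).argmin(axis=1).astype(np.uint8)
    return nz_idx, codes, codebook

def from_sparse_quantized(shape, nz_idx, codes, codebook) -> np.ndarray:
    """Reconstruct the dense (approximate) weight matrix from the compact layout."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[nz_idx] = codebook[codes]
    return flat.reshape(shape)

# Toy usage: a 90%-pruned random matrix.
w = np.random.randn(256, 256).astype(np.float32)
w[np.abs(w) < np.quantile(np.abs(w), 0.9)] = 0.0
idx, codes, book = to_sparse_quantized(w)
w_hat = from_sparse_quantized(w.shape, idx, codes, book)
print(idx.nbytes + codes.nbytes + book.nbytes, "bytes vs", w.nbytes, "bytes dense")
```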
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks are well known to require compression for deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Single-path Bit Sharing for Automatic Loss-aware Model Compression [126.98903867768732]
Single-path Bit Sharing (SBS) is able to significantly reduce computational cost while achieving promising performance.
Our SBS compressed MobileNetV2 achieves 22.6x Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1 accuracy.
arXiv Detail & Related papers (2021-01-13T08:28:21Z)
- Neural Network Compression Via Sparse Optimization [23.184290795230897]
We propose a model compression framework based on the recent progress on sparse optimization.
We achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation accuracy on VGG16 for CIFAR-10 and ResNet50 for ImageNet, respectively.
arXiv Detail & Related papers (2020-11-10T03:03:55Z)
- Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain (a simplified sketch follows this entry).
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z)
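The last entry prunes filter redundancy in the spectral domain. The sketch below assumes a 2-D FFT of each filter followed by zeroing its small-magnitude spectral coefficients; the transform and threshold are illustrative choices and do not reproduce the paper's tensor-reordering method.

```python
import numpy as np

def spectral_prune_filters(filters: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Prune conv filters in the frequency domain: transform each (kh, kw) filter with a
    2-D FFT, keep only its largest-magnitude coefficients, and transform back.
    A simplified illustration, not the paper's tensor-reordering method."""
    out_ch, in_ch, kh, kw = filters.shape
    spec = np.fft.fft2(filters, axes=(-2, -1))
    mags = np.abs(spec).reshape(out_ch, in_ch, -1)
    k = max(1, int(keep_ratio * kh * kw))
    # Per-filter threshold: magnitude of the k-th largest spectral coefficient.
    thresh = np.partition(mags, -k, axis=-1)[..., -k][..., None, None]
    spec[np.abs(spec) < thresh] = 0.0
    return np.real(np.fft.ifft2(spec, axes=(-2, -1)))

filters = np.random.randn(8, 4, 5, 5)
pruned = spectral_prune_filters(filters, keep_ratio=0.25)
print(np.linalg.norm(filters - pruned) / np.linalg.norm(filters))
```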
This list is automatically generated from the titles and abstracts of the papers on this site.