Compressing Deep Neural Networks via Layer Fusion
- URL: http://arxiv.org/abs/2007.14917v1
- Date: Wed, 29 Jul 2020 15:43:19 GMT
- Title: Compressing Deep Neural Networks via Layer Fusion
- Authors: James O'Neill, Greg Ver Steeg and Aram Galstyan
- Abstract summary: layer fusion is a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional and attention layers.
Layer fusion can significantly reduce the number of layers of the original network with little additional overhead, while maintaining competitive performance.
- Score: 32.80630183210368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes layer fusion - a model compression technique
that discovers which weights to combine and then fuses the weights of similar
fully-connected, convolutional and attention layers. Layer fusion can
significantly reduce the number of layers of the original network with little
additional computational overhead, while maintaining competitive performance.
From experiments on CIFAR-10, we find that various deep convolutional neural
networks can remain within 2 percentage points of the original networks'
accuracy up to a compression ratio of 3.33 when iteratively retrained with
layer fusion. For experiments on the WikiText-2 language modelling dataset,
where pretrained transformer models are used, we achieve compression that
leads to a network that is 20% of its original size while being within 5
perplexity points of the original network. We also find that other
well-established compression techniques can achieve competitive performance
when compared to their original networks given a sufficient number of
retraining steps. Generally, we observe a clear inflection point in
performance as the amount of compression increases, suggesting a bound on the
amount of compression that can be achieved before an exponential degradation
in performance.
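The abstract leaves the fusion criterion and the fusion operation to the paper body. As a rough illustration only, and not the authors' algorithm, the NumPy sketch below takes "similarity" to be cosine similarity between flattened weights and "fusion" to be simple averaging of adjacent same-shaped layers; the helper names are invented for the example.

```python
import numpy as np

def cosine_similarity(w1, w2):
    # Cosine similarity between two flattened weight tensors of equal shape.
    a, b = w1.ravel(), w2.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fuse_similar_layers(weights, threshold=0.9):
    # Greedily scan the layer list; average each adjacent, same-shaped pair
    # whose weights are similar enough, replacing two layers with one.
    fused = [weights[0]]
    for w in weights[1:]:
        prev = fused[-1]
        if w.shape == prev.shape and cosine_similarity(prev, w) >= threshold:
            fused[-1] = 0.5 * (prev + w)  # fuse the pair into a single layer
        else:
            fused.append(w)
    return fused

# Toy check: two nearly identical layers collapse into one.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
layers = [w, w + 0.01 * rng.standard_normal((8, 8)), rng.standard_normal((8, 8))]
print(len(layers), "->", len(fuse_similar_layers(layers)))  # 3 -> 2
```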
Related papers
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called the funnel function, is proposed to suppress unimportant factors during compression.
For ResNet18 on ImageNet2012, our reduced model reaches more than a two-times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
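The funnel regularizer is specific to the paper above, but the low-rank step it regularizes can be sketched generically. A minimal example using truncated SVD on a 2-D weight matrix; the paper works with higher-order tensor decompositions, so this is a deliberate simplification.

```python
import numpy as np

def low_rank_approx(weight, rank):
    # Truncated SVD: keep only the top-`rank` singular triplets.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

w = np.random.default_rng(1).standard_normal((256, 256))
w_hat = low_rank_approx(w, rank=32)
# Storage falls from m*n to rank*(m+n) parameters when the factors are kept.
print("params:", w.size, "->", 32 * (256 + 256))
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```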
- Low-Rank+Sparse Tensor Compression for Neural Networks [11.632913694957868]
We propose to combine low-rank tensor decomposition with sparse pruning in order to take advantage of both coarse and fine structure for compression.
We compress weights in SOTA architectures (MobileNetv3, EfficientNet, Vision Transformer) and compare this approach to sparse pruning and tensor decomposition alone.
arXiv Detail & Related papers (2021-11-02T15:55:07Z)
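A toy version of the low-rank-plus-sparse split described above, assuming truncated SVD for the coarse structure and magnitude pruning of the residual for the fine structure; the thresholding rule here is illustrative, not the paper's method.

```python
import numpy as np

def low_rank_plus_sparse(w, rank, sparsity):
    # Coarse part: truncated SVD of the weight matrix.
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Fine part: keep only the largest-magnitude entries of the residual.
    residual = w - low_rank
    k = int(residual.size * (1.0 - sparsity))  # number of entries to keep
    thresh = np.sort(np.abs(residual).ravel())[-k]
    sparse = np.where(np.abs(residual) >= thresh, residual, 0.0)
    return low_rank, sparse

w = np.random.default_rng(2).standard_normal((128, 128))
lr, sp = low_rank_plus_sparse(w, rank=16, sparsity=0.95)
err = np.linalg.norm(w - (lr + sp)) / np.linalg.norm(w)
print(f"kept {np.count_nonzero(sp)} sparse entries, relative error {err:.3f}")
```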
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding, leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully-connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
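The contribution above is the entropy-coded storage format itself, which is beyond a short sketch; the snippet below only illustrates the two preprocessing steps the summary names, magnitude pruning followed by uniform 8-bit quantization.

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.9, bits=8):
    # Magnitude pruning: zero the smallest `sparsity` fraction of weights.
    thresh = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= thresh
    pruned = w * mask
    # Uniform affine quantization of the survivors (uint8 holds bits <= 8).
    levels = 2 ** bits - 1
    lo, hi = pruned.min(), pruned.max()
    scale = (hi - lo) / levels
    codes = np.round((pruned - lo) / scale).astype(np.uint8)
    dequant = codes * scale + lo
    return dequant * mask  # pruned positions stay exactly zero

w = np.random.default_rng(3).standard_normal((64, 64))
w_hat = prune_and_quantize(w)
print("nonzeros:", np.count_nonzero(w_hat), "of", w.size)
```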
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
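Collaborative Compression couples channel pruning with tensor decomposition; the sketch below covers only a generic channel-pruning half, ranking output channels by L2 norm, with the decomposition half analogous to the earlier SVD sketch. The keep ratio and ranking rule are illustrative assumptions.

```python
import numpy as np

def prune_channels(conv_w, keep_ratio=0.5):
    # conv_w: (out_channels, in_channels, kH, kW). Rank output channels by
    # the L2 norm of their filters and keep the strongest fraction.
    norms = np.linalg.norm(conv_w.reshape(conv_w.shape[0], -1), axis=1)
    n_keep = max(1, int(conv_w.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(norms)[-n_keep:])  # preserve channel order
    return conv_w[keep]

w = np.random.default_rng(4).standard_normal((64, 32, 3, 3))
print(w.shape, "->", prune_channels(w).shape)  # (64, 32, 3, 3) -> (32, 32, 3, 3)
```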
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold-estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
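SIDCo's actual multi-stage fitting is more elaborate, but the core idea, fitting a sparsity-inducing distribution so the top-k threshold falls out analytically instead of from a sort, can be sketched by assuming exponentially distributed gradient magnitudes.

```python
import numpy as np

def exp_fit_threshold(grad, keep_ratio=0.001):
    # Model |grad| ~ Exponential(mean=mu), so P(|g| > t) = exp(-t / mu)
    # and the threshold keeping a fraction r of entries is t = -mu * ln(r).
    mu = np.mean(np.abs(grad))
    return -mu * np.log(keep_ratio)

rng = np.random.default_rng(5)
grad = rng.laplace(scale=1.0, size=1_000_000)  # |Laplace| is exponential
t = exp_fit_threshold(grad, keep_ratio=0.001)
kept = np.mean(np.abs(grad) > t)
print(f"target 0.1000%, actually kept {kept:.4%}")  # close, with no sorting
```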
- Layer-Wise Data-Free CNN Compression [49.73757297936685]
We show how to generate layer-wise training data using only a pretrained network.
We present results for layer-wise compression using quantization and pruning.
arXiv Detail & Related papers (2020-11-18T03:00:05Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
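The permutation observation is easy to verify directly: reordering the hidden units of one layer, and the matching input columns of the next, leaves the network function unchanged. A minimal check for a two-layer ReLU MLP, which is an assumption here; the paper applies the idea to deeper networks and convolutions.

```python
import numpy as np

rng = np.random.default_rng(6)
w1, w2 = rng.standard_normal((16, 8)), rng.standard_normal((4, 16))
x = rng.standard_normal(8)
relu = lambda v: np.maximum(v, 0.0)

perm = rng.permutation(16)  # reorder the 16 hidden units
w1_p = w1[perm]             # permute rows (outputs) of the first layer
w2_p = w2[:, perm]          # permute columns (inputs) of the second layer

y = w2 @ relu(w1 @ x)
y_p = w2_p @ relu(w1_p @ x)
print(np.allclose(y, y_p))  # True: same function, different weight layout
```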
- Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain.
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z)
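A bare-bones sketch of spectral-domain pruning, zeroing the smallest 2-D FFT coefficients of each filter and transforming back; the paper's tensor reordering step, which exposes the redundancy this exploits, is omitted.

```python
import numpy as np

def spectral_prune(filters, keep_ratio=0.25):
    # Per-filter 2-D FFT over the spatial axes, zero the smallest
    # coefficients by magnitude, then transform back to the spatial domain.
    spec = np.fft.rfft2(filters)
    mags = np.abs(spec)
    thresh = np.quantile(mags, 1.0 - keep_ratio)
    spec[mags < thresh] = 0
    return np.fft.irfft2(spec, s=filters.shape[-2:])

filters = np.random.default_rng(7).standard_normal((64, 3, 5, 5))
pruned = spectral_prune(filters)
err = np.linalg.norm(filters - pruned) / np.linalg.norm(filters)
print(f"relative error after pruning 75% of spectral coefficients: {err:.3f}")
```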