Permute, Quantize, and Fine-tune: Efficient Compression of Neural
Networks
- URL: http://arxiv.org/abs/2010.15703v3
- Date: Sat, 10 Apr 2021 22:27:02 GMT
- Title: Permute, Quantize, and Fine-tune: Efficient Compression of Neural
Networks
- Authors: Julieta Martinez, Jashan Shewakramani, Ting Wei Liu, Ioan Andrei
Bârsan, Wenyuan Zeng, Raquel Urtasun
- Abstract summary: Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
- Score: 70.0243910593064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing large neural networks is an important step for their deployment
in resource-constrained computational platforms. In this context, vector
quantization is an appealing framework that expresses multiple parameters using
a single code, and has recently achieved state-of-the-art network compression
on a range of core vision and natural language processing tasks. Key to the
success of vector quantization is deciding which parameter groups should be
compressed together. Previous work has relied on heuristics that group the
spatial dimension of individual convolutional filters, but a general solution
remains unaddressed. Such a solution is desirable for pointwise convolutions (which
dominate modern architectures), linear layers (which have no notion of spatial
dimension), and convolutions (when more than one filter is compressed to the
same codeword). In this paper we make the observation that the weights of two
adjacent layers can be permuted while expressing the same function. We then
establish a connection to rate-distortion theory and search for permutations
that result in networks that are easier to compress. Finally, we rely on an
annealed quantization algorithm to better compress the network and achieve
higher final accuracy. We show results on image classification, object
detection, and segmentation, reducing the gap with the uncompressed model by 40
to 70% with respect to the current state of the art.
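The key observation, that two adjacent layers can be permuted while the network expresses the same function, is easy to verify numerically. The sketch below is a minimal illustration under assumed settings (two fully connected layers, a ReLU activation, a random permutation of the hidden units), not the authors' implementation: it reorders the output units of the first layer, applies the matching reordering to the input dimension of the second, and checks that the outputs are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two adjacent fully connected layers: x -> relu(W1 x + b1) -> W2 (.) + b2
d_in, d_hidden, d_out = 8, 16, 4
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def forward(W1, b1, W2, b2, x):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden activations
    return W2 @ h + b2

# Permute the hidden units: reorder the rows of W1 (and b1), and reorder the
# matching columns of W2. The composed function is unchanged.
perm = rng.permutation(d_hidden)
W1_p, b1_p = W1[perm], b1[perm]
W2_p = W2[:, perm]

x = rng.normal(size=d_in)
assert np.allclose(forward(W1, b1, W2, b2, x),
                   forward(W1_p, b1_p, W2_p, b2, x))
print("Permuted network computes the same function.")
```

The paper exploits this degree of freedom by searching for permutations whose resulting weight groups are easier to encode with a shared vector-quantization codebook; the sketch only demonstrates the invariance itself.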
Related papers
- Approximating Continuous Convolutions for Deep Network Compression [11.566258236184964]
We present ApproxConv, a novel method for compressing the layers of a convolutional neural network.
We show that our method is able to compress existing deep network models by half whilst losing only 1.86% accuracy.
arXiv Detail & Related papers (2022-10-17T11:41:26Z) - A Theoretical Understanding of Neural Network Compression from Sparse
Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance.
We use a sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between the soft sparsity of the weights in the network and the degree of compression (see the sketch after this list).
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z) - Fast Conditional Network Compression Using Bayesian HyperNetworks [54.06346724244786]
We introduce a conditional compression problem and propose a fast framework for tackling it.
The problem is how to quickly compress a large pretrained neural network into optimal smaller networks for given target contexts.
Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
arXiv Detail & Related papers (2022-05-13T00:28:35Z) - COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z) - Compressing Neural Networks: Towards Determining the Optimal Layer-wise
Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z) - DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep
Learning [79.89085533866071]
This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors.
DeepReduce decomposes tensors into two sets, values and indices, and allows both independent and combined compression of these sets (see the sketch after this list).
Our experiments with large real models demonstrate that DeepReduce transmits less data and imposes lower computational overhead than existing methods.
arXiv Detail & Related papers (2021-02-05T11:31:24Z) - Convolutional neural networks compression with low rank and sparse
tensor decompositions [0.0]
Convolutional neural networks show outstanding results in a variety of computer vision tasks.
For some real-world applications, it is crucial to develop models that are fast and light enough to run on edge systems and mobile devices.
In this work, we consider a neural network compression method based on tensor decompositions.
arXiv Detail & Related papers (2020-06-11T13:53:18Z) - Cross-filter compression for CNN inference acceleration [4.324080238456531]
We propose a new cross-filter compression method that can provide $\sim 32\times$ memory savings and a $122\times$ speed-up in convolution operations.
Our method, based on Binary-Weight and XNOR-Net separately, is evaluated on the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2020-05-18T19:06:14Z) - A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
arXiv Detail & Related papers (2020-01-15T20:38:23Z)
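For the sparse-linear-approximation entry above, the soft-sparsity idea can be illustrated with a few lines of NumPy. This is a hedged sketch of the general $\ell_q$ quasi-norm for $0 < q < 1$, not code from that paper; the choice of $q = 0.5$, the vector sizes, and the example weight vectors are assumptions. At a fixed $\ell_2$ norm, a smaller quasi-norm means the weight mass sits on fewer entries, i.e. the layer is easier to compress by pruning.

```python
import numpy as np

def lq_quasi_norm(w, q=0.5):
    """l_q quasi-norm (0 < q < 1): at fixed l_2 norm, smaller values
    indicate weight mass concentrated on fewer entries (softer sparsity)."""
    w = np.abs(np.ravel(w))
    return np.sum(w ** q) ** (1.0 / q)

rng = np.random.default_rng(0)

# Two weight vectors with the same l_2 norm but different concentration.
dense = rng.normal(size=256)
dense /= np.linalg.norm(dense)

spiky = np.zeros(256)
spiky[:8] = rng.normal(size=8)
spiky /= np.linalg.norm(spiky)

print("dense:", lq_quasi_norm(dense))  # large -> little to gain from pruning
print("spiky:", lq_quasi_norm(spiky))  # small -> most entries are negligible
```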
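The DeepReduce entry describes splitting a sparse tensor into a value set and an index set that can be compressed independently. The sketch below shows only that decomposition and its exact reconstruction in NumPy; the tensor shape, the sparsification threshold, and the use of flat indices are illustrative assumptions, and the actual compression of each set (the subject of that paper) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# A sparse gradient tensor (e.g. after threshold or top-k sparsification).
grad = rng.normal(size=(4, 6))
grad[np.abs(grad) < 1.0] = 0.0

# Decompose into two sets: nonzero values and their flat indices.
flat = grad.ravel()
indices = np.flatnonzero(flat)  # index set (could be delta/bitmap coded)
values = flat[indices]          # value set (could be quantized separately)

# Exact reconstruction from the two sets.
recon = np.zeros_like(flat)
recon[indices] = values
assert np.allclose(recon.reshape(grad.shape), grad)
print(f"kept {indices.size}/{flat.size} entries")
```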
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.