A Theoretical Understanding of Neural Network Compression from Sparse
Linear Approximation
- URL: http://arxiv.org/abs/2206.05604v1
- Date: Sat, 11 Jun 2022 20:10:35 GMT
- Title: A Theoretical Understanding of Neural Network Compression from Sparse
Linear Approximation
- Authors: Wenjing Yang, Ganghua Wang, Enmao Diao, Vahid Tarokh, Jie Ding, Yuhong
Yang
- Abstract summary: The goal of model compression is to reduce the size of a large neural network while retaining comparable performance.
We use the sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
- Score: 37.525277809849776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of model compression is to reduce the size of a large neural network
while retaining comparable performance. As a result, computation and memory
costs in resource-limited applications may be significantly reduced by dropping
redundant weights, neurons, or layers. There have been many model compression
algorithms proposed that provide impressive empirical success. However, a
theoretical understanding of model compression is still limited. One problem is
understanding whether a network is more compressible than another of the same
structure. Another problem is quantifying how much one can prune a network with
theoretically guaranteed accuracy degradation. In this work, we propose to use
the sparsity-sensitive $\ell_q$-norm ($0<q<1$) to characterize compressibility
and provide a relationship between soft sparsity of the weights in the network
and the degree of compression with a controlled accuracy degradation bound. We
also develop adaptive algorithms for pruning each neuron in the network
informed by our theory. Numerical studies demonstrate the promising performance
of the proposed methods compared with standard pruning algorithms.
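To make the abstract's central quantity concrete, the sketch below computes the sparsity-sensitive $\ell_q$ quasi-norm of one neuron's incoming weights and uses the classical best-$k$-term bound $\|w - w_k\|_2 \le \|w\|_q \, k^{1/2 - 1/q}$ to decide how many weights to keep for a target error. Turning that bound into a per-neuron pruning rule is an illustrative assumption, not the authors' adaptive algorithm or their accuracy-degradation guarantee.

```python
import numpy as np

def lq_quasi_norm(w, q=0.5):
    """Sparsity-sensitive l_q quasi-norm, (sum_i |w_i|^q)^(1/q), with 0 < q < 1."""
    return float(np.sum(np.abs(w) ** q) ** (1.0 / q))

def best_k_term_error(w, k):
    """l_2 error of keeping only the k largest-magnitude entries of w."""
    dropped = np.sort(np.abs(w))[: max(w.size - k, 0)]
    return float(np.sqrt(np.sum(dropped ** 2)))

def keep_count_for_error(w, eps, q=0.5):
    """Smallest k for which the bound ||w - w_k||_2 <= ||w||_q * k^(1/2 - 1/q)
    guarantees an error of at most eps (hypothetical per-neuron rule)."""
    R = lq_quasi_norm(w, q)
    k = int(np.ceil((eps / R) ** (1.0 / (0.5 - 1.0 / q))))
    return min(max(k, 1), w.size)

# One neuron with polynomially decaying incoming weights (soft-sparse).
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=512)
neuron = rng.permutation(np.arange(1, 513) ** -2.0 * signs)

k = keep_count_for_error(neuron, eps=0.05, q=0.5)
print(k, best_k_term_error(neuron, k))   # realized error is well below eps
```

The example uses polynomially decaying weights, for which the $\ell_q$ bound is informative; for weights with many comparable small entries the quasi-norm grows quickly and the bound becomes conservative.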
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Fast Conditional Network Compression Using Bayesian HyperNetworks [54.06346724244786]
We introduce a conditional compression problem and propose a fast framework for tackling it.
The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts.
Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
arXiv Detail & Related papers (2022-05-13T00:28:35Z)
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 with ImageNet2012, our reduced model can reach more than two times speedup in terms of GMAC with merely a 0.7% Top-1 accuracy drop.
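As a rough illustration of the low-rank idea in this entry, the sketch below factorizes a single weight matrix with a truncated SVD; it is a simplified matrix analogue under stated assumptions and does not model the paper's tensor decomposition of convolution kernels or its funnel regularizer.

```python
import numpy as np

def low_rank_compress(W, rank):
    """Truncated-SVD factorization W ~= U_r @ V_r, storing rank*(m+n)
    numbers instead of m*n.  A matrix/SVD stand-in for the paper's tensor
    decomposition; the funnel regularizer is not modeled here."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]   # singular values folded into U_r

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
U_r, V_r = low_rank_compress(W, rank=32)
ratio = W.size / (U_r.size + V_r.size)
err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(f"parameter reduction: {ratio:.1f}x, relative error: {err:.3f}")
```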
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
- Low-Rank+Sparse Tensor Compression for Neural Networks [11.632913694957868]
We propose to combine low-rank tensor decomposition with sparse pruning in order to take advantage of both coarse and fine structure for compression.
We compress weights in SOTA architectures (MobileNetv3, EfficientNet, Vision Transformer) and compare this approach to sparse pruning and tensor decomposition alone.
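A minimal sketch of the low-rank-plus-sparse idea, assuming a generic alternating heuristic rather than this paper's fitting procedure: the weight matrix is split into a truncated-SVD term that captures coarse structure and a small set of large-magnitude residual entries that capture fine structure.

```python
import numpy as np

def low_rank_plus_sparse(W, rank, sparse_frac=0.02, n_iter=5):
    """Alternately fit a rank-`rank` term L and a sparse term S so that
    W ~= L + S.  A generic alternating heuristic for illustration; the
    paper's actual decomposition procedure may differ."""
    W = np.asarray(W, dtype=float)
    S = np.zeros_like(W)
    k = int(sparse_frac * W.size)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]   # best rank-r fit to W - S
        R = W - L
        S = np.zeros_like(W)
        flat = np.argsort(np.abs(R), axis=None)[-k:]  # largest residual entries
        idx = np.unravel_index(flat, W.shape)
        S[idx] = R[idx]
    return L, S

# Synthetic weight matrix that really is low-rank plus a few large spikes.
rng = np.random.default_rng(1)
W = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 256))
rows, cols = rng.integers(0, 256, size=(2, 1000))
W[rows, cols] += 10.0 * rng.standard_normal(1000)

L, S = low_rank_plus_sparse(W, rank=16)
print(np.linalg.norm(W - (L + S)) / np.linalg.norm(W))   # relative fit error
```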
arXiv Detail & Related papers (2021-11-02T15:55:07Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
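For illustration only, the sketch below combines magnitude pruning with uniform quantization of the surviving weights and measures the reconstruction error; the paper's source-coding storage format, layer types, and reported numbers are not modeled, and all parameter choices here are assumptions.

```python
import numpy as np

def prune_and_quantize(W, keep_frac=0.1, n_bits=8):
    """Magnitude-prune to roughly `keep_frac` nonzeros, then uniformly
    quantize the surviving weights to signed n_bits-level integers."""
    W = np.asarray(W, dtype=float)
    k = max(1, int(keep_frac * W.size))
    thresh = np.sort(np.abs(W), axis=None)[-k]     # k-th largest magnitude
    mask = np.abs(W) >= thresh
    survivors = W[mask]
    scale = np.abs(survivors).max() / (2 ** (n_bits - 1) - 1)
    codes = np.round(survivors / scale).astype(np.int32)
    return mask, codes, scale

def reconstruct(mask, codes, scale):
    """Rebuild a dense weight matrix from the compressed representation."""
    W_hat = np.zeros(mask.shape)
    W_hat[mask] = codes * scale
    return W_hat

rng = np.random.default_rng(2)
W = rng.standard_normal((128, 128))
mask, codes, scale = prune_and_quantize(W)
err = np.linalg.norm(W - reconstruct(mask, codes, scale)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```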
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically need to be compressed for deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks [9.554646174100123]
We show that the dynamics of the gradient descent training algorithm has a key role in obtaining compressible networks.
We prove that the networks are guaranteed to be '$\ell_p$-compressible', and the compression errors of different pruning techniques become arbitrarily small as the network size increases.
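A simplified reading of what '$\ell_p$-compressible' means is sketched below: for heavy-tailed weight vectors, the relative $\ell_p$ error of keeping only the largest-magnitude entries is small and tends to shrink as the vector grows, unlike a light-tailed Gaussian baseline. The distributions and keep fraction are illustrative assumptions, not the paper's SGD-based setting.

```python
import numpy as np

def relative_pruning_error(w, keep_frac, p=2):
    """Relative l_p error ||w - w_k||_p / ||w||_p when w_k keeps only the
    largest-magnitude `keep_frac` fraction of the entries of w."""
    mags = np.sort(np.abs(np.asarray(w, dtype=float)))   # ascending
    k = int(keep_frac * mags.size)
    dropped = mags[: mags.size - k]                       # smallest entries pruned
    return float((np.sum(dropped ** p) / np.sum(mags ** p)) ** (1.0 / p))

# The heavy-tailed vectors typically show a much smaller error that keeps
# shrinking as n grows; the Gaussian baseline stays roughly constant.
rng = np.random.default_rng(3)
for n in (1_000, 100_000):
    heavy = rng.standard_t(df=1.5, size=n)     # heavy-tailed stand-in
    light = rng.standard_normal(n)             # light-tailed baseline
    print(n, relative_pruning_error(heavy, 0.05), relative_pruning_error(light, 0.05))
```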
arXiv Detail & Related papers (2021-06-07T17:02:59Z)
- Successive Pruning for Model Compression via Rate Distortion Theory [15.598364403631528]
We study NN compression from an information-theoretic perspective and show that rate distortion theory suggests pruning to achieve the theoretical limits of NN compression.
Our derivation also provides an end-to-end compression pipeline involving a novel pruning strategy.
Our method consistently outperforms the existing pruning strategies and reduces the pruned model's size by a factor of 2.5.
arXiv Detail & Related papers (2021-02-16T18:17:57Z)
- Pruning and Quantization for Deep Neural Network Acceleration: A Survey [2.805723049889524]
Deep neural networks have demonstrated extraordinary abilities in many computer vision applications.
Complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs.
This paper provides a survey on two types of network compression: pruning and quantization.
arXiv Detail & Related papers (2021-01-24T08:21:04Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
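The permutation observation in the last entry can be verified directly. In the toy two-layer example below (an illustration, not the paper's quantization pipeline), permuting the hidden units of the first layer and the corresponding columns of the second layer leaves the network's output unchanged.

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_hidden, d_out = 8, 16, 4
W1 = rng.standard_normal((d_hidden, d_in))     # first layer
W2 = rng.standard_normal((d_out, d_hidden))    # second layer
x = rng.standard_normal(d_in)

def relu(z):
    return np.maximum(z, 0.0)

# Permute the hidden units: reorder the rows of W1 and, to compensate,
# the columns of W2.  The elementwise nonlinearity commutes with the
# permutation, so the permuted network computes exactly the same function.
perm = rng.permutation(d_hidden)
W1_p, W2_p = W1[perm, :], W2[:, perm]

y = W2 @ relu(W1 @ x)
y_p = W2_p @ relu(W1_p @ x)
print(np.allclose(y, y_p))    # True
```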