A Theoretical Understanding of Neural Network Compression from Sparse
Linear Approximation
- URL: http://arxiv.org/abs/2206.05604v1
- Date: Sat, 11 Jun 2022 20:10:35 GMT
- Title: A Theoretical Understanding of Neural Network Compression from Sparse
Linear Approximation
- Authors: Wenjing Yang, Ganghua Wang, Enmao Diao, Vahid Tarokh, Jie Ding, Yuhong
Yang
- Abstract summary: The goal of model compression is to reduce the size of a large neural network while retaining comparable performance.
We use the sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
- Score: 37.525277809849776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of model compression is to reduce the size of a large neural network
while retaining comparable performance. As a result, computation and memory
costs in resource-limited applications may be significantly reduced by dropping
redundant weights, neurons, or layers. There have been many model compression
algorithms proposed that provide impressive empirical success. However, a
theoretical understanding of model compression is still limited. One problem is
understanding whether a network is more compressible than another of the same
structure. Another problem is quantifying how much one can prune a network with
theoretically guaranteed accuracy degradation. In this work, we propose to use
the sparsity-sensitive $\ell_q$-norm ($0<q<1$) to characterize compressibility
and provide a relationship between soft sparsity of the weights in the network
and the degree of compression with a controlled accuracy degradation bound. We
also develop adaptive algorithms for pruning each neuron in the network
informed by our theory. Numerical studies demonstrate the promising performance
of the proposed methods compared with standard pruning algorithms.
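To make the abstract's central quantity concrete, the sketch below computes the sparsity-sensitive $\ell_q$ quasi-norm of one neuron's incoming weights and uses the classical best-$k$-term bound $\|w - w_k\|_2 \le \|w\|_q \, k^{1/2 - 1/q}$ to decide how many weights to keep for a target error. Turning that bound into a per-neuron pruning rule is an illustrative assumption, not the authors' adaptive algorithm or their accuracy-degradation guarantee.

```python
import numpy as np

def lq_quasi_norm(w, q=0.5):
    """Sparsity-sensitive l_q quasi-norm, (sum_i |w_i|^q)^(1/q), with 0 < q < 1."""
    return float(np.sum(np.abs(w) ** q) ** (1.0 / q))

def best_k_term_error(w, k):
    """l_2 error of keeping only the k largest-magnitude entries of w."""
    dropped = np.sort(np.abs(w))[: max(w.size - k, 0)]
    return float(np.sqrt(np.sum(dropped ** 2)))

def keep_count_for_error(w, eps, q=0.5):
    """Smallest k for which the bound ||w - w_k||_2 <= ||w||_q * k^(1/2 - 1/q)
    guarantees an error of at most eps (hypothetical per-neuron rule)."""
    R = lq_quasi_norm(w, q)
    k = int(np.ceil((eps / R) ** (1.0 / (0.5 - 1.0 / q))))
    return min(max(k, 1), w.size)

# One neuron with polynomially decaying incoming weights (soft-sparse).
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=512)
neuron = rng.permutation(np.arange(1, 513) ** -2.0 * signs)

k = keep_count_for_error(neuron, eps=0.05, q=0.5)
print(k, best_k_term_error(neuron, k))   # realized error is well below eps
```

The example uses polynomially decaying weights, for which the $\ell_q$ bound is informative; for weights with many comparable small entries the quasi-norm grows quickly and the bound becomes conservative.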
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Fast Conditional Network Compression Using Bayesian HyperNetworks [54.06346724244786]
We introduce a conditional compression problem and propose a fast framework for tackling it.
The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts.
Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
arXiv Detail & Related papers (2022-05-13T00:28:35Z)
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 with ImageNet2012, our reduced model can reach more than two times speedup in terms of GMAC with merely a 0.7% Top-1 accuracy drop.
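As a rough illustration of the low-rank idea in this entry, the sketch below factorizes a single weight matrix with a truncated SVD; it is a simplified matrix analogue under stated assumptions and does not model the paper's tensor decomposition of convolution kernels or its funnel regularizer.

```python
import numpy as np

def low_rank_compress(W, rank):
    """Truncated-SVD factorization W ~= U_r @ V_r, storing rank*(m+n)
    numbers instead of m*n.  A matrix/SVD stand-in for the paper's tensor
    decomposition; the funnel regularizer is not modeled here."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]   # singular values folded into U_r

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
U_r, V_r = low_rank_compress(W, rank=32)
ratio = W.size / (U_r.size + V_r.size)
err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(f"parameter reduction: {ratio:.1f}x, relative error: {err:.3f}")
```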
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
- Low-Rank+Sparse Tensor Compression for Neural Networks [11.632913694957868]
We propose to combine low-rank tensor decomposition with sparse pruning in order to take advantage of both coarse and fine structure for compression.
We compress weights in SOTA architectures (MobileNetv3, EfficientNet, Vision Transformer) and compare this approach to sparse pruning and tensor decomposition alone.
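A minimal sketch of the low-rank-plus-sparse idea, assuming a generic alternating heuristic rather than this paper's fitting procedure: the weight matrix is split into a truncated-SVD term that captures coarse structure and a small set of large-magnitude residual entries that capture fine structure.

```python
import numpy as np

def low_rank_plus_sparse(W, rank, sparse_frac=0.02, n_iter=5):
    """Alternately fit a rank-`rank` term L and a sparse term S so that
    W ~= L + S.  A generic alternating heuristic for illustration; the
    paper's actual decomposition procedure may differ."""
    W = np.asarray(W, dtype=float)
    S = np.zeros_like(W)
    k = int(sparse_frac * W.size)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]   # best rank-r fit to W - S
        R = W - L
        S = np.zeros_like(W)
        flat = np.argsort(np.abs(R), axis=None)[-k:]  # largest residual entries
        idx = np.unravel_index(flat, W.shape)
        S[idx] = R[idx]
    return L, S

# Synthetic weight matrix that really is low-rank plus a few large spikes.
rng = np.random.default_rng(1)
W = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 256))
rows, cols = rng.integers(0, 256, size=(2, 1000))
W[rows, cols] += 10.0 * rng.standard_normal(1000)

L, S = low_rank_plus_sparse(W, rank=16)
print(np.linalg.norm(W - (L + S)) / np.linalg.norm(W))   # relative fit error
```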
arXiv Detail & Related papers (2021-11-02T15:55:07Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
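For illustration only, the sketch below combines magnitude pruning with uniform quantization of the surviving weights and measures the reconstruction error; the paper's source-coding storage format, layer types, and reported numbers are not modeled, and all parameter choices here are assumptions.

```python
import numpy as np

def prune_and_quantize(W, keep_frac=0.1, n_bits=8):
    """Magnitude-prune to roughly `keep_frac` nonzeros, then uniformly
    quantize the surviving weights to signed n_bits-level integers."""
    W = np.asarray(W, dtype=float)
    k = max(1, int(keep_frac * W.size))
    thresh = np.sort(np.abs(W), axis=None)[-k]     # k-th largest magnitude
    mask = np.abs(W) >= thresh
    survivors = W[mask]
    scale = np.abs(survivors).max() / (2 ** (n_bits - 1) - 1)
    codes = np.round(survivors / scale).astype(np.int32)
    return mask, codes, scale

def reconstruct(mask, codes, scale):
    """Rebuild a dense weight matrix from the compressed representation."""
    W_hat = np.zeros(mask.shape)
    W_hat[mask] = codes * scale
    return W_hat

rng = np.random.default_rng(2)
W = rng.standard_normal((128, 128))
mask, codes, scale = prune_and_quantize(W)
err = np.linalg.norm(W - reconstruct(mask, codes, scale)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```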
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically need to be compressed for deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks [9.554646174100123]
We show that the dynamics of the gradient descent training algorithm has a key role in obtaining compressible networks.
We prove that the networks are guaranteed to be '$\ell_p$-compressible', and the compression errors of different pruning techniques become arbitrarily small as the network size increases.
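A simplified reading of what '$\ell_p$-compressible' means is sketched below: for heavy-tailed weight vectors, the relative $\ell_p$ error of keeping only the largest-magnitude entries is small and tends to shrink as the vector grows, unlike a light-tailed Gaussian baseline. The distributions and keep fraction are illustrative assumptions, not the paper's SGD-based setting.

```python
import numpy as np

def relative_pruning_error(w, keep_frac, p=2):
    """Relative l_p error ||w - w_k||_p / ||w||_p when w_k keeps only the
    largest-magnitude `keep_frac` fraction of the entries of w."""
    mags = np.sort(np.abs(np.asarray(w, dtype=float)))   # ascending
    k = int(keep_frac * mags.size)
    dropped = mags[: mags.size - k]                       # smallest entries pruned
    return float((np.sum(dropped ** p) / np.sum(mags ** p)) ** (1.0 / p))

# The heavy-tailed vectors typically show a much smaller error that keeps
# shrinking as n grows; the Gaussian baseline stays roughly constant.
rng = np.random.default_rng(3)
for n in (1_000, 100_000):
    heavy = rng.standard_t(df=1.5, size=n)     # heavy-tailed stand-in
    light = rng.standard_normal(n)             # light-tailed baseline
    print(n, relative_pruning_error(heavy, 0.05), relative_pruning_error(light, 0.05))
```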
arXiv Detail & Related papers (2021-06-07T17:02:59Z)
- Successive Pruning for Model Compression via Rate Distortion Theory [15.598364403631528]
We study NN compression from an information-theoretic perspective and show that rate distortion theory suggests pruning to achieve the theoretical limits of NN compression.
Our derivation also provides an end-to-end compression pipeline involving a novel pruning strategy.
Our method consistently outperforms the existing pruning strategies and reduces the pruned model's size by a factor of 2.5.
arXiv Detail & Related papers (2021-02-16T18:17:57Z)
- Pruning and Quantization for Deep Neural Network Acceleration: A Survey [2.805723049889524]
Deep neural networks have demonstrated extraordinary abilities in many computer vision applications.
Complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs.
This paper provides a survey on two types of network compression: pruning and quantization.
arXiv Detail & Related papers (2021-01-24T08:21:04Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
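The permutation observation in the last entry can be verified directly. In the toy two-layer example below (an illustration, not the paper's quantization pipeline), permuting the hidden units of the first layer and the corresponding columns of the second layer leaves the network's output unchanged.

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_hidden, d_out = 8, 16, 4
W1 = rng.standard_normal((d_hidden, d_in))     # first layer
W2 = rng.standard_normal((d_out, d_hidden))    # second layer
x = rng.standard_normal(d_in)

def relu(z):
    return np.maximum(z, 0.0)

# Permute the hidden units: reorder the rows of W1 and, to compensate,
# the columns of W2.  The elementwise nonlinearity commutes with the
# permutation, so the permuted network computes exactly the same function.
perm = rng.permutation(d_hidden)
W1_p, W2_p = W1[perm, :], W2[:, perm]

y = W2 @ relu(W1 @ x)
y_p = W2_p @ relu(W1_p @ x)
print(np.allclose(y, y_p))    # True
```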