Neural Network Compression via Effective Filter Analysis and
Hierarchical Pruning
- URL: http://arxiv.org/abs/2206.03596v1
- Date: Tue, 7 Jun 2022 21:30:47 GMT
- Title: Neural Network Compression via Effective Filter Analysis and
Hierarchical Pruning
- Authors: Ziqi Zhou, Li Lian, Yilong Yin, Ze Wang
- Abstract summary: Current network compression methods have two open problems: first, there is no theoretical framework to estimate the maximum compression rate; second, some layers may be over-pruned, resulting in a significant drop in network performance.
This study proposes a gradient-matrix singularity analysis-based method to estimate the maximum network redundancy.
Guided by that maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the network structure without sacrificing performance.
- Score: 41.19516938181544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network compression is crucial to making deep networks more
efficient, faster, and deployable on low-end hardware. Current network
compression methods have two open problems: first, there is no theoretical
framework to estimate the maximum compression rate; second, some layers may be
over-pruned, resulting in a significant drop in network performance. To solve
these two problems, this study proposes a gradient-matrix singularity
analysis-based method to estimate the maximum network redundancy. Guided by
that maximum rate, a novel and efficient hierarchical network pruning algorithm
is developed to maximally condense the network structure without sacrificing
performance. Extensive experiments demonstrate the efficacy of the new method
for pruning several advanced convolutional neural network (CNN) architectures.
Compared to existing pruning methods, the proposed algorithm achieves
state-of-the-art performance: at the same or similar compression ratios, it
yields the highest prediction accuracy among the compared methods.
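The abstract ships no code; as a rough illustration only, the sketch below shows one way a gradient-matrix singularity analysis could be set up: stack per-batch gradients of one convolutional layer's filters into a matrix, inspect its singular values, and read the fraction of near-zero directions as a redundancy estimate. The gradient-collection scheme, the tolerance `tol`, and the toy model are assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn as nn

def estimate_layer_redundancy(model, layer, loss_fn, data_loader, tol=1e-3):
    """Hypothetical sketch: stack per-batch gradients of one layer's filters into a
    matrix and read redundancy off its singular-value spectrum."""
    grads = []
    model.train()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        grads.append(layer.weight.grad.detach().flatten().clone())
    G = torch.stack(grads)                       # (num_batches, num_weights)
    s = torch.linalg.svdvals(G)                  # singular values, largest first
    effective_rank = int((s > tol * s[0]).sum())
    return 1.0 - effective_rank / min(G.shape)   # fraction of near-singular directions

# Toy usage on a tiny CNN with random data.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
batches = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(20)]
print(estimate_layer_redundancy(model, model[0], nn.CrossEntropyLoss(), batches))
```

Under this reading, a layer whose gradient matrix has many negligible singular values would tolerate more aggressive pruning.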
Related papers
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
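SSC itself is not reproduced here; as a concrete point of reference for how structured convolutions cut parameters, the minimal sketch below compares a standard convolution against a depthwise plus pointwise factorization, two of the layer types the summary lists as special cases of SSC. The channel sizes are arbitrary.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

cin, cout, k = 128, 256, 3
standard = nn.Conv2d(cin, cout, k, padding=1, bias=False)
depthwise_separable = nn.Sequential(
    nn.Conv2d(cin, cin, k, padding=1, groups=cin, bias=False),  # depthwise
    nn.Conv2d(cin, cout, 1, bias=False),                        # pointwise
)
print(n_params(standard))              # 128*256*3*3 = 294,912
print(n_params(depthwise_separable))   # 128*3*3 + 128*256 = 33,920
```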
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Neural Network Compression by Joint Sparsity Promotion and Redundancy Reduction [4.9613162734482215]
This paper presents a novel training scheme based on composite constraints that prune redundant filters and minimize their effect on overall network learning via sparsity promotion.
Our tests on several pixel-wise segmentation benchmarks show that the number of neurons and the memory footprint of networks in the test phase are significantly reduced without affecting performance.
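The paper's composite constraints are not spelled out in this summary; purely to illustrate sparsity promotion on filters, the sketch below adds a group-lasso style penalty (one group per output filter) to the task loss so that whole filters are pushed toward zero during training. The penalty weight `lam` is a made-up hyperparameter, not the paper's.

```python
import torch.nn as nn

def filter_group_penalty(model):
    """Sum of per-filter L2 norms over all conv layers (group lasso on output filters)."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # One group per output filter: rows of shape (out_channels, in*k*k).
            penalty = penalty + m.weight.flatten(1).norm(dim=1).sum()
    return penalty

def training_step(model, x, y, loss_fn, optimizer, lam=1e-4):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y) + lam * filter_group_penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```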
arXiv Detail & Related papers (2022-10-14T01:34:49Z)
- i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery [11.119895959906085]
We propose a novel structured pruning algorithm for neural networks, iterative Sparse Structured Pruning, dubbed i-SpaSP.
i-SpaSP operates by identifying a larger set of important parameter groups within a network that contribute most to the residual between pruned and dense network output.
It is shown to discover high-performing sub-networks and improve upon the pruning efficiency of provable baseline methodologies by several orders of magnitude.
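i-SpaSP's actual recovery procedure is not reproduced here; the toy sketch below only illustrates the idea the summary describes, greedily keeping the channel groups whose contributions explain most of the residual between the pruned and dense output. The matching-pursuit style scoring is an assumption.

```python
import torch

def greedy_channel_selection(contribs, keep):
    """Toy sketch of residual-driven selection: contribs[c] is channel c's additive
    contribution to the dense layer output (flattened).  Greedily pick the channels
    that most reduce the residual between the pruned and the dense output."""
    dense = contribs.sum(dim=0)            # full (dense) output
    residual = dense.clone()               # pruned output starts empty
    selected = []
    for _ in range(keep):
        scores = [torch.dot(residual, contribs[c]).item() if c not in selected
                  else float("-inf") for c in range(contribs.shape[0])]
        c_best = max(range(len(scores)), key=scores.__getitem__)
        selected.append(c_best)
        residual -= contribs[c_best]       # account for the newly kept channel
    return selected

contribs = torch.randn(32, 1024)           # 32 channels, flattened spatial output
print(greedy_channel_selection(contribs, keep=8))
```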
arXiv Detail & Related papers (2021-12-07T05:26:45Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We reduce space occupancy to as little as 0.6% of the original size on fully connected layers and 5.44% on the whole network, while performing at least as well as the baseline.
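The paper's source-coding format is not reproduced; the snippet below merely sketches the two ingredients named in the summary, magnitude pruning followed by uniform quantization of the surviving weights, and reports a naive size ratio on a toy weight matrix (index and entropy-coding overheads are ignored). Sparsity and bit-width values are illustrative.

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.9, bits=4):
    """Magnitude-prune a weight array, then uniformly quantize the survivors."""
    thresh = np.quantile(np.abs(w), sparsity)      # drop the smallest `sparsity` fraction
    mask = np.abs(w) > thresh
    survivors = w[mask]
    lo, hi = survivors.min(), survivors.max()
    levels = 2 ** bits - 1
    codes = np.round((survivors - lo) / (hi - lo) * levels)   # integer codes in [0, levels]
    w_hat = np.zeros_like(w)
    w_hat[mask] = codes / levels * (hi - lo) + lo             # dequantized survivors
    # Naive size estimate: 32-bit floats vs. `bits`-bit codes for surviving weights only.
    ratio = (mask.sum() * bits) / (w.size * 32)
    return w_hat, ratio

w = np.random.randn(256, 512).astype(np.float32)
w_hat, ratio = prune_and_quantize(w)
print(f"kept {int((w_hat != 0).sum())} weights, raw size ratio ~ {ratio:.3%}")
```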
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Convolutional Neural Network Pruning with Structural Redundancy Reduction [11.381864384054824]
We claim that identifying structural redundancy plays a more essential role than finding unimportant filters.
We propose a network pruning approach that identifies structural redundancy of a CNN and prunes filters in the selected layer(s) with the most redundancy.
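The paper's redundancy measure is not reproduced here; as an assumed proxy, the sketch below scores each convolutional layer by how similar its filters are to one another (mean pairwise cosine similarity) and nominates the highest-scoring layer as the pruning candidate.

```python
import torch
import torch.nn as nn

def layer_redundancy_scores(model):
    """Assumed proxy: a layer whose filters strongly resemble one another is treated
    as structurally redundant.  Score = mean pairwise cosine similarity of filters."""
    scores = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            f = nn.functional.normalize(m.weight.detach().flatten(1), dim=1)
            sim = f @ f.t()                            # pairwise cosine similarities
            n = sim.shape[0]
            scores[name] = ((sim.sum() - n) / (n * (n - 1))).item()  # exclude self-similarity
    return scores

model = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3))
scores = layer_redundancy_scores(model)
most_redundant = max(scores, key=scores.get)   # candidate layer to prune filters from
print(scores, "->", most_redundant)
```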
arXiv Detail & Related papers (2021-04-08T00:16:24Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
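The manifold regularization itself is not modeled below; the sketch only illustrates the dynamic-pruning setting the summary refers to, where a small gating head selects a different subset of filters for every input instance. The gating head and keep ratio are assumptions.

```python
import torch
import torch.nn as nn

class DynamicChannelGate(nn.Module):
    """Minimal sketch of dynamic (per-instance) filter pruning: a tiny gating head
    predicts which channels to keep for each input, so the pruned sub-network differs
    from instance to instance.  The paper's manifold regularization is not modeled."""
    def __init__(self, channels, keep_ratio=0.5):
        super().__init__()
        self.keep = max(1, int(channels * keep_ratio))
        self.head = nn.Linear(channels, channels)

    def forward(self, x):                                   # x: (batch, channels, H, W)
        saliency = self.head(x.mean(dim=(2, 3)))            # per-instance channel scores
        idx = saliency.topk(self.keep, dim=1).indices
        mask = torch.zeros_like(saliency).scatter_(1, idx, 1.0)
        return x * mask[:, :, None, None]                   # zero out "pruned" channels

gate = DynamicChannelGate(64, keep_ratio=0.25)
out = gate(torch.randn(8, 64, 16, 16))
print((out.abs().sum(dim=(2, 3)) > 0).sum(dim=1))           # 16 active channels per instance
```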
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
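ESPN's exact procedure is not given in this summary; the sketch below shows a generic iterative mask-discovery loop of the kind it builds on: train briefly, zero out the globally smallest-magnitude surviving weights, and raise the sparsity level each round. The schedule and global-quantile threshold are assumptions.

```python
import torch

def iterative_magnitude_masking(model, train_one_round, target_sparsity=0.95, rounds=5):
    """Generic iterative mask discovery (not ESPN itself): over several rounds, briefly
    train, then zero out the globally smallest-magnitude weights, raising the sparsity
    level each round until the target is reached."""
    params = [p for p in model.parameters() if p.dim() > 1]
    masks = [torch.ones_like(p) for p in params]
    for r in range(1, rounds + 1):
        train_one_round(model)                                # user-supplied training callback
        sparsity = target_sparsity * r / rounds               # sparsity schedule
        all_w = torch.cat([(p * m).abs().flatten() for p, m in zip(params, masks)])
        thresh = torch.quantile(all_w, sparsity)
        with torch.no_grad():
            for p, m in zip(params, masks):
                m.copy_(((p * m).abs() > thresh).float())
                p.mul_(m)                                     # apply the mask in place
    return masks

model = torch.nn.Sequential(torch.nn.Linear(100, 50), torch.nn.ReLU(), torch.nn.Linear(50, 10))
masks = iterative_magnitude_masking(model, train_one_round=lambda m: None)
print(sum(int(m.sum()) for m in masks), "of", sum(m.numel() for m in masks), "weights kept")
```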
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization with deep neural networks at large scale.
Our algorithm requires far fewer communication rounds than naive parallel approaches while retaining theoretical guarantees.
Experiments on several benchmark datasets demonstrate its effectiveness and corroborate the theory.
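The paper's algorithm is not reproduced here; the sketch below only illustrates the generic communication-reduction pattern behind such results, namely several local gradient steps per worker followed by one parameter-averaging round, so communication happens once every `local_steps` updates rather than every update. The AUC min-max objective is replaced by a plain logistic loss for brevity.

```python
import copy
import torch

def local_sgd_round(workers, global_model, local_steps, lr=0.01):
    """Generic local-update / periodic-averaging pattern (not the paper's algorithm)."""
    local_models = [copy.deepcopy(global_model) for _ in workers]
    for model, get_batch in zip(local_models, workers):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(local_steps):                         # local steps, no communication
            x, y = get_batch()
            opt.zero_grad()
            torch.nn.functional.binary_cross_entropy_with_logits(model(x).squeeze(1), y).backward()
            opt.step()
    # One communication round: average parameters across workers.
    with torch.no_grad():
        for p_global, *p_locals in zip(global_model.parameters(),
                                       *[m.parameters() for m in local_models]):
            p_global.copy_(torch.stack(p_locals).mean(dim=0))
    return global_model

model = torch.nn.Linear(20, 1)
workers = [lambda: (torch.randn(32, 20), torch.randint(0, 2, (32,)).float()) for _ in range(4)]
local_sgd_round(workers, model, local_steps=10)
```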
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
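The paper's quantization scheme is not reproduced; the sketch below only illustrates the stated idea of assigning progressively lower bit-widths to deeper layers, using simple symmetric uniform quantization per layer. The linear bit-width schedule is an assumption.

```python
import numpy as np

def uniform_quantize(w, bits):
    """Symmetric uniform quantization of a weight array to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def quantize_progressively(layer_weights, start_bits=8, end_bits=2):
    """Assign linearly decreasing bit-widths from the first to the last layer."""
    bitwidths = np.linspace(start_bits, end_bits, len(layer_weights)).round().astype(int)
    return [uniform_quantize(w, b) for w, b in zip(layer_weights, bitwidths)], bitwidths

layers = [np.random.randn(64, 3, 3, 3), np.random.randn(128, 64, 3, 3), np.random.randn(10, 128)]
quantized, bits = quantize_progressively(layers)
print(bits)   # e.g. [8 5 2]
```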
arXiv Detail & Related papers (2019-12-29T14:11:33Z)