Comprehensive Online Network Pruning via Learnable Scaling Factors
- URL: http://arxiv.org/abs/2010.02623v1
- Date: Tue, 6 Oct 2020 11:04:17 GMT
- Title: Comprehensive Online Network Pruning via Learnable Scaling Factors
- Authors: Muhammad Umair Haider, Murtaza Taj
- Abstract summary: Deep CNNs can either be pruned width-wise by removing filters based on their importance or depth-wise by removing layers and blocks.
We propose a comprehensive pruning strategy that can perform both width-wise as well as depth-wise pruning.
- Score: 3.274290296343038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the major challenges in deploying deep neural network architectures is
their size, which has an adverse effect on their inference time and memory
requirements. Deep CNNs can either be pruned width-wise by removing filters
based on their importance, or depth-wise by removing layers and blocks.
Width-wise pruning (filter pruning) is commonly performed via learnable gates or
switches and sparsity regularizers, whereas pruning of layers has so far been
performed in an ad hoc manner, by manually designing a smaller network usually referred
to as a student network. We propose a comprehensive pruning strategy that can
perform both width-wise and depth-wise pruning. This is achieved by
introducing gates at different granularities (neuron, filter, layer, block)
which are then controlled via an objective function that simultaneously
performs pruning at these different granularities during each forward pass. Our
approach is applicable to a wide variety of architectures without any constraints
on spatial dimensions or connection type (sequential, residual, parallel or
inception). Our method achieves compression ratios of 70% to 90%
without noticeable loss in accuracy when evaluated on benchmark datasets.
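As a rough illustration of the gating idea (a minimal sketch, not the authors' implementation; the module name GatedConvBlock, the residual form, and the sparsity_weight value are assumptions), learnable scaling factors can be attached per filter and per block, with an L1 penalty added to the task loss so that gates pushed toward zero mark structures that can be pruned width-wise or depth-wise:

```python
# Minimal PyTorch sketch of multi-granularity gates; illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """Conv block with learnable scaling factors at two granularities:
    one gate per output filter (width-wise) and one gate for the whole block (depth-wise)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.filter_gates = nn.Parameter(torch.ones(out_ch))  # one scale per filter
        self.block_gate = nn.Parameter(torch.ones(1))         # one scale per block

    def forward(self, x):
        y = F.relu(self.bn(self.conv(x)))
        y = y * self.filter_gates.view(1, -1, 1, 1)  # gate each filter's feature map
        y = self.block_gate * y                      # gate the whole block
        if x.shape == y.shape:
            y = y + x  # residual path keeps the network intact if the block gate collapses
        return y

    def gate_l1(self):
        return self.filter_gates.abs().sum() + self.block_gate.abs().sum()

def total_loss(model, logits, targets, sparsity_weight=1e-4):
    """Task loss plus an L1 sparsity penalty on all gates, applied at every forward pass."""
    task = F.cross_entropy(logits, targets)
    penalty = sum(m.gate_l1() for m in model.modules() if isinstance(m, GatedConvBlock))
    return task + sparsity_weight * penalty
```

After training, filters and blocks whose gate magnitudes fall below a threshold can be removed and the slimmed network fine-tuned; the exact objective, gate placement, and thresholds used in the paper may differ.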
Related papers
- RL-Pruner: Structured Pruning Using Reinforcement Learning for CNN Compression and Acceleration [0.0]
We propose RL-Pruner, which uses reinforcement learning to learn the optimal pruning distribution.
RL-Pruner can automatically extract dependencies between filters in the input model and perform pruning, without requiring model-specific pruning implementations.
arXiv Detail & Related papers (2024-11-10T13:35:10Z) - Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their point-wise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z) - Basis Scaling and Double Pruning for Efficient Inference in Network-Based Transfer Learning [1.3467579878240454]
We decompose a convolutional layer into two layers: a convolutional layer with the orthonormal basis vectors as the filters, and a "BasisScalingConv" layer which is responsible for rescaling the features.
We can achieve pruning ratios up to 74.6% for CIFAR-10 and 98.9% for MNIST in model parameters.
arXiv Detail & Related papers (2021-08-06T00:04:02Z) - Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
arXiv Detail & Related papers (2021-08-02T08:21:44Z) - Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z) - Grow-Push-Prune: aligning deep discriminants for effective structural network compression [5.532477732693]
This paper attempts to derive task-dependent compact models from a deep discriminant analysis perspective.
We propose an iterative and proactive approach for classification tasks which alternates between a pushing step and a pruning step.
Experiments on the MNIST, CIFAR10, and ImageNet datasets demonstrate our approach's efficacy.
arXiv Detail & Related papers (2020-09-29T01:29:23Z) - On the Predictability of Pruning Across Scales [29.94870276983399]
We show that the error of magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task.
As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
arXiv Detail & Related papers (2020-06-18T15:41:46Z) - Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors (a minimal sketch of this common baseline appears after this list).
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
arXiv Detail & Related papers (2020-05-06T07:41:22Z) - DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
arXiv Detail & Related papers (2020-03-30T17:59:18Z) - A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
arXiv Detail & Related papers (2020-01-15T20:38:23Z)
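As background to several entries above (in particular the Dependency Aware Filter Pruning summary), the sketch below illustrates the common baseline of ranking filters by the magnitude of their batch-norm scaling factors; it is an assumption-laden illustration, not code from any of the listed papers, and the function names are hypothetical.

```python
# Illustrative baseline: score conv channels by |gamma| of their BatchNorm layers
# and mark the globally lowest-scoring fraction for pruning. Sketch only.
import torch
import torch.nn as nn

def bn_gamma_importance(model: nn.Module) -> dict:
    """Collect per-channel |gamma| for every BatchNorm2d layer."""
    return {name: m.weight.detach().abs()
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}

def select_keep_mask(scores: dict, prune_ratio: float = 0.5) -> dict:
    """Keep channels above a global threshold; the lowest prune_ratio fraction is marked for removal."""
    all_scores = torch.cat([s.flatten() for s in scores.values()])
    threshold = torch.quantile(all_scores, prune_ratio)
    return {name: s > threshold for name, s in scores.items()}
```

Actually removing the masked channels requires keeping coupled channels consistent (e.g., across residual connections and concatenations), which is the dependency problem addressed by methods such as Group Fisher Pruning, Dependency Aware Filter Pruning, and RL-Pruner above.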
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.