To Filter Prune, or to Layer Prune, That Is The Question
- URL: http://arxiv.org/abs/2007.05667v3
- Date: Sun, 8 Nov 2020 17:48:23 GMT
- Title: To Filter Prune, or to Layer Prune, That Is The Question
- Authors: Sara Elkerdawy, Mostafa Elhoushi, Abhineet Singh, Hong Zhang and
Nilanjan Ray
- Abstract summary: We show the limitation of filter pruning methods in terms of latency reduction.
We present a set of layer pruning methods based on different criteria that achieve higher latency reduction than filter pruning methods at similar accuracy.
LayerPrune also outperforms handcrafted architectures such as Shufflenet, MobileNet, MNASNet and ResNet18 by 7.3%, 4.6%, 2.8% and 0.5% respectively, at a similar latency budget on the ImageNet dataset.
- Score: 13.450136532402226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in pruning of neural networks have made it possible to remove
a large number of filters or weights without any perceptible drop in accuracy.
The number of parameters and that of FLOPs are usually the reported metrics to
measure the quality of the pruned models. However, the gain in speed for these
pruned models is often overlooked in the literature due to the complex nature
of latency measurements. In this paper, we show the limitation of filter
pruning methods in terms of latency reduction and propose the LayerPrune framework.
LayerPrune presents a set of layer pruning methods based on different criteria
that achieve higher latency reduction than filter pruning methods at similar
accuracy. The advantage of layer pruning over filter pruning in terms of
latency reduction is a result of the fact that the former is not constrained by
the original model's depth and thus allows for a larger range of latency
reduction. For each filter pruning method we examined, we use the same filter
importance criterion to calculate a per-layer importance score in one shot. We
then prune the least important layers and fine-tune the shallower model, which
obtains comparable or better accuracy than its filter-based pruning
counterpart. This one-shot process allows layers to be removed from single-path
networks like VGG before fine-tuning; in iterative filter pruning, by contrast,
a minimum number of filters per layer is required to preserve data flow, which
constrains the search space. To the best of our knowledge, we are the first to
examine the effect of pruning methods on the latency metric instead of FLOPs
for multiple networks, datasets and hardware targets. LayerPrune also
outperforms handcrafted architectures such as Shufflenet, MobileNet, MNASNet
and ResNet18 by 7.3%, 4.6%, 2.8% and 0.5% respectively, at a similar latency
budget on the ImageNet dataset.
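
Below is a minimal, hypothetical PyTorch sketch of the one-shot layer-pruning idea described in the abstract; it is not the authors' released code. It assumes a ResNet-style stack of residual blocks with identity shortcuts, and it uses the mean L1 norm of each block's convolution filters as one illustrative importance criterion (the paper supports several criteria). The class and function names (BasicBlock, block_importance, layer_prune) are invented for the example.

```python
# Hypothetical sketch of one-shot layer pruning (not the authors' code).
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Residual block with an identity shortcut, so removing it keeps shapes valid."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))


def block_importance(block):
    # Illustrative per-layer score: mean L1 norm over all conv filters in the block.
    norms = [w.abs().sum(dim=(1, 2, 3))
             for name, w in block.named_parameters()
             if "conv" in name and w.dim() == 4]
    return torch.cat(norms).mean().item()


def layer_prune(blocks, num_to_remove):
    """Drop the num_to_remove blocks with the lowest importance scores (one shot)."""
    scores = [block_importance(b) for b in blocks]
    drop = set(sorted(range(len(blocks)), key=lambda i: scores[i])[:num_to_remove])
    return nn.Sequential(*[b for i, b in enumerate(blocks) if i not in drop])


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy, randomly initialized model; in practice the scores come from a pre-trained network.
    stem = nn.Conv2d(3, 16, 3, padding=1)
    blocks = nn.Sequential(*[BasicBlock(16) for _ in range(8)])
    head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

    pruned_blocks = layer_prune(blocks, num_to_remove=3)   # 8 blocks -> 5 blocks
    model = nn.Sequential(stem, pruned_blocks, head)       # shallower model to fine-tune
    print(model(torch.randn(2, 3, 32, 32)).shape)          # torch.Size([2, 10])
```

The shallower model produced this way would then be fine-tuned on the target dataset to recover accuracy, which is the step the abstract describes after pruning the least important layers.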
Related papers
- Filter Pruning for Efficient CNNs via Knowledge-driven Differential
Filter Sampler [103.97487121678276]
Filter pruning simultaneously accelerates the computation and reduces the memory overhead of CNNs.
We propose a novel Knowledge-driven Differential Filter Sampler (KDFS) with Masked Filter Modeling (MFM) framework for filter pruning.
arXiv Detail & Related papers (2023-07-01T02:28:41Z)
- Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs [2.8360662552057323]
This study conducts structured pruning on U-Net generators of conditional GANs.
A per-layer sensitivity analysis confirms that many unnecessary filters exist in the innermost layers near the bottleneck and can be substantially pruned.
arXiv Detail & Related papers (2022-06-29T13:55:36Z)
- Asymptotic Soft Cluster Pruning for Deep Neural Networks [5.311178623385279]
Filter pruning method introduces structural sparsity by removing selected filters.
We propose a novel filter pruning method called Asymptotic Soft Cluster Pruning.
Our method can achieve competitive results compared with many state-of-the-art algorithms.
arXiv Detail & Related papers (2022-06-16T13:58:58Z)
- End-to-End Sensitivity-Based Filter Pruning [49.61707925611295]
We present a sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end.
Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer.
arXiv Detail & Related papers (2022-04-15T10:21:05Z)
- Pruning Networks with Cross-Layer Ranking & k-Reciprocal Nearest Filters [151.2423480789271]
A novel pruning method, termed CLR-RNF, is proposed for filter-level network pruning.
We conduct image classification on CIFAR-10 and ImageNet to demonstrate the superiority of our CLR-RNF over the state-of-the-arts.
arXiv Detail & Related papers (2022-02-15T04:53:24Z)
- Data Agnostic Filter Gating for Efficient Deep Networks [72.4615632234314]
Current filter pruning methods mainly leverage feature maps to generate importance scores for filters and prune those with smaller scores.
In this paper, we propose a data-agnostic filter pruning method that uses an auxiliary network named Dagger module to induce pruning.
In addition, to help prune filters with certain FLOPs constraints, we leverage an explicit FLOPs-aware regularization to directly promote pruning filters toward target FLOPs.
arXiv Detail & Related papers (2020-10-28T15:26:40Z)
- Towards Optimal Filter Pruning with Balanced Performance and Pruning Speed [17.115185960327665]
We propose a balanced filter pruning method for both performance and pruning speed.
Our method is able to prune a layer with an approximately optimal layer-wise pruning rate at a preset loss variation.
The proposed pruning method is widely applicable to common architectures and does not involve any additional training except the final fine-tuning.
arXiv Detail & Related papers (2020-10-14T06:17:09Z)
- Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
arXiv Detail & Related papers (2020-05-06T07:41:22Z)
- A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
arXiv Detail & Related papers (2020-01-15T20:38:23Z)