BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch
Whitening
- URL: http://arxiv.org/abs/2105.06423v1
- Date: Thu, 13 May 2021 17:00:05 GMT
- Title: BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch
Whitening
- Authors: Wenqi Shao, Hang Yu, Zhaoyang Zhang, Hang Xu, Zhenguo Li, Ping Luo
- Abstract summary: This work presents a probabilistic channel pruning method to accelerate Convolutional Neural Networks (CNNs).
Previous pruning methods often zero out unimportant channels during training in a deterministic manner, which reduces the CNN's learning capacity and results in suboptimal performance.
We develop a probability-based pruning algorithm, called batch whitening channel pruning (BWCP), which can stochastically discard unimportant channels by modeling the probability of a channel being activated.
- Score: 63.081808698068365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a probabilistic channel pruning method to accelerate
Convolutional Neural Networks (CNNs). Previous pruning methods often zero out
unimportant channels during training in a deterministic manner, which reduces the
CNN's learning capacity and results in suboptimal performance. To address this
problem, we develop a probability-based pruning algorithm, called batch
whitening channel pruning (BWCP), which can stochastically discard unimportant
channels by modeling the probability of a channel being activated. BWCP has
several merits. (1) It simultaneously trains and prunes CNNs from scratch in a
probabilistic way, exploring larger network space than deterministic methods.
(2) BWCP is empowered by the proposed batch whitening tool, which is shown both
empirically and theoretically to increase the activation probability of useful
channels while keeping unimportant channels unchanged, without adding any extra
parameters or computational cost at inference. (3) Extensive experiments on
CIFAR-10, CIFAR-100, and ImageNet with various network architectures show that
BWCP outperforms its counterparts by achieving better accuracy given limited
computational budgets. For example, ResNet50 pruned by BWCP shows only a 0.70%
Top-1 accuracy drop on ImageNet while reducing the FLOPs of the plain ResNet50
by 43.1%.
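The abstract does not spell out the formulas, but the gist of gating channels by their activation probability can be illustrated with a rough PyTorch sketch. The class below is an assumption-laden illustration, not the authors' BWCP implementation: it estimates the probability that a BN+ReLU channel fires as 1 - Phi(-beta / |gamma|) and samples a Bernoulli keep/drop mask from it during training; the class name, the 0.5 threshold, and the plain Bernoulli sampling are assumptions, and the proposed batch whitening step that reshapes these probabilities is omitted.

    import torch
    import torch.nn as nn

    class StochasticChannelGate(nn.Module):
        """Hypothetical per-channel gate driven by activation probability.

        After batch normalization, a channel's output is y = gamma * x_hat + beta
        with x_hat roughly standard normal, so the probability that ReLU(y) is
        non-zero is P(x_hat > -beta / |gamma|) = 1 - Phi(-beta / |gamma|).
        Channels are kept stochastically during training and thresholded at test time.
        """

        def __init__(self, bn: nn.BatchNorm2d, threshold: float = 0.5):
            super().__init__()
            self.bn = bn
            self.threshold = threshold  # assumed cut-off, not taken from the paper

        def activation_prob(self) -> torch.Tensor:
            gamma, beta = self.bn.weight, self.bn.bias
            std_normal = torch.distributions.Normal(0.0, 1.0)
            return 1.0 - std_normal.cdf(-beta / (gamma.abs() + 1e-8))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x is assumed to be the output of self.bn followed by ReLU (NCHW layout).
            p = self.activation_prob()
            if self.training:
                # Stochastic pruning: a channel survives with its activation probability,
                # so unimportant channels are only discarded probabilistically.
                mask = torch.bernoulli(p.detach()).to(x.dtype)
            else:
                # Deterministic pruning at inference: keep channels likely to activate.
                mask = (p > self.threshold).to(x.dtype)
            return x * mask.view(1, -1, 1, 1)

In the paper's actual method it is the batch whitening operation that raises the activation probabilities of useful channels while leaving unimportant ones unchanged; the sketch above only shows the gating side.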
Related papers
- Pruning Very Deep Neural Network Channels for Efficient Inference [6.497816402045099]
Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer.
On VGG-16, it achieves state-of-the-art results with a 5x speed-up and only a 0.3% increase in error.
Our method also accelerates modern networks such as ResNet and Xception, with only 1.4% and 1.0% accuracy loss respectively under a 2x speed-up.
arXiv Detail & Related papers (2022-11-14T06:48:33Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar computational cost, or lower cost with similar accuracy, than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is well suited to pruning efficient networks adaptively for various classification subtasks, facilitating convenient deployment and use of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- AdaPruner: Adaptive Channel Pruning and Effective Weights Inheritance [9.3421559369389]
We propose a pruning framework that adaptively determines the number of channels in each layer as well as the weight inheritance criteria for the sub-network.
AdaPruner obtains pruned networks quickly, accurately, and efficiently.
On ImageNet, we reduce the FLOPs of MobileNetV2 by 32.8% with only a 0.62% decrease in top-1 accuracy, which exceeds all previous state-of-the-art channel pruning methods.
arXiv Detail & Related papers (2021-09-14T01:52:05Z)
- GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization [84.57695474130273]
Gate-based or importance-based pruning methods aim to remove channels whose importance is smallest.
GDP can be plugged in before convolutional layers, without bells and whistles, to control the on/off state of each channel.
Experiments conducted over CIFAR-10 and ImageNet datasets show that the proposed GDP achieves the state-of-the-art performance.
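The GDP summary above only states that a gate module sits in front of the convolution and switches channels on or off; a generic, hypothetical version of such a gate with a simple polarization-style penalty (the names and the exact penalty are assumptions, not GDP's actual differentiable polarization) could look like this:

    import torch
    import torch.nn as nn

    class PolarizedGate(nn.Module):
        """Hypothetical learnable per-channel gate for pruning (not GDP's exact module)."""

        def __init__(self, channels: int):
            super().__init__()
            self.gate = nn.Parameter(torch.ones(channels))  # one learnable scalar per channel

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            g = self.gate.clamp(0.0, 1.0)
            return x * g.view(1, -1, 1, 1)  # scale each channel before the convolution

        def polarization_penalty(self) -> torch.Tensor:
            # Added to the training loss; it is zero only when every gate is exactly 0 or 1,
            # so optimization is pushed toward a hard keep/drop decision per channel.
            g = self.gate.clamp(0.0, 1.0)
            return (g * (1.0 - g)).sum()

Channels whose gates settle at 0 can then be removed from the network after training.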
arXiv Detail & Related papers (2021-09-06T03:17:10Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
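The entry does not give the metric itself, but a common first-order, Fisher-style proxy scores a channel by the squared product of its batch-norm scale and that scale's gradient; the helper below is an illustrative approximation of that idea (the function name and the BN-scale formulation are assumptions, not the paper's exact coupled-channel metric):

    from typing import List

    import torch
    import torch.nn as nn

    def fisher_channel_scores(bn_layers: List[nn.BatchNorm2d]) -> List[torch.Tensor]:
        """Illustrative Fisher-style channel importance; call after loss.backward().

        For each BN layer, channel c gets the score (gamma_c * d loss / d gamma_c)^2,
        a first-order estimate of how much the loss would change if that channel
        were zeroed out. In practice scores are accumulated over many batches.
        """
        scores = []
        for bn in bn_layers:
            g = bn.weight.grad                      # gradient w.r.t. the per-channel scale
            scores.append((bn.weight * g).detach().pow(2))
        return scores

Channels (or coupled groups of channels) with the smallest accumulated scores would then be the first candidates for removal.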
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- Carrying out CNN Channel Pruning in a White Box [121.97098626458886]
We conduct channel pruning in a white box.
To model the contribution of each channel to differentiating categories, we develop a class-wise mask for each channel.
This is the first time that CNN interpretability theory has been used to guide channel pruning.
arXiv Detail & Related papers (2021-04-24T04:59:03Z)
- ACP: Automatic Channel Pruning via Clustering and Swarm Intelligence Optimization for CNN [6.662639002101124]
Convolutional neural networks (CNNs) have become deeper and wider in recent years.
Existing magnitude-based pruning methods are efficient, but the performance of the compressed network is unpredictable.
We propose a novel automatic channel pruning method (ACP).
ACP is evaluated against several state-of-the-art CNNs on three different classification datasets.
arXiv Detail & Related papers (2021-01-16T08:56:38Z)
- PruneNet: Channel Pruning via Global Importance [22.463154358632472]
We propose a simple-yet-effective method for pruning channels based on a computationally lightweight, data-driven optimization step.
With non-uniform pruning across the layers of ResNet-50, we are able to match the FLOP reduction of state-of-the-art channel pruning results.
arXiv Detail & Related papers (2020-05-22T17:09:56Z)
- Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks [6.534515590778012]
Pruning is one of the predominant approaches used for deep network compression.
We present a simple-yet-effective gradual channel pruning while training methodology using a novel data-driven metric.
We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet.
arXiv Detail & Related papers (2020-02-23T17:56:18Z)
- Discrimination-aware Network Pruning for Deep Model Compression [79.44318503847136]
Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones.
We propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.
Experiments on both image classification and face recognition demonstrate the effectiveness of our methods.
arXiv Detail & Related papers (2020-01-04T07:07:41Z)