GDP: Stabilized Neural Network Pruning via Gates with Differentiable
Polarization
- URL: http://arxiv.org/abs/2109.02220v2
- Date: Wed, 8 Sep 2021 07:51:17 GMT
- Title: GDP: Stabilized Neural Network Pruning via Gates with Differentiable
Polarization
- Authors: Yi Guo, Huan Yuan, Jianchao Tan, Zhangyang Wang, Sen Yang, Ji Liu
- Abstract summary: Gate-based or importance-based pruning methods aim to remove the channels with the smallest importance.
GDP can be plugged in before convolutional layers, without bells and whistles, to control the on-and-off state of each channel.
Experiments conducted on the CIFAR-10 and ImageNet datasets show that the proposed GDP achieves state-of-the-art performance.
- Score: 84.57695474130273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model compression techniques have recently attracted explosive
attention as a way to obtain efficient AI models for various real-time
applications. Channel pruning is one important compression strategy and is
widely used to slim various DNNs. Previous gate-based or importance-based
pruning methods aim to remove the channels whose importance is smallest.
However, it remains unclear by what criteria channel importance should be
measured, which has led to a variety of channel-selection heuristics. Other
sampling-based pruning methods deploy sampling strategies to train sub-nets,
which often causes training instability and degrades the compressed model's
performance. In view of these research gaps, we present a new module named
Gates with Differentiable Polarization (GDP), inspired by principled
optimization ideas. GDP can be plugged in before convolutional layers,
without bells and whistles, to control the on-and-off state of each channel
or whole layer block. During training, the polarization effect drives a
subset of gates to smoothly decrease to exactly zero, while the other gates
gradually move away from zero by a large margin. When training terminates,
the zero-gated channels can be painlessly removed, while the remaining
non-zero gates are absorbed into the succeeding convolution kernel, causing
no interruption to training and no damage to the trained model. Experiments
on the CIFAR-10 and ImageNet datasets show that the proposed GDP algorithm
achieves state-of-the-art performance on various benchmark DNNs over a broad
range of pruning ratios. We also apply GDP to DeepLabV3Plus-ResNet50 on the
challenging Pascal VOC segmentation task, where test performance sees no
drop (and even improves slightly) with over 60% FLOPs savings.
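To make the gating mechanism above concrete, the following PyTorch-style sketch shows a per-channel gate placed before a convolution, an illustrative polarization-style penalty, and the gate-absorption step. The ChannelGate class, the particular penalty form, and the hyperparameter t are assumptions made for illustration; they are not the paper's exact parameterization or regularizer.

# Minimal sketch, assuming PyTorch; not the authors' exact GDP formulation.
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Per-channel gate plugged in before a convolution."""
    def __init__(self, num_channels: int):
        super().__init__()
        self.g = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); scale each channel by its gate value.
        return x * self.g.view(1, -1, 1, 1)

    def polarization_penalty(self, t: float = 1.2) -> torch.Tensor:
        # Illustrative polarization-style term: shrink all gates (L1)
        # while rewarding spread around the mean, so gates are pushed
        # either to exactly zero or clearly away from zero.
        g = self.g
        return t * g.abs().sum() - (g - g.mean()).abs().sum()

def absorb_gate(gate: ChannelGate, conv: nn.Conv2d) -> None:
    """Fold the (non-zero) gates into the succeeding conv's input channels,
    so the gate module can be dropped without changing the network output."""
    with torch.no_grad():
        conv.weight.mul_(gate.g.view(1, -1, 1, 1))

# Example usage (shapes assumed): gate = ChannelGate(64);
# conv = nn.Conv2d(64, 128, 3, padding=1); y = conv(gate(x));
# loss = task_loss + lam * gate.polarization_penalty()

Channels whose gates reach exactly zero can then be removed together with the corresponding slices of the convolution weight, which is what makes the pruning step painless at the end of training.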
Related papers
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
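As a rough, purely illustrative companion to this summary (not the HASTE module itself), the sketch below hashes the channels of a feature map with sign random projections, a simple form of LSH, and averages channels that collide in the same bucket; the function lsh_merge_channels and its parameters are hypothetical.

# Illustrative LSH-based channel merging; the actual HASTE scheme is in the paper.
import torch

def lsh_merge_channels(x: torch.Tensor, num_bits: int = 8, seed: int = 0):
    """x: (C, H, W) feature map. Returns (merged feature map, bucket ids)."""
    C = x.shape[0]
    flat = x.reshape(C, -1)                      # one vector per channel
    gen = torch.Generator().manual_seed(seed)
    planes = torch.randn(flat.shape[1], num_bits, generator=gen)
    codes = (flat @ planes > 0).int()            # sign-random-projection hash
    weights = 2 ** torch.arange(num_bits)        # pack bits into integer ids
    buckets = (codes * weights).sum(dim=1)
    merged = flat.clone()
    for b in buckets.unique():
        idx = (buckets == b).nonzero(as_tuple=True)[0]
        merged[idx] = flat[idx].mean(dim=0)      # share one representative
    return merged.reshape_as(x), buckets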
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices [3.591566487849146]
Binary neural networks (BNNs) tackle the resource constraints of such devices with extreme compression and speed-up gains compared to real-valued models.
We propose a simple but effective method to accelerate inference through unifying BNNs with an early-exiting strategy.
Our approach allows simple instances to exit early based on a decision threshold and utilizes output layers added to different intermediate layers to avoid executing the entire binary model.
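A hedged sketch of such threshold-based early exiting follows; the EarlyExitNet wrapper, the softmax-confidence rule, and the default threshold are illustrative assumptions rather than the paper's exact design, and backbone binarization is orthogonal to the exit logic, so it is omitted here.

# Minimal sketch, assuming PyTorch; exit placement and threshold are illustrative.
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, stages: nn.ModuleList, exits: nn.ModuleList,
                 threshold: float = 0.9):
        super().__init__()
        # exits[i] is a classifier head attached after stages[i];
        # the last exit plays the role of the full model's output layer.
        self.stages, self.exits, self.threshold = stages, exits, threshold

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for stage, exit_head in zip(self.stages, self.exits):
            x = stage(x)
            logits = exit_head(x)
            conf = logits.softmax(dim=-1).max(dim=-1).values
            if bool((conf >= self.threshold).all()):
                return logits   # confident enough: skip the remaining stages
        return logits           # fell through: use the final exit's prediction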
arXiv Detail & Related papers (2022-06-17T22:11:11Z)
- CHEX: CHannel EXploration for CNN Model Compression [47.3520447163165]
We propose a novel Channel Exploration methodology, dubbed CHEX, to rectify these problems.
CHEX repeatedly prunes and regrows channels throughout the training process, which reduces the risk of prematurely pruning important channels.
Results demonstrate that CHEX can effectively reduce the FLOPs of diverse CNN architectures on a variety of computer vision tasks.
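The sketch below illustrates the prune-and-regrow pattern on a per-channel keep mask; the scoring and regrowth rules used by CHEX itself are more elaborate, and the function signature here is hypothetical.

# Schematic only; CHEX's actual criteria are defined in the cited paper.
import torch

def prune_and_regrow(channel_mask: torch.Tensor,
                     scores: torch.Tensor,
                     keep: int,
                     regrow: int) -> torch.Tensor:
    """channel_mask: (C,) float mask of 0/1, scores: (C,) importance scores.
    Keep the `keep` highest-scoring active channels, then randomly regrow
    `regrow` of the currently inactive ones."""
    new_mask = torch.zeros_like(channel_mask)
    active_scores = scores.masked_fill(channel_mask == 0, float("-inf"))
    new_mask[active_scores.topk(keep).indices] = 1.0
    inactive = (new_mask == 0).nonzero(as_tuple=True)[0]
    if regrow > 0 and inactive.numel() > 0:
        pick = inactive[torch.randperm(inactive.numel())[:regrow]]
        new_mask[pick] = 1.0   # give previously pruned channels another chance
    return new_mask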
arXiv Detail & Related papers (2022-03-29T17:52:41Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable for adaptively pruning efficient networks for various classification subtasks, facilitating the practical deployment and use of deep networks in real-world applications.
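To make the trace-ratio idea concrete, the following Fisher-style sketch scores each channel by its between-class scatter divided by its within-class scatter over pooled activations; CATRO's actual criterion and optimization procedure are defined in the paper, so this should be read only as an illustration.

# Fisher-style per-channel discriminativeness score (illustrative, not CATRO's).
import torch

def trace_ratio_scores(feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """feats: (N, C) pooled per-channel activations, labels: (N,) class ids."""
    overall_mean = feats.mean(dim=0)
    between = torch.zeros(feats.shape[1])
    within = torch.zeros(feats.shape[1])
    for c in labels.unique():
        cls = feats[labels == c]
        cls_mean = cls.mean(dim=0)
        between += cls.shape[0] * (cls_mean - overall_mean) ** 2
        within += ((cls - cls_mean) ** 2).sum(dim=0)
    return between / within.clamp_min(1e-8)   # higher = more class-discriminative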
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- Only Train Once: A One-Shot Neural Network Training And Pruning Framework [31.959625731943675]
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices.
We propose Only-Train-Once (OTO), a framework that produces slimmer DNNs with competitive performance and significant FLOPs reductions.
OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we formulate a structured-sparsity optimization problem and solve it with a novel algorithm, Half-Space Projected Gradient (HSPG).
To demonstrate the effectiveness of OTO, we train and ...
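Below is a small sketch of the zero-invariant-group idea, assuming a convolution followed by batch normalization: each output filter is grouped with its bias and BatchNorm parameters, so a group whose entries are all zero can be removed without affecting the output. The helper names and the Conv-BN pairing are assumptions, and the HSPG optimizer itself is not reproduced here.

# Illustrative zero-invariant grouping for a Conv2d + BatchNorm2d pair.
import torch
import torch.nn as nn

def zero_invariant_groups(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """One parameter group per output channel of conv."""
    groups = []
    for k in range(conv.out_channels):
        tensors = [conv.weight[k].reshape(-1), bn.weight[k:k+1], bn.bias[k:k+1]]
        if conv.bias is not None:
            tensors.append(conv.bias[k:k+1])
        groups.append(torch.cat(tensors))
    return groups

def prunable_channels(conv: nn.Conv2d, bn: nn.BatchNorm2d, tol: float = 1e-12):
    """Indices of groups whose norm is (numerically) zero and can be removed."""
    return [k for k, g in enumerate(zero_invariant_groups(conv, bn))
            if g.norm() <= tol]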
arXiv Detail & Related papers (2021-07-15T17:15:20Z)
- BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening [63.081808698068365]
This work presents a probabilistic channel pruning method to accelerate Convolutional Neural Networks (CNNs).
Previous pruning methods often zero out unimportant channels during training in a deterministic manner, which reduces the CNN's learning capacity and results in suboptimal performance.
We develop a probability-based pruning algorithm, called batch whitening channel pruning (BWCP), which discards unimportant channels by modeling the probability of a channel being activated.
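As a purely illustrative companion: if a channel's normalized pre-activation is modeled as standard normal, the BatchNorm affine transform followed by ReLU activates the channel with probability Phi(beta / |gamma|), which the sketch below computes from BatchNorm parameters. BWCP's batch-whitening formulation is more involved; none of this is taken from the paper.

# Illustrative activation probability under a Gaussian assumption, not BWCP itself.
import torch
import torch.nn as nn

def activation_probability(bn: nn.BatchNorm2d) -> torch.Tensor:
    gamma, beta = bn.weight.detach(), bn.bias.detach()
    normal = torch.distributions.Normal(0.0, 1.0)
    # P(gamma * z + beta > 0) with z ~ N(0, 1) equals Phi(beta / |gamma|).
    return normal.cdf(beta / gamma.abs().clamp_min(1e-8))  # (C,) values in [0, 1]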
arXiv Detail & Related papers (2021-05-13T17:00:05Z)
- Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm [5.621336109915588]
We show for the first time that sparse pruning compresses a BERT model significantly more than reducing its number of channels and layers.
Our method outperforms the leading competitors with a 20-times weight/FLOPs compression and negligible loss in prediction accuracy.
arXiv Detail & Related papers (2021-04-18T02:20:37Z)
- DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks, despite large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules, which may lead to a tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)
- Discrimination-aware Network Pruning for Deep Model Compression [79.44318503847136]
Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones.
We propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.
Experiments on both image classification and face recognition demonstrate the effectiveness of our methods.
arXiv Detail & Related papers (2020-01-04T07:07:41Z)