ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
- URL: http://arxiv.org/abs/2007.03260v4
- Date: Sat, 14 Aug 2021 19:36:54 GMT
- Title: ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
- Authors: Xiaohan Ding, Tianxiang Hao, Jianchao Tan, Ji Liu, Jungong Han, Yuchen
Guo, Guiguang Ding
- Abstract summary: We propose ResRep, which slims down a CNN by reducing the width (number of output channels) of convolutional layers.
Inspired by neurobiological research on the independence of remembering and forgetting, we propose to re-parameterize a CNN into remembering parts and forgetting parts.
We equivalently merge the remembering and forgetting parts into the original architecture with narrower layers.
- Score: 105.97936163854693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter
pruning), which slims down a CNN by reducing the width (number of output
channels) of convolutional layers. Inspired by neurobiological research on
the independence of remembering and forgetting, we propose to re-parameterize a
CNN into remembering parts and forgetting parts, where the former learn to
maintain the performance and the latter learn to prune. Via training with
regular SGD on the former but a novel update rule with penalty gradients on the
latter, we realize structured sparsity. Then we equivalently merge the
remembering and forgetting parts into the original architecture with narrower
layers. In this sense, ResRep can be viewed as a successful application of
Structural Re-parameterization. Such a methodology distinguishes ResRep from
the traditional learning-based pruning paradigm that applies a penalty on
parameters to produce sparsity, which may suppress the parameters essential for
the remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on
ImageNet to a narrower one with only 45% of the FLOPs and no accuracy drop, making
ResRep the first method to achieve lossless pruning at such a high compression ratio. The
code and models are at https://github.com/DingXiaoH/ResRep.
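As a rough illustration of the decoupling idea (a minimal sketch of one plausible instantiation, not the authors' exact construction), the snippet below appends a 1x1 pointwise layer after a convolution: the convolution is updated with plain SGD ("remembering"), while the pointwise layer's rows additionally receive a group-lasso penalty gradient that drives whole output channels toward zero ("forgetting"). The module names, learning rate, and penalty strength are illustrative.

```python
# Minimal sketch of decoupled remembering/forgetting updates (illustrative;
# assumes a plain conv followed by a 1x1 "compactor"-style layer).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompactedConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)    # "remembering" part
        self.compactor = nn.Conv2d(out_ch, out_ch, 1, bias=False)  # "forgetting" part
        with torch.no_grad():                                       # start as identity
            self.compactor.weight.copy_(torch.eye(out_ch).view(out_ch, out_ch, 1, 1))

    def forward(self, x):
        return F.relu(self.compactor(self.conv(x)))

def decoupled_step(module, lr=0.01, lam=1e-4):
    """Plain SGD on the conv; SGD plus a group-lasso penalty gradient on the compactor."""
    with torch.no_grad():
        for p in module.conv.parameters():
            p -= lr * p.grad
        w = module.compactor.weight                  # (out_ch, out_ch, 1, 1)
        rows = w.view(w.size(0), -1)
        penalty = (rows / rows.norm(dim=1, keepdim=True).clamp_min(1e-12)).view_as(w)
        w -= lr * (w.grad + lam * penalty)

# Rows of the compactor whose norm shrinks to ~0 mark output channels that can be
# deleted; the remaining 1x1 layer can then be merged back into the conv, e.g.
# merged_w = torch.einsum('oi,ichw->ochw', compactor_weight.flatten(1), conv_weight).
```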
Related papers
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
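As a generic illustration of the underlying idea (random-hyperplane LSH over channels, not the actual HASTE module), the sketch below hashes the flattened activations of each channel into buckets; channels that collide can be treated as redundant and merged so their convolutions are skipped. The shapes and number of hash bits are arbitrary.

```python
# Generic random-hyperplane LSH over feature-map channels (illustrative only).
import torch

def lsh_bucket_channels(feat, n_bits=8, seed=0):
    """feat: (C, H, W) activation of one sample; returns a bucket id per channel."""
    C = feat.size(0)
    flat = feat.reshape(C, -1)
    g = torch.Generator().manual_seed(seed)
    planes = torch.randn(flat.size(1), n_bits, generator=g)   # random hyperplanes
    signs = (flat @ planes) > 0                                # (C, n_bits) bit signature
    weights = 2 ** torch.arange(n_bits)
    return (signs.long() * weights).sum(dim=1)                 # bucket id per channel

feat = torch.randn(64, 16, 16)
buckets = lsh_bucket_channels(feat)
# Channels sharing a bucket could be merged (e.g. averaged) to skip their convolutions.
print(len(buckets.unique()), "distinct buckets out of", feat.size(0), "channels")
```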
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps [85.49020931411825]
Compression of Convolutional Neural Networks (CNNs) is crucial to deploying these models on edge devices with limited resources.
We propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process.
We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models.
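A minimal sketch of the general idea of a selector that predicts smooth per-channel saliency masks from pooled features (purely illustrative of saliency-steered pruning, not the paper's architecture):

```python
# Illustrative selector network producing smooth channel saliency masks.
import torch
import torch.nn as nn

class ChannelSelector(nn.Module):
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(),
            nn.Linear(hidden, channels), nn.Sigmoid())

    def forward(self, feat):                      # feat: (N, C, H, W)
        pooled = feat.mean(dim=(2, 3))            # global average pooling -> (N, C)
        mask = self.net(pooled)                   # smooth saliency in [0, 1]
        return feat * mask[:, :, None, None], mask

sel = ChannelSelector(channels=64)
x = torch.randn(2, 64, 8, 8)
y, mask = sel(x)   # channels with persistently low mask values are pruning candidates
```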
arXiv Detail & Related papers (2022-09-07T01:12:11Z)
- CHEX: CHannel EXploration for CNN Model Compression [47.3520447163165]
We propose a novel Channel Exploration methodology, dubbed CHEX, to rectify these problems.
CHEX repeatedly prunes and regrows the channels throughout the training process, which reduces the risk of prematurely pruning important channels.
Results demonstrate that CHEX can effectively reduce the FLOPs of diverse CNN architectures on a variety of computer vision tasks.
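The sketch below shows only the scheduling skeleton of a prune-and-regrow loop under an assumed importance measure (BatchNorm scale magnitude): the channel mask is recomputed from scratch at a fixed interval, so a channel pruned earlier can re-enter the active set if its importance recovers. The criterion, interval, and keep ratio are illustrative, not CHEX's actual ones.

```python
# Skeleton of a prune-and-regrow schedule (illustrative; the paper's criteria differ).
import torch
import torch.nn as nn

def topk_channel_mask(bn, keep_ratio):
    """Fresh binary mask over a BatchNorm layer's channels, keeping the largest |gamma|."""
    gamma = bn.weight.detach().abs()
    k = max(1, int(keep_ratio * gamma.numel()))
    mask = torch.zeros_like(gamma)
    mask[gamma.topk(k).indices] = 1.0
    return mask

bn = nn.BatchNorm2d(32)
mask = topk_channel_mask(bn, keep_ratio=0.5)   # recompute every few epochs during training;
                                               # multiply BN outputs by the mask in forward()
```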
arXiv Detail & Related papers (2022-03-29T17:52:41Z)
- Structured Pruning is All You Need for Pruning CNNs at Initialization [38.88730369884401]
Pruning is a popular technique for reducing the model size and computational cost of convolutional neural networks (CNNs).
We propose PreCropping, a structured hardware-efficient model compression scheme.
Compared to weight pruning, the proposed scheme is regular and dense in both storage and computation without sacrificing accuracy.
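A minimal sketch of the "crop before training" idea, assuming a fixed per-layer keep ratio: instead of masking weights, each layer is simply built narrower at initialization, so storage and computation stay dense and regular. The ratio is arbitrary and not the paper's criterion.

```python
# Illustrative channel cropping at initialization: build the layer narrower up front.
import torch.nn as nn

def cropped_conv(in_ch, out_ch, keep=0.75, k=3):
    kept_out = max(1, int(out_ch * keep))
    return nn.Conv2d(in_ch, kept_out, k, padding=k // 2)

layer = cropped_conv(64, 128)          # 128 -> 96 output channels from the start
print(layer.weight.shape)              # torch.Size([96, 64, 3, 3])
```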
arXiv Detail & Related papers (2022-03-04T19:54:31Z)
- Layer Pruning via Fusible Residual Convolutional Block for Deep Neural Networks [15.64167076052513]
When the same FLOPs and number of parameters are pruned, layer pruning yields lower inference time and runtime memory usage.
We propose a simple layer pruning method using a residual convolutional block (ResConv).
Our pruning method achieves excellent compression and acceleration performance over the state of the art on different datasets.
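As a generic illustration of layer pruning with residual blocks (not the paper's ResConv design), the sketch below scales a block's conv branch by a learnable scalar gate; a block whose gate decays to roughly zero reduces to an identity mapping and can be removed outright.

```python
# Illustrative gated residual block for layer-level pruning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.gate = nn.Parameter(torch.ones(1))   # scales the whole conv branch

    def forward(self, x):
        return x + self.gate * F.relu(self.conv(x))

    def prunable(self, thresh=1e-3):
        return self.gate.abs().item() < thresh    # ~identity block -> removable layer
```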
arXiv Detail & Related papers (2020-11-29T12:51:16Z)
- Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain.
Our approach is applied to pretrained CNNs, and we show that minor additional fine-tuning allows our method to recover the original model performance.
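A generic sketch of spectral-domain sparsification of conv filters (illustrative only; the paper's contribution is a reordering that makes such pruning effective): each filter is transformed, its smallest-magnitude frequency coefficients are zeroed, and it is transformed back.

```python
# Illustrative spectral-domain sparsification of convolution filters.
import numpy as np

def spectral_prune(weights, keep_ratio=0.5):
    """weights: (out_ch, in_ch, k, k) array; keep only the largest spectral coefficients."""
    flat = weights.reshape(weights.shape[0], -1)
    coeffs = np.fft.rfft(flat, axis=1)
    k = max(1, int(keep_ratio * coeffs.shape[1]))
    thresh = np.sort(np.abs(coeffs), axis=1)[:, -k][:, None]   # per-filter cutoff
    coeffs = np.where(np.abs(coeffs) >= thresh, coeffs, 0)
    return np.fft.irfft(coeffs, n=flat.shape[1], axis=1).reshape(weights.shape)

w = np.random.randn(16, 8, 3, 3)
w_pruned = spectral_prune(w, keep_ratio=0.25)
```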
arXiv Detail & Related papers (2020-10-22T23:45:34Z)
- UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration [24.42067007684169]
We propose a novel uniform channel pruning (UCP) method to prune deep CNNs.
Unimportant channels, together with the convolutional kernels related to them, are pruned directly.
We verify our method on CIFAR-10, CIFAR-100 and ILSVRC-2012 for image classification.
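A minimal sketch of uniform channel pruning under an assumed importance criterion (filter L1 norm): every layer drops the same fraction of output channels, and a narrower layer is rebuilt from the kept filters. The ratio and criterion are illustrative.

```python
# Illustrative uniform channel pruning by filter L1 norm.
import torch
import torch.nn as nn

def uniform_prune_indices(conv, prune_ratio=0.3):
    """Indices of output channels to keep for one Conv2d layer."""
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # L1 norm per filter
    n_keep = conv.out_channels - int(prune_ratio * conv.out_channels)
    return importance.topk(n_keep).indices.sort().values

conv = nn.Conv2d(16, 32, 3)
keep = uniform_prune_indices(conv, prune_ratio=0.25)
narrow = nn.Conv2d(16, len(keep), 3)
narrow.weight.data.copy_(conv.weight.data[keep])   # kept filters and their kernels
narrow.bias.data.copy_(conv.bias.data[keep])
# Downstream layers' input channels must be sliced accordingly.
```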
arXiv Detail & Related papers (2020-10-03T01:51:06Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding floating-point (FPN) networks, but have only 1/4 of the memory cost and run 2x faster on modern GPUs.
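A minimal sketch of why a bounded activation helps: clipping the ReLU output to a fixed range keeps its values on a small, representable integer grid when the activation is quantized. The bound of 6 and the 8-bit grid are illustrative, not the paper's settings.

```python
# Illustrative Bounded ReLU and a fake-quantized activation built on it.
import torch

def bounded_relu(x, bound=6.0):
    return x.clamp(min=0.0, max=bound)

def quantize_activation(x, bound=6.0, bits=8):
    scale = (2 ** bits - 1) / bound
    return torch.round(bounded_relu(x) * scale) / scale   # values snap to the integer grid

x = torch.randn(4) * 10
print(bounded_relu(x), quantize_activation(x))
```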
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- Filter Sketch for Network Pruning [184.41079868885265]
We propose a novel network pruning approach that preserves the information of pre-trained network weights (filters).
Our approach, referred to as FilterSketch, encodes the second-order information of pre-trained weights.
Experiments on CIFAR-10 show that FilterSketch reduces 63.3% of FLOPs and prunes 59.9% of network parameters with negligible accuracy cost.
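As a rough illustration of what "second-order information of pre-trained weights" means (not the paper's sketching algorithm), the snippet below computes the Gram matrix of a layer's flattened filters; a method in this spirit would seek a smaller set of filters whose Gram matrix approximates it.

```python
# Illustrative second-order (Gram) matrix of a layer's pre-trained filters.
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, 3)
W = conv.weight.detach().reshape(conv.out_channels, -1)   # (32, 144) flattened filters
second_order = W @ W.t()                                   # (32, 32) Gram matrix

# A crude selection for illustration: keep filters with the largest self-correlation.
keep = second_order.diag().topk(20).indices.sort().values
print(keep)
```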
arXiv Detail & Related papers (2020-01-23T13:57:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.