Structured Pruning is All You Need for Pruning CNNs at Initialization
- URL: http://arxiv.org/abs/2203.02549v1
- Date: Fri, 4 Mar 2022 19:54:31 GMT
- Title: Structured Pruning is All You Need for Pruning CNNs at Initialization
- Authors: Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang
- Abstract summary: Pruning is a popular technique for reducing the model size and computational cost of convolutional neural networks (CNNs).
We propose PreCropping, a structured hardware-efficient model compression scheme.
Compared to weight pruning, the proposed scheme is regular and dense in both storage and computation without sacrificing accuracy.
- Score: 38.88730369884401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning is a popular technique for reducing the model size and computational
cost of convolutional neural networks (CNNs). However, a slow retraining or
fine-tuning procedure is often required to recover the accuracy loss caused by
pruning. Recently, a new research direction on weight pruning,
pruning-at-initialization (PAI), has been proposed to directly prune CNNs before
training so that fine-tuning or retraining can be avoided. While PAI has shown
promising results in reducing the model size, existing approaches rely on
fine-grained weight pruning which requires unstructured sparse matrix
computation, making it difficult to achieve real speedup in practice unless the
sparsity is very high.
This work is the first to show that fine-grained weight pruning is in fact
not necessary for PAI. Instead, the layerwise compression ratio is the main
critical factor to determine the accuracy of a CNN model pruned at
initialization. Based on this key observation, we propose PreCropping, a
structured hardware-efficient model compression scheme. PreCropping directly
compresses the model at the channel level following the layerwise compression
ratio. Compared to weight pruning, the proposed scheme is regular and dense in
both storage and computation without sacrificing accuracy. In addition, since
PreCropping compresses CNNs at initialization, the computational and memory
costs of CNNs are reduced for both training and inference on commodity
hardware. We empirically demonstrate our approaches on several modern CNN
architectures, including ResNet, ShuffleNet, and MobileNet for both CIFAR-10
and ImageNet.
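As a rough sketch of this idea (not the authors' released implementation), the PyTorch snippet below builds a small CIFAR-style CNN whose per-layer channel counts are cropped at initialization according to a given layerwise keep ratio. The base widths and ratios are placeholder values; in practice the ratios would be derived from an existing PAI criterion and then applied at the channel level, as the abstract describes.

```python
import torch.nn as nn

# Hypothetical layerwise keep ratios (e.g., aggregated from a PAI method's
# per-weight scores into per-layer budgets). Placeholder values, not numbers
# taken from the paper.
keep_ratios = [1.0, 0.75, 0.5, 0.5]

def cropped_cnn(base_widths, keep_ratios, num_classes=10):
    """Build a CIFAR-style CNN whose channel widths are cropped at
    initialization according to the layerwise compression ratios."""
    layers, in_ch = [], 3
    for width, ratio in zip(base_widths, keep_ratios):
        out_ch = max(1, round(width * ratio))  # structured: drop whole channels
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)

# The cropped model is trained from scratch; no fine-tuning or retraining is
# needed because nothing is pruned after training starts.
model = cropped_cnn(base_widths=[64, 128, 256, 512], keep_ratios=keep_ratios)
```

Because every remaining layer is narrower but still dense, storage and computation stay regular, which is why both training and inference speed up on commodity hardware.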
Related papers
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and shrinks unimportant weights on-the-fly by a small amount proportional to their magnitude.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps [85.49020931411825]
Compression of Convolutional Neural Networks (CNNs) is crucial for deploying these models on edge devices with limited resources.
We propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process.
We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models.
arXiv Detail & Related papers (2022-09-07T01:12:11Z)
- CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for the semi-structured 2:4 pruning patterns supported by GPU hardware (a minimal illustration of the 2:4 pattern appears after this list).
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
- CHEX: CHannel EXploration for CNN Model Compression [47.3520447163165]
We propose a novel Channel Exploration methodology, dubbed CHEX, to rectify these problems.
CHEX repeatedly prunes and regrows the channels throughout the training process, which reduces the risk of pruning important channels prematurely.
Results demonstrate that CHEX can effectively reduce the FLOPs of diverse CNN architectures on a variety of computer vision tasks.
arXiv Detail & Related papers (2022-03-29T17:52:41Z)
- ACP: Automatic Channel Pruning via Clustering and Swarm Intelligence Optimization for CNN [6.662639002101124]
Convolutional neural networks (CNNs) have become deeper and wider in recent years.
Existing magnitude-based pruning methods are efficient, but the performance of the compressed network is unpredictable.
We propose a novel automatic channel pruning method (ACP).
ACP is evaluated against several state-of-the-art CNNs on three different classification datasets.
arXiv Detail & Related papers (2021-01-16T08:56:38Z)
- UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration [24.42067007684169]
We propose a novel uniform channel pruning (UCP) method to prune deep CNNs.
Unimportant channels, together with the convolutional kernels related to them, are pruned directly.
We verify our method on CIFAR-10, CIFAR-100 and ILSVRC-2012 for image classification.
arXiv Detail & Related papers (2020-10-03T01:51:06Z)
- Improving Network Slimming with Nonconvex Regularization [8.017631543721684]
Convolutional neural networks (CNNs) have developed to become powerful models for various computer vision tasks.
However, most state-of-the-art CNNs are too large to be deployed directly on resource-constrained devices.
A straightforward approach to compressing CNNs is therefore proposed.
arXiv Detail & Related papers (2020-10-03T01:04:02Z)
- ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting [105.97936163854693]
We propose ResRep, which slims down a CNN by reducing the width (number of output channels) of convolutional layers.
Inspired by neurobiology research on the independence of remembering and forgetting, we propose to re-parameterize a CNN into remembering parts and forgetting parts.
We equivalently merge the remembering and forgetting parts into the original architecture with narrower layers.
arXiv Detail & Related papers (2020-07-07T07:56:45Z)
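The CrAM entry above notes support for semi-structured 2:4 pruning patterns on GPU hardware. As a minimal, hypothetical illustration of that pattern only (not of CrAM's compression-aware optimization step), the sketch below keeps the two largest-magnitude weights in every group of four:

```python
import torch

def mask_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    """Binary mask keeping the 2 largest-magnitude entries in every
    consecutive group of 4 along the last dimension (2:4 pattern)."""
    assert weight.shape[-1] % 4 == 0, "last dimension must be a multiple of 4"
    groups = weight.abs().reshape(*weight.shape[:-1], -1, 4)  # (..., n/4, 4)
    top2 = groups.topk(2, dim=-1).indices                     # keep 2 of every 4
    mask = torch.zeros_like(groups).scatter_(-1, top2, 1.0)
    return mask.reshape(weight.shape)

w = torch.randn(64, 128)          # e.g., one linear layer's weight matrix
w_sparse = w * mask_2_of_4(w)     # 50% sparsity in a hardware-friendly layout
```

Unlike the unstructured masks used by most PAI methods, this fixed 2-out-of-4 layout keeps computation regular enough for hardware acceleration, which is the same motivation behind PreCropping's channel-level cropping.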