Layer Pruning via Fusible Residual Convolutional Block for Deep Neural
Networks
- URL: http://arxiv.org/abs/2011.14356v1
- Date: Sun, 29 Nov 2020 12:51:16 GMT
- Title: Layer Pruning via Fusible Residual Convolutional Block for Deep Neural
Networks
- Authors: Pengtao Xu, Jian Cao, Fanhua Shang, Wenyu Sun, Pu Li
- Abstract summary: Compared with filter and weight pruning, layer pruning yields lower inference time and run-time memory usage when the same FLOPs and number of parameters are pruned.
We propose a simple layer pruning method using a fusible residual convolutional block (ResConv).
Our pruning method achieves excellent compression and acceleration performance over the state of the art on different datasets.
- Score: 15.64167076052513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to deploy deep convolutional neural networks (CNNs) on
resource-limited devices, many model pruning methods for filters and weights
have been developed, while only a few focus on layer pruning. However, compared with
filter pruning and weight pruning, the compact model obtained by layer pruning
has lower inference time and run-time memory usage when the same FLOPs and
number of parameters are pruned, because less data is moved in memory. In this
paper, we propose a simple layer pruning method using fusible residual
convolutional block (ResConv), which is implemented by inserting shortcut
connection with a trainable information control parameter into a single
convolutional layer. Using ResConv structures during training improves network
accuracy and makes it possible to train deep plain networks, while adding no extra
computation during inference because ResConv is fused into an ordinary
convolutional layer after training. For layer pruning, we convert the convolutional
layers of the network into ResConv blocks with a layer scaling factor. During
training, L1 regularization is adopted to make the scaling factors sparse, so that
unimportant layers are automatically identified and then removed, resulting in a
model with fewer layers. Our pruning method achieves excellent compression and
acceleration performance over state-of-the-art methods on different datasets, and
needs no retraining at low pruning rates.
For example, with ResNet-110, we achieve a 65.5%-FLOPs reduction by removing
55.5% of the parameters, with only a small loss of 0.13% in top-1 accuracy on
CIFAR-10.
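Below is a minimal PyTorch sketch of one plausible reading of the abstract above, assuming the trainable information control parameter scales the convolutional branch (y = x + s * conv(x)); after training the shortcut and the factor are folded into a single ordinary convolution, and layers whose factor is driven toward zero by the L1 penalty collapse to identity and can be removed. All names (ResConvBlock, fuse, scale_l1_penalty) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn


class ResConvBlock(nn.Module):
    """Single convolution with a gated identity shortcut: y = x + s * conv(x).
    Assumes in_channels == out_channels, stride 1 and an odd kernel size so
    the shortcut can later be folded into the kernel."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.scale = nn.Parameter(torch.ones(1))  # trainable control parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.conv(x)

    def fuse(self) -> nn.Conv2d:
        """Fold the scale and the shortcut into one ordinary convolution."""
        fused = nn.Conv2d(self.conv.in_channels, self.conv.out_channels,
                          self.conv.kernel_size, padding=self.conv.padding)
        with torch.no_grad():
            w = self.scale * self.conv.weight            # scaled conv branch
            c = self.conv.kernel_size[0] // 2            # kernel centre
            for i in range(self.conv.out_channels):
                w[i, i, c, c] += 1.0                     # add identity shortcut
            fused.weight.copy_(w)
            fused.bias.copy_(self.scale * self.conv.bias)
        return fused


def scale_l1_penalty(model: nn.Module, weight: float = 1e-4):
    """L1 sparsity penalty on the layer scaling factors, added to the task loss.
    Layers whose factor ends up near zero reduce to identity and can be pruned."""
    return weight * sum(m.scale.abs().sum()
                        for m in model.modules()
                        if isinstance(m, ResConvBlock))
```

In use, scale_l1_penalty(model) would be added to the task loss during training; after training, blocks whose scale stays above a threshold are replaced by block.fuse(), and the remaining ones are dropped.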
Related papers
- LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from
Scratch [14.911305800463285]
We propose a novel framework named Layer Adaptive Progressive Pruning (LAPP).
LAPP designs an effective and efficient pruning strategy that introduces a learnable threshold for each layer and FLOPs constraints for the network.
Our method demonstrates superior performance gains over previous compression methods on various datasets and backbone architectures.
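As a rough illustration of the learnable per-layer threshold mentioned above (LAPP's exact formulation is not reproduced here), the sketch below softly gates channels whose importance score, e.g. the absolute BatchNorm scale, falls below a trainable threshold; the temperature and all names are assumptions.

```python
import torch
import torch.nn as nn


class LearnableThresholdGate(nn.Module):
    """Per-layer trainable threshold that softly switches channels off when
    their importance score falls below it (illustrative only)."""

    def __init__(self, temperature: float = 0.05):
        super().__init__()
        self.threshold = nn.Parameter(torch.zeros(1))
        self.temperature = temperature

    def forward(self, x: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # scores: per-channel importance, e.g. |gamma| of the preceding BatchNorm
        gate = torch.sigmoid((scores - self.threshold) / self.temperature)
        return x * gate.view(1, -1, 1, 1)
```

A FLOPs constraint could then be imposed by penalizing the expected number of surviving channels per layer, weighted by that layer's per-channel cost.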
arXiv Detail & Related papers (2023-09-25T14:08:45Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and shrinks unimportant weights on the fly by a small amount proportional to their magnitude.
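A minimal sketch of the soft-shrinkage step summarized above: instead of hard-zeroing pruned weights, the smallest-magnitude fraction is multiplicatively shrunk a little at each iteration, so weights can still recover; the function name, ratio and shrink factor are illustrative assumptions.

```python
import torch


@torch.no_grad()
def soft_shrink_step(weight: torch.Tensor,
                     prune_ratio: float = 0.5,
                     shrink: float = 0.02) -> None:
    """Shrink the smallest-magnitude `prune_ratio` fraction of weights in place,
    by an amount proportional to their current magnitude (no hard zeroing)."""
    k = max(1, int(weight.numel() * prune_ratio))
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() <= threshold
    factor = 1.0 - shrink * mask.to(weight.dtype)  # 1 where kept, 1-shrink where shrunk
    weight.mul_(factor)
```

Such a step would typically be applied to each layer's weight tensor after every optimizer update.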
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Boosting Pruned Networks with Linear Over-parameterization [8.796518772724955]
Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time.
To restore accuracy after pruning, fine-tuning is usually applied to pruned networks.
We propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters.
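A sketch of what linear over-parameterization can look like for a single fully connected layer, as an illustration rather than the paper's exact construction: the compact layer is temporarily replaced by two stacked linear layers for fine-tuning and afterwards contracted back by multiplying the two weight matrices, so inference cost is unchanged.

```python
import torch
import torch.nn as nn


class OverParamLinear(nn.Module):
    """Replaces one linear layer by two whose product has the same shape."""

    def __init__(self, layer: nn.Linear, expansion: int = 4):
        super().__init__()
        hidden = layer.out_features * expansion
        self.a = nn.Linear(layer.in_features, hidden, bias=False)
        self.b = nn.Linear(hidden, layer.out_features, bias=layer.bias is not None)
        with torch.no_grad():
            if layer.bias is not None:
                self.b.bias.copy_(layer.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.b(self.a(x))

    def contract(self) -> nn.Linear:
        """Merge the two layers back into one: W = W_b @ W_a."""
        merged = nn.Linear(self.a.in_features, self.b.out_features,
                           bias=self.b.bias is not None)
        with torch.no_grad():
            merged.weight.copy_(self.b.weight @ self.a.weight)
            if self.b.bias is not None:
                merged.bias.copy_(self.b.bias)
        return merged
```

In practice `a` and `b` would be initialized (e.g. via an SVD split) so that `b.weight @ a.weight` reproduces the original pruned weight before fine-tuning starts.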
arXiv Detail & Related papers (2022-04-25T05:30:26Z)
- End-to-End Sensitivity-Based Filter Pruning [49.61707925611295]
We present a sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end.
Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer.
arXiv Detail & Related papers (2022-04-15T10:21:05Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as well as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Basis Scaling and Double Pruning for Efficient Inference in Network-Based Transfer Learning [1.3467579878240454]
We decompose a convolutional layer into two layers: a convolutional layer with the orthonormal basis vectors as the filters, and a "BasisScalingConv" layer which is responsible for rescaling the features.
We can achieve pruning ratios up to 74.6% for CIFAR-10 and 98.9% for MNIST in model parameters.
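A sketch of such a decomposition under the assumption that the orthonormal basis comes from an SVD of the flattened filters: the first convolution uses the right singular vectors as filters, and a following 1x1 scaling convolution recombines the responses; dropping basis filters with small scaling factors then prunes the layer. The helper name is illustrative.

```python
import torch
import torch.nn as nn


def decompose_conv(conv: nn.Conv2d, keep=None):
    """Split `conv` into an orthonormal-basis conv followed by a 1x1 scaling conv.
    With keep=None the pair reproduces the original layer exactly; a smaller
    `keep` drops the least important basis filters."""
    c_out, c_in, kh, kw = conv.weight.shape
    w = conv.weight.detach().reshape(c_out, -1)          # (c_out, c_in*kh*kw)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)  # w = u @ diag(s) @ vh
    r = keep or s.numel()

    basis = nn.Conv2d(c_in, r, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    scaling = nn.Conv2d(r, c_out, 1, bias=conv.bias is not None)
    with torch.no_grad():
        basis.weight.copy_(vh[:r].reshape(r, c_in, kh, kw))          # orthonormal filters
        scaling.weight.copy_((u[:, :r] * s[:r]).reshape(c_out, r, 1, 1))
        if conv.bias is not None:
            scaling.bias.copy_(conv.bias)
    return nn.Sequential(basis, scaling)
```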
arXiv Detail & Related papers (2021-08-06T00:04:02Z)
- Pruning Neural Networks with Interpolative Decompositions [5.377278489623063]
We introduce a principled approach to neural network pruning that casts the problem as a structured low-rank matrix approximation.
We demonstrate how to prune a neural network by first building a set of primitives to prune a single fully connected or convolution layer.
We achieve an accuracy of 93.62 $\pm$ 0.36% using VGG-16 on CIFAR-10, with a 51% FLOPs reduction.
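The snippet below is a simplified illustration of pruning a fully connected layer with an interpolative decomposition (not the paper's exact primitives): column-pivoted QR selects which neurons to keep, and the discarded neurons are approximated as linear combinations of the kept ones, which are folded into the next layer's weights. Bias terms and the nonlinearity are ignored for brevity.

```python
import numpy as np
from scipy.linalg import qr


def id_prune_hidden(w1: np.ndarray, w2: np.ndarray, k: int):
    """Keep k hidden neurons between two linear maps z = w2 @ (w1 @ x).
    Rows of w1 (neurons) are approximated by an interpolative decomposition,
    w1 ~ t @ w1[keep], so that w2 @ w1 ~ (w2 @ t) @ w1[keep]."""
    _, _, piv = qr(w1.T, mode='economic', pivoting=True)  # pivoted QR picks rows of w1
    keep = np.sort(piv[:k])
    t = w1 @ np.linalg.pinv(w1[keep])                     # interpolation matrix (h, k)
    return w1[keep], w2 @ t                               # pruned layer 1, adjusted layer 2
```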
arXiv Detail & Related papers (2021-07-30T20:13:49Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration [24.42067007684169]
We propose a novel uniform channel pruning (UCP) method to prune deep CNNs.
Unimportant channels, together with the convolutional kernels related to them, are pruned directly.
We verify our method on CIFAR-10, CIFAR-100 and ILSVRC-2012 for image classification.
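A minimal sketch of direct channel pruning in the spirit summarized above: output channels of one convolution are ranked by a simple L1 criterion and removed, together with the kernels of the following convolution that consume them. The criterion, the keep ratio and the function name are assumptions, and any BatchNorm between the two layers is ignored.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def prune_channels(conv: nn.Conv2d, next_conv: nn.Conv2d, keep_ratio: float = 0.5):
    """Drop the least important output channels of `conv` and the matching
    input kernels of `next_conv` (assumes the two are directly connected)."""
    importance = conv.weight.abs().sum(dim=(1, 2, 3))        # L1 norm per filter
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.argsort(importance, descending=True)[:n_keep].sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.copy_(conv.weight[keep])
    if conv.bias is not None:
        pruned.bias.copy_(conv.bias[keep])

    pruned_next = nn.Conv2d(n_keep, next_conv.out_channels, next_conv.kernel_size,
                            stride=next_conv.stride, padding=next_conv.padding,
                            bias=next_conv.bias is not None)
    pruned_next.weight.copy_(next_conv.weight[:, keep])      # drop matching input kernels
    if next_conv.bias is not None:
        pruned_next.bias.copy_(next_conv.bias)
    return pruned, pruned_next
```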
arXiv Detail & Related papers (2020-10-03T01:51:06Z)
- ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting [105.97936163854693]
We propose ResRep, which slims down a CNN by reducing the width (number of output channels) of convolutional layers.
Inspired by neurobiological research on the independence of remembering and forgetting, we propose to re-parameterize a CNN into remembering parts and forgetting parts.
We equivalently merge the remembering and forgetting parts into the original architecture with narrower layers.
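One hedged reading of the re-parameterization above, assuming the forgetting part is a 1x1 "compactor" convolution appended after an ordinary convolution: rows of the compactor that a sparsity penalty drives to zero are dropped, and the surviving compactor is merged back into the convolution, giving an equivalent but narrower layer. Names are illustrative, the compactor is assumed to have no bias, and BatchNorm folding is omitted.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def merge_compactor(conv: nn.Conv2d, compactor: nn.Conv2d,
                    keep: torch.Tensor) -> nn.Conv2d:
    """Fold a 1x1 `compactor` (the 'forgetting' part) into the preceding `conv`,
    keeping only the compactor rows listed in `keep` (indices of rows whose norm
    survived the sparsity penalty) -> a narrower, equivalent convolution."""
    cw = compactor.weight[keep, :, 0, 0]                      # (n_keep, c_mid)
    merged = nn.Conv2d(conv.in_channels, keep.numel(), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding, bias=True)
    merged.weight.copy_(torch.einsum('om,mikl->oikl', cw, conv.weight))
    bias = torch.zeros(conv.out_channels) if conv.bias is None else conv.bias
    merged.bias.copy_(cw @ bias)
    return merged
```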
arXiv Detail & Related papers (2020-07-07T07:56:45Z)
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
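A toy sketch of the hypernetwork idea, with assumed names and sizes rather than DHP's actual parameterization: each layer holds latent vectors for its output (and input) channels, and a small generator maps each latent pair to a k x k kernel, so removing a latent removes a channel and the pruning stays differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperConv2d(nn.Module):
    """Conv layer whose weights are generated from per-channel latent vectors."""

    def __init__(self, c_in: int, c_out: int, k: int = 3, latent_dim: int = 8):
        super().__init__()
        self.k = k
        self.z_in = nn.Parameter(torch.randn(c_in, latent_dim))    # input-channel latents
        self.z_out = nn.Parameter(torch.randn(c_out, latent_dim))  # output-channel latents
        self.generator = nn.Linear(2 * latent_dim, k * k)          # latent pair -> kernel

    def weight(self) -> torch.Tensor:
        c_out, c_in = self.z_out.shape[0], self.z_in.shape[0]
        zo = self.z_out.unsqueeze(1).expand(c_out, c_in, -1)
        zi = self.z_in.unsqueeze(0).expand(c_out, c_in, -1)
        w = self.generator(torch.cat([zo, zi], dim=-1))            # (c_out, c_in, k*k)
        return w.view(c_out, c_in, self.k, self.k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.weight(), padding=self.k // 2)
```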
arXiv Detail & Related papers (2020-03-30T17:59:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.