CR-SFP: Learning Consistent Representation for Soft Filter Pruning
- URL: http://arxiv.org/abs/2312.11555v1
- Date: Sun, 17 Dec 2023 06:41:04 GMT
- Title: CR-SFP: Learning Consistent Representation for Soft Filter Pruning
- Authors: Jingyang Xiang, Zhuangzhi Chen, Jianbiao Mei, Siqi Li, Jun Chen, Yong
Liu
- Abstract summary: Soft filter pruning (SFP) has emerged as an effective pruning technique that allows pruned filters to keep updating and to regrow into the network.
We propose to mitigate this gap by learning consistent representation for soft filter pruning, dubbed CR-SFP.
CR-SFP is a simple yet effective training framework that improves the accuracy of the P-NN without introducing any additional inference cost.
- Score: 18.701621806529438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Soft filter pruning (SFP) has emerged as an effective pruning technique that
allows pruned filters to keep updating and gives them the opportunity to regrow into the
network. However, this strategy alternates training and pruning, which inevitably causes
inconsistent representations between the reconstructed network (R-NN) used at training and
the pruned network (P-NN) used at inference, resulting in performance degradation. In this
paper, we propose to mitigate this gap by learning consistent representation for soft
filter pruning, dubbed CR-SFP. Specifically, at each training step, CR-SFP optimizes the
R-NN and P-NN simultaneously on different distorted versions of the same training data,
while forcing them to be consistent by minimizing the bidirectional KL-divergence between
their posterior distributions. Meanwhile, the R-NN and P-NN share backbone parameters, so
only additional classifier parameters are introduced. After training, we can export the
P-NN for inference. CR-SFP is a simple yet effective training framework that improves the
accuracy of the P-NN without introducing any additional inference cost. It can also be
combined with a variety of pruning criteria and loss functions. Extensive experiments
demonstrate that CR-SFP achieves consistent improvements across various CNN architectures.
Notably, on ImageNet, CR-SFP reduces FLOPs by more than 41.8% on ResNet18 while reaching
69.2% top-1 accuracy, improving over SFP by 2.1% under the same training settings. The
code will be publicly available on GitHub.
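As a rough illustration of the scheme described above, the PyTorch-style sketch below trains an R-NN and a P-NN that share one backbone, feeds them two distorted views of the same batch, and couples their posteriors with a bidirectional KL term. Every name here (`backbone`, `head_r`, `head_p`, `filter_mask`, `alpha`) is an illustrative assumption, and for brevity the soft-pruning mask is applied only to the final feature map rather than to every convolutional layer; this is a sketch of the idea, not the authors' code.

```python
# Hypothetical sketch of one CR-SFP training step (assumed names, not the paper's code).
import torch
import torch.nn.functional as F

def bidirectional_kl(logits_a, logits_b):
    """Symmetric KL divergence between the two classifiers' posteriors."""
    log_a = F.log_softmax(logits_a, dim=1)
    log_b = F.log_softmax(logits_b, dim=1)
    kl_ab = F.kl_div(log_b, log_a, reduction="batchmean", log_target=True)  # KL(a || b)
    kl_ba = F.kl_div(log_a, log_b, reduction="batchmean", log_target=True)  # KL(b || a)
    return 0.5 * (kl_ab + kl_ba)

def cr_sfp_step(backbone, head_r, head_p, filter_mask,
                x_view1, x_view2, y, optimizer, alpha=1.0):
    # R-NN: the full (reconstructed) network sees the first distorted view.
    logits_r = head_r(backbone(x_view1))

    # P-NN: shared backbone with pruned channels zeroed, second distorted view.
    # (Real SFP masks filters in every layer; masking the final features is a simplification.)
    logits_p = head_p(backbone(x_view2) * filter_mask)

    # Supervised losses for both branches plus the consistency term.
    loss = (F.cross_entropy(logits_r, y)
            + F.cross_entropy(logits_p, y)
            + alpha * bidirectional_kl(logits_r, logits_p))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, only the pruned branch (the masked backbone with `head_p`) would be kept for inference, which matches the abstract's claim that no additional inference cost is introduced.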
Related papers
- RL-Pruner: Structured Pruning Using Reinforcement Learning for CNN Compression and Acceleration [0.0]
We propose RL-Pruner, which uses reinforcement learning to learn the optimal pruning distribution.
RL-Pruner can automatically extract dependencies between filters in the input model and perform pruning, without requiring model-specific pruning implementations.
arXiv Detail & Related papers (2024-11-10T13:35:10Z)
- Trainability Preserving Neural Structured Pruning [64.65659982877891]
We present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification.
TPP can compete with the ground-truth dynamical isometry recovery method on linear networks.
It delivers encouraging performance in comparison to many top-performing filter pruning methods.
arXiv Detail & Related papers (2022-07-25T21:15:47Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing framework that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs [69.3939291118954]
Unstructured pruning is well suited to reducing the memory footprint of convolutional neural networks (CNNs).
Standard unstructured pruning (SP) reduces the memory footprint of CNNs by setting filter elements to zero.
We introduce interspace pruning (IP), a general tool to improve existing pruning methods.
arXiv Detail & Related papers (2022-03-15T11:50:45Z)
- Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T.
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
- Manipulating Identical Filter Redundancy for Efficient Pruning on Deep and Complicated CNN [126.88224745942456]
We propose a novel Centripetal SGD (C-SGD) to make some filters identical, resulting in ideal redundancy patterns.
C-SGD delivers better performance than existing methods because the redundancy is better organized.
arXiv Detail & Related papers (2021-07-30T06:18:19Z)
- Feature Flow Regularization: Improving Structured Sparsity in Deep Neural Networks [12.541769091896624]
Pruning is a model compression method that removes redundant parameters in deep neural networks (DNNs).
We propose a simple and effective regularization strategy from a new perspective of the evolution of features, which we call feature flow regularization (FFR).
Experiments with VGGNets, ResNets on CIFAR-10/100, and Tiny ImageNet datasets demonstrate that FFR can significantly improve both unstructured and structured sparsity.
arXiv Detail & Related papers (2021-06-05T15:00:50Z)
- Softer Pruning, Incremental Regularization [12.190136491373359]
The Soft Filter Pruning (SFP) method zeroizes the pruned filters during training while continuing to update them in subsequent training epochs (a minimal sketch of this zeroize-then-update cycle appears after this list).
To utilize the trained pruned filters, we propose a SofteR Filter Pruning (SRFP) method and its variant, Asymptotic SofteR Filter Pruning (ASRFP).
Our methods perform well across various networks, datasets and pruning rates, and are also transferable to weight pruning.
arXiv Detail & Related papers (2020-10-19T13:37:19Z)
- Distillation Guided Residual Learning for Binary Convolutional Neural Networks [83.6169936912264]
It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a Floating point CNN (FCNN).
We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and FCNN.
To minimize the performance gap, we enforce the BCNN to produce intermediate feature maps similar to those of the FCNN.
This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to more effective optimization of the BCNN.
arXiv Detail & Related papers (2020-07-10T07:55:39Z)
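Several entries above (the Softer Pruning paper, and the SFP baseline that CR-SFP builds on) rely on the same soft filter pruning cycle: after each training epoch the lowest-norm filters in each convolutional layer are zeroed, but they stay in the model and can regrow later. The sketch below is a minimal, assumed illustration of that cycle; the L2-norm criterion, the fixed per-layer ratio, and the helper name `soft_prune_conv_layers` are illustrative choices, not code from any of the listed papers.

```python
# Minimal illustrative sketch of the soft filter pruning (SFP) cycle (assumed details).
import torch
import torch.nn as nn

@torch.no_grad()
def soft_prune_conv_layers(model: nn.Module, prune_ratio: float = 0.3):
    """Zero the lowest-L2-norm filters of every Conv2d layer; the weights stay trainable."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            weight = module.weight.data                      # shape: (out, in, kH, kW)
            num_filters = weight.shape[0]
            num_pruned = int(num_filters * prune_ratio)
            if num_pruned == 0:
                continue
            # Rank filters by L2 norm and zero out the smallest ones.
            norms = weight.view(num_filters, -1).norm(p=2, dim=1)
            pruned_idx = torch.argsort(norms)[:num_pruned]
            weight[pruned_idx] = 0.0

# Typical usage: prune softly once per epoch, after the optimizer updates,
# so the zeroed filters still receive gradients and may regrow in later epochs.
# for epoch in range(num_epochs):
#     train_one_epoch(model, loader, optimizer)
#     soft_prune_conv_layers(model, prune_ratio=0.3)
```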
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.