Model Pruning Based on Quantified Similarity of Feature Maps
- URL: http://arxiv.org/abs/2105.06052v1
- Date: Thu, 13 May 2021 02:57:30 GMT
- Title: Model Pruning Based on Quantified Similarity of Feature Maps
- Authors: Zidu Wang, Xuexin Liu, Long Huang, Yunqing Chen, Yufei Zhang, Zhikang Lin, Rui Wang
- Abstract summary: We propose a novel theory to find redundant information in three-dimensional tensors.
We use this theory to prune convolutional neural networks to enhance inference speed.
- Score: 5.271060872578571
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A high-accuracy CNN usually comes with a huge number of parameters, which
are typically stored in high-dimensional tensors. However, few methods can
identify the redundant information in the parameters stored in these
high-dimensional tensors, which leads to a lack of theoretical guidance for
the compression of CNNs. In this paper, we propose a novel theory to find
redundant information in three-dimensional tensors, namely Quantified
Similarity of Feature Maps (QSFM), and use this theory to prune convolutional
neural networks to enhance inference speed. Our method is a form of filter
pruning and can be implemented without any special libraries. We
perform our method not only on common convolution layers but also on special
convolution layers, such as depthwise separable convolution layers. The
experiments show that QSFM can effectively find the redundant information in a
neural network. Without any fine-tuning, QSFM compresses ResNet-56 on CIFAR-10
significantly (48.27% FLOPs and 57.90% parameter reduction) with only a 0.54%
loss in top-1 accuracy. With fine-tuning, QSFM also prunes ResNet-56, VGG-16
and MobileNetV2, again with excellent results.
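For a concrete picture of the idea, the sketch below scores the filters of one convolution layer by how similar their feature maps are to one another and marks the most redundant ones for removal. It is a minimal PyTorch illustration under simplifying assumptions (Euclidean distance as the similarity measure, a single calibration batch, greedy selection); the paper's QSFM metric and pruning procedure may differ in detail.

```python
import torch

def qsfm_style_redundant_filters(feature_maps: torch.Tensor, prune_ratio: float):
    """Pick filters whose feature maps are most similar to another filter's.

    feature_maps: activations of one conv layer on a calibration batch, (N, C, H, W).
    prune_ratio:  fraction of the C filters to mark as redundant.

    Returns channel indices to prune. This is only a sketch of the
    'quantified similarity of feature maps' idea; the paper's exact
    similarity measure may differ.
    """
    n, c, h, w = feature_maps.shape
    # Average over the batch and flatten each channel's feature map.
    maps = feature_maps.mean(dim=0).reshape(c, h * w)      # (C, H*W)
    # Pairwise Euclidean distance between channels: small distance = high similarity.
    dist = torch.cdist(maps, maps)                         # (C, C)
    dist.fill_diagonal_(float("inf"))
    # A channel whose nearest neighbour is very close carries redundant information.
    nearest = dist.min(dim=1).values                       # (C,)
    num_prune = int(prune_ratio * c)
    return nearest.argsort()[:num_prune].tolist()

# Hypothetical usage: collect a layer's activations with a forward hook,
# then drop the returned channels from that layer and the following one.
```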
Related papers
- An Effective Information Theoretic Framework for Channel Pruning [4.014774237233169]
We present a novel channel pruning approach via information theory and interpretability of neural networks.
Our method improves the accuracy by 0.21% when reducing 45.5% FLOPs and removing 40.3% parameters for ResNet-56 on CIFAR-10.
arXiv Detail & Related papers (2024-08-14T17:19:56Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable by our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Towards Generalized Entropic Sparsification for Convolutional Neural Networks [0.0]
Convolutional neural networks (CNNs) are reported to be overparametrized.
Here, we introduce a layer-by-layer data-driven pruning method based on a mathematical formulation that aims at a computationally scalable entropic relaxation of the pruning problem.
The sparse subnetwork is found from the pre-trained (full) CNN using the network entropy minimization as a sparsity constraint.
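The abstract only gives the high-level recipe, so the snippet below is a loose illustration: it scores each channel by the entropy of its activation histogram on a calibration set, so that low-entropy channels become sparsification candidates. The paper's entropic relaxation is a different, more principled formulation; treat this purely as a rough proxy.

```python
import torch

def channel_activation_entropy(activations: torch.Tensor, bins: int = 64) -> torch.Tensor:
    """Rough per-channel entropy estimate from an activation histogram.

    activations: (N, C, H, W) tensor from one layer on a calibration set.
    Returns a (C,) tensor; channels with low entropy carry little information
    and are candidates for removal. Only an illustrative proxy, not the
    entropic relaxation used in the paper.
    """
    n, c, h, w = activations.shape
    flat = activations.permute(1, 0, 2, 3).reshape(c, -1)
    entropies = []
    for ch in flat:
        hist = torch.histc(ch, bins=bins)   # histogram over the channel's values
        p = hist / hist.sum()
        p = p[p > 0]                        # drop empty bins before taking logs
        entropies.append(-(p * p.log()).sum())
    return torch.stack(entropies)
```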
arXiv Detail & Related papers (2024-04-06T21:33:39Z)
- Filter Pruning For CNN With Enhanced Linear Representation Redundancy [3.853146967741941]
We present a data-driven loss function term calculated from the correlation matrix of different feature maps in the same layer, named CCM-loss.
CCM-loss provides us with another universal transcendental mathematical tool besides L*-norm regularization.
In our new strategy, we mainly focus on the consistency and integrality of the information flow in the network.
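A correlation-matrix loss term of this kind can be computed as sketched below; how the off-diagonal correlations are signed and weighted in the actual CCM-loss is the paper's design choice, so this only illustrates the mechanics.

```python
import torch

def feature_map_correlation_loss(fmap: torch.Tensor) -> torch.Tensor:
    """Loss term built from the correlation matrix of a layer's feature maps.

    fmap: (N, C, H, W) activations of one layer.
    Returns the mean absolute off-diagonal correlation. The sign and weighting
    of this term in the actual CCM-loss are the paper's design choices; this
    sketch only shows how the correlation matrix is obtained.
    """
    n, c, h, w = fmap.shape
    x = fmap.permute(1, 0, 2, 3).reshape(c, -1)      # one row per channel
    x = x - x.mean(dim=1, keepdim=True)
    x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
    corr = x @ x.t()                                 # (C, C) correlation matrix
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return off_diag.abs().mean()
```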
arXiv Detail & Related papers (2023-10-10T06:27:30Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
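The LSH idea can be illustrated with random-hyperplane hashing: channels whose flattened feature maps fall into the same hash bucket are near-duplicates and can be merged or dropped. The sketch below is a generic LSH bucketing routine, not the HASTE module itself.

```python
import torch

def lsh_bucket_channels(fmap: torch.Tensor, num_hyperplanes: int = 8, seed: int = 0):
    """Group similar channels of a feature map via random-hyperplane LSH.

    fmap: (C, H, W) feature map of a single sample.
    Returns a dict mapping hash codes to lists of channel indices; channels
    sharing a bucket are near-duplicates that could be merged or averaged.
    Only a sketch of the LSH idea, not the HASTE module.
    """
    c, h, w = fmap.shape
    g = torch.Generator().manual_seed(seed)
    planes = torch.randn(h * w, num_hyperplanes, generator=g)
    # Sign of the projection onto each random hyperplane forms the hash code.
    signs = (fmap.reshape(c, -1) @ planes) > 0        # (C, num_hyperplanes) bools
    buckets = {}
    for idx, code in enumerate(signs):
        buckets.setdefault(tuple(code.tolist()), []).append(idx)
    return buckets
```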
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
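The channel-group splitting step can be pictured with the small module below, which splits the input along the channel dimension and runs a depthwise convolution per group. The real SDTA encoder additionally mixes groups hierarchically and applies transposed (channel-wise) attention, which this sketch omits.

```python
import torch
import torch.nn as nn

class ChannelGroupSplit(nn.Module):
    """Sketch of the 'split into channel groups' step of an SDTA-style encoder.

    Splits the input into `groups` channel groups and applies a depthwise 3x3
    convolution to each. The actual SDTA encoder also mixes the groups
    hierarchically and applies transposed attention, omitted here.
    """

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.group_size = channels // groups
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(self.group_size, self.group_size, 3,
                      padding=1, groups=self.group_size)
            for _ in range(groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.split(x, self.group_size, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.dwconvs, chunks)], dim=1)
```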
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression [26.501979992447605]
This paper investigates compression from the perspective of compactly representing and storing trained parameters.
We leverage additive quantization, an extreme lossy compression method invented for image descriptors, to compactly represent the parameters.
We conduct experiments on MobileNet-v2, VGG-11, ResNet-50, Feature Pyramid Networks, and pruned DNNs trained for classification, detection, and segmentation tasks.
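A greedy, residual form of additive quantization is sketched below: each weight sub-vector is approximated by a sum of one codeword per codebook, fitted on successive residuals with k-means. The paper's encoder and its architecture-agnostic codebook sharing are more sophisticated; this only shows the basic mechanism.

```python
import numpy as np
from sklearn.cluster import KMeans

def residual_quantize(weights: np.ndarray, dim: int = 8, num_codebooks: int = 4,
                      codebook_size: int = 256, seed: int = 0):
    """Greedy additive (residual) quantization of a flat weight array.

    Assumes len(weights) is divisible by `dim` and that there are at least
    `codebook_size` sub-vectors. Each sub-vector is approximated by a sum of
    one codeword per codebook. Returns the codebooks and per-codebook indices.
    A simplified sketch of additive quantization, not the paper's encoder.
    """
    vecs = weights.reshape(-1, dim).astype(np.float32)
    residual = vecs.copy()
    codebooks, codes = [], []
    for _ in range(num_codebooks):
        km = KMeans(n_clusters=codebook_size, n_init=4, random_state=seed).fit(residual)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_)
        residual = residual - km.cluster_centers_[km.labels_]   # quantize the leftover
    return codebooks, codes

def dequantize(codebooks, codes) -> np.ndarray:
    """Reconstruct the flat weight array as the sum of selected codewords."""
    approx = sum(cb[idx] for cb, idx in zip(codebooks, codes))
    return approx.reshape(-1)
```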
arXiv Detail & Related papers (2021-11-19T17:03:11Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
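The overall pipeline (prune small weights, quantize the survivors, then apply a generic source coder) can be illustrated with the naive baseline below; the purpose-built storage format in the paper compresses far better than this zlib sketch.

```python
import zlib
import numpy as np

def prune_quantize_compress(weights: np.ndarray, prune_frac: float = 0.8,
                            num_levels: int = 256) -> bytes:
    """Naive prune + quantize + source-code pipeline for a flat weight array.

    Magnitude-prunes the smallest `prune_frac` of weights, uniformly quantizes
    the rest to `num_levels` levels, and compresses the byte stream with zlib.
    Only an illustration of the general idea, not the paper's storage format.
    """
    w = weights.astype(np.float32)
    threshold = np.quantile(np.abs(w), prune_frac)
    w[np.abs(w) < threshold] = 0.0                           # weight pruning
    max_abs = np.abs(w).max()
    scale = max_abs / (num_levels // 2 - 1) if max_abs > 0 else 1.0
    q = np.round(w / scale).astype(np.int8)                  # uniform quantization
    return zlib.compress(q.tobytes(), 9)                     # generic source coding

# Usage sketch: compare len(prune_quantize_compress(w)) with w.nbytes.
```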
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
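A Fisher-style channel importance score can be accumulated with a forward hook as sketched below: per sample, a channel's activation-gradient product is summed over spatial positions and squared. The paper's unified metric also covers coupled channels (e.g. channels tied together by residual additions), which this sketch ignores.

```python
import torch
import torch.nn as nn

def fisher_channel_importance(model: nn.Module, layer: nn.Conv2d,
                              data_loader, loss_fn) -> torch.Tensor:
    """Accumulate a Fisher-style importance score per output channel of `layer`.

    Importance of channel c ~ sum over samples of (sum_spatial a_c * g_c)^2,
    where a_c is the channel's activation and g_c the gradient of the loss
    w.r.t. it. Coupled channels are not handled, unlike the paper's metric.
    """
    scores = torch.zeros(layer.out_channels)
    acts = {}

    def save_act(module, inp, out):
        out.retain_grad()          # keep this activation's gradient after backward
        acts["a"] = out

    handle = layer.register_forward_hook(save_act)
    for x, y in data_loader:
        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        a, g = acts["a"], acts["a"].grad
        scores += ((a * g).sum(dim=(2, 3)) ** 2).sum(dim=0).detach().cpu()
    handle.remove()
    return scores   # low score -> channel is a pruning candidate
```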
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- Layer Pruning via Fusible Residual Convolutional Block for Deep Neural Networks [15.64167076052513]
Compared with filter pruning, layer pruning yields lower inference time and
runtime memory usage when the same FLOPs and number of parameters are pruned.
We propose a simple layer pruning method using a residual convolutional block (ResConv).
Our pruning method achieves excellent compression and acceleration performance
over state-of-the-art methods on different datasets.
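The "fusible" part of a residual convolutional block can be illustrated with a generic re-parameterisation trick: when a block computes conv(x) + x with a 3x3, stride-1, padding-1 convolution, the identity branch can be folded into the kernel's centre taps so it vanishes at inference time. The sketch below shows that fold; the paper's ResConv block is constructed differently in detail.

```python
import torch
import torch.nn as nn

def fuse_identity_into_conv(conv: nn.Conv2d) -> nn.Conv2d:
    """Fold an identity shortcut (y = conv(x) + x) into the conv's own kernel.

    Valid when in_channels == out_channels, groups == 1, 3x3 kernel, stride 1,
    padding 1: adding 1 to the centre tap of each channel's own kernel makes
    conv'(x) == conv(x) + x, so the residual branch disappears at inference.
    A generic re-parameterisation sketch, not the paper's exact ResConv block.
    """
    assert conv.in_channels == conv.out_channels and conv.groups == 1
    assert conv.kernel_size == (3, 3) and conv.stride == (1, 1) and conv.padding == (1, 1)
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, 3, padding=1,
                      bias=conv.bias is not None)
    with torch.no_grad():
        weight = conv.weight.clone()
        for c in range(conv.out_channels):
            weight[c, c, 1, 1] += 1.0      # identity contribution at the centre tap
        fused.weight.copy_(weight)
        if conv.bias is not None:
            fused.bias.copy_(conv.bias)
    return fused
```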
arXiv Detail & Related papers (2020-11-29T12:51:16Z)
- SCOP: Scientific Control for Reliable Neural Network Pruning [127.20073865874636]
This paper proposes a reliable neural network pruning algorithm by setting up a scientific control.
Redundant filters can be discovered in the adversarial process of different features.
Our method can reduce the parameters of ResNet-101 by 57.8% and its FLOPs by 60.2% with only a 0.01% top-1 accuracy loss on ImageNet.
arXiv Detail & Related papers (2020-10-21T03:02:01Z)