Toward Compact Parameter Representations for Architecture-Agnostic
Neural Network Compression
- URL: http://arxiv.org/abs/2111.10320v1
- Date: Fri, 19 Nov 2021 17:03:11 GMT
- Title: Toward Compact Parameter Representations for Architecture-Agnostic
Neural Network Compression
- Authors: Yuezhou Sun, Wenlong Zhao, Lijun Zhang, Xiao Liu, Hui Guan, Matei
Zaharia
- Abstract summary: This paper investigates compression from the perspective of compactly representing and storing trained parameters.
We leverage additive quantization, an extreme lossy compression method invented for image descriptors, to compactly represent the parameters.
We conduct experiments on MobileNet-v2, VGG-11, ResNet-50, Feature Pyramid Networks, and pruned DNNs trained for classification, detection, and segmentation tasks.
- Score: 26.501979992447605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates deep neural network (DNN) compression from the
perspective of compactly representing and storing trained parameters. We
explore the previously overlooked opportunity of cross-layer
architecture-agnostic representation sharing for DNN parameters. To do this, we
decouple feedforward parameters from DNN architectures and leverage additive
quantization, an extreme lossy compression method invented for image
descriptors, to compactly represent the parameters. The representations are
then finetuned on task objectives to improve task accuracy. We conduct
extensive experiments on MobileNet-v2, VGG-11, ResNet-50, Feature Pyramid
Networks, and pruned DNNs trained for classification, detection, and
segmentation tasks. The conceptually simple scheme consistently outperforms
iterative unstructured pruning. Applied to ResNet-50 with 76.1% top-1 accuracy
on the ILSVRC12 classification challenge, it achieves a $7.2\times$ compression
ratio with no accuracy loss and a $15.3\times$ compression ratio at 74.79%
accuracy. Further analyses suggest that representation sharing can frequently
happen across network layers and that learning shared representations for an
entire DNN can achieve better accuracy at the same compression ratio than
compressing the model as multiple separate parts. We release PyTorch code to
facilitate DNN deployment on resource-constrained devices and spur future
research on efficient representations and storage of DNN parameters.
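To make the core idea concrete, the sketch below applies a greedy, residual-style variant of additive quantization to a flattened parameter tensor split into fixed-length sub-vectors, then reconstructs each sub-vector as a sum of one codeword per codebook. This is only an illustrative approximation of the scheme described above (full additive quantization jointly optimizes the codebooks), not the authors' released PyTorch implementation; the sub-vector length, codebook count, and codebook size are arbitrary choices for the example.

```python
import torch

def residual_additive_quantize(vectors, num_codebooks=4, codebook_size=256, iters=10):
    """Greedy residual-style additive quantization (illustrative sketch only).

    Each d-dim sub-vector is approximated by the sum of one codeword from each
    of `num_codebooks` codebooks; codebooks are learned by k-means on the
    successive residuals rather than by joint optimization.
    """
    residual = vectors.clone()
    codebooks, assignments = [], []
    for _ in range(num_codebooks):
        # Initialize a codebook from random sub-vectors, then run k-means.
        centers = residual[torch.randperm(residual.size(0))[:codebook_size]].clone()
        for _ in range(iters):
            idx = torch.cdist(residual, centers).argmin(dim=1)   # nearest codeword
            for k in range(codebook_size):
                mask = idx == k
                if mask.any():
                    centers[k] = residual[mask].mean(dim=0)
        idx = torch.cdist(residual, centers).argmin(dim=1)
        codebooks.append(centers)
        assignments.append(idx)
        residual = residual - centers[idx]      # the next codebook models what is left
    return codebooks, assignments

def reconstruct(codebooks, assignments):
    """Sum the selected codewords from every codebook."""
    return sum(cb[idx] for cb, idx in zip(codebooks, assignments))

# Example: compactly represent one trained parameter tensor, independent of its layer.
weight = torch.randn(512, 512)                  # stand-in for trained parameters
subvectors = weight.reshape(-1, 8)              # flatten and split into 8-dim sub-vectors
codebooks, assignments = residual_additive_quantize(subvectors)
approx = reconstruct(codebooks, assignments).reshape(weight.shape)
print("relative error:", ((approx - weight).norm() / weight.norm()).item())
```

Storage then reduces to the codebooks plus per-sub-vector index assignments, which is what makes the representation independent of any particular layer or architecture.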
Related papers
- Convolutional Neural Network Compression via Dynamic Parameter Rank
Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
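One simple way to picture rank-based compression of this kind is truncated SVD of a layer's weight matrix. The sketch below shows static rank truncation only; the paper's dynamic rank selection during training is not reproduced, and the rank of 64 is an arbitrary choice for illustration.

```python
import torch

def low_rank_factorize(weight, rank):
    """Approximate a (out, in) weight matrix by two rank-`rank` factors via truncated SVD."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # (out, rank), singular values folded into the columns
    B = Vh[:rank, :]                  # (rank, in)
    return A, B                       # weight ~ A @ B; stores out*rank + rank*in values

weight = torch.randn(1024, 1024)
A, B = low_rank_factorize(weight, rank=64)
print("parameters:", weight.numel(), "->", A.numel() + B.numel())
print("relative error:", ((A @ B - weight).norm() / weight.norm()).item())
```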
arXiv Detail & Related papers (2024-01-15T23:52:35Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
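The following sketch illustrates the general idea of an autoencoder bottleneck inserted at a device-edge split point: a small encoder compresses the intermediate feature map on the device, and a decoder reconstructs it on the edge server. The 1x1-convolution design and channel counts are assumptions for illustration, not the AECNN architecture.

```python
import torch
import torch.nn as nn

class FeatureBottleneck(nn.Module):
    """Illustrative autoencoder around a device-edge split point: the encoder runs on
    the device and only its small output is transmitted; the decoder runs on the edge."""
    def __init__(self, channels=256, compressed=16):
        super().__init__()
        self.encoder = nn.Conv2d(channels, compressed, kernel_size=1)
        self.decoder = nn.Conv2d(compressed, channels, kernel_size=1)

    def forward(self, feats):
        code = self.encoder(feats)       # this smaller tensor is what gets sent
        return self.decoder(code)        # reconstructed features feed the remaining layers

feats = torch.randn(1, 256, 14, 14)      # intermediate activation at the split point
bottleneck = FeatureBottleneck()
reconstructed = bottleneck(feats)
print("channel compression:", feats.numel() / bottleneck.encoder(feats).numel())
```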
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate
Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for the implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We propose a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which adjust the filter numbers of CNNs and multiple dimensions in both CNNs and transformers in an input-dependent manner.
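As a rough illustration of weight slicing, the sketch below defines a convolution that uses only the first k output filters of a single shared weight tensor, where k can vary per input. The gating policy that picks k, and the exact slicing scheme of DS-Net/DS-Net++, are not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceableConv(nn.Module):
    """Illustrative weight slicing: a single shared weight tensor from which only
    the first `k` output filters are used, with `k` chosen per input."""
    def __init__(self, in_channels=64, out_channels=128, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x, k):
        # "Easy" inputs can run with a small k, hard inputs with the full filter count.
        return F.conv2d(x, self.weight[:k], self.bias[:k], padding=1)

x = torch.randn(1, 64, 32, 32)
conv = SliceableConv()
print(conv(x, k=32).shape, conv(x, k=128).shape)
```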
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
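A minimal sketch of the pruning-plus-quantization idea follows, assuming magnitude-based pruning, a symmetric uniform quantizer, and an (indices, codes, scale) storage layout; the paper's actual source-coding format is not reproduced, and the sparsity level and bit width are arbitrary choices.

```python
import torch

def prune_and_quantize(weight, sparsity=0.9, num_levels=16):
    """Illustrative sketch: magnitude pruning, then symmetric uniform quantization.
    The surviving weights are stored as (indices, int8 codes, scale)."""
    flat = weight.flatten()
    keep = int(flat.numel() * (1 - sparsity))                 # number of weights to keep
    keep_idx = flat.abs().topk(keep).indices
    kept = flat[keep_idx]
    scale = kept.abs().max() / (num_levels // 2)
    codes = torch.clamp((kept / scale).round(), -(num_levels // 2), num_levels // 2 - 1)
    return keep_idx, codes.to(torch.int8), scale

def dequantize(keep_idx, codes, scale, shape):
    flat = torch.zeros(shape).flatten()
    flat[keep_idx] = codes.float() * scale
    return flat.reshape(shape)

w = torch.randn(256, 256)
idx, codes, scale = prune_and_quantize(w)
w_hat = dequantize(idx, codes, scale, w.shape)
print("kept weights:", idx.numel(), "of", w.numel())
```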
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Model Pruning Based on Quantified Similarity of Feature Maps [5.271060872578571]
We propose a novel theory to find redundant information in three-dimensional tensors.
We use this theory to prune convolutional neural networks to enhance the inference speed.
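A hedged sketch of the general idea: compute pairwise similarity between a layer's output feature maps on sample inputs and flag near-duplicate channels as prunable. Cosine similarity and the 0.95 threshold are assumptions for illustration; the paper's quantified similarity measure may differ.

```python
import torch
import torch.nn.functional as F

def redundant_channels(feature_maps, threshold=0.95):
    """Illustrative sketch: flag channels whose feature maps nearly duplicate an
    earlier channel's, using pairwise cosine similarity (assumed measure)."""
    flat = feature_maps.flatten(start_dim=1)                                  # (C, H*W)
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)   # (C, C)
    prune = set()
    for i in range(flat.size(0)):
        for j in range(i + 1, flat.size(0)):
            if j not in prune and sim[i, j] > threshold:
                prune.add(j)           # keep channel i, mark its near-duplicate j
    return sorted(prune)

maps = torch.randn(64, 14, 14)         # one layer's feature maps for a sample input
print("prunable channels:", redundant_channels(maps))
```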
arXiv Detail & Related papers (2021-05-13T02:57:30Z)
- Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain.
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the key insight is to use the compiler to regain and guarantee high hardware efficiency.
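For illustration only, the sketch below applies a small fixed set of 3x3 kernel patterns and, for each kernel, keeps the pattern that preserves the most weight magnitude. The pattern set here is hypothetical and is not the pattern library used by PatDNN.

```python
import torch

# A hypothetical set of 3x3 patterns, each keeping 4 of the 9 kernel weights.
PATTERNS = torch.tensor([
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
], dtype=torch.float32)

def apply_patterns(conv_weight):
    """For every 3x3 kernel, keep the pattern that preserves the most weight magnitude."""
    out_ch, in_ch, _, _ = conv_weight.shape
    kernels = conv_weight.reshape(-1, 3, 3)                                # (N, 3, 3)
    scores = (kernels.abs().unsqueeze(1) * PATTERNS).sum(dim=(-1, -2))     # (N, num_patterns)
    best = scores.argmax(dim=1)                                            # chosen pattern per kernel
    return (kernels * PATTERNS[best]).reshape(out_ch, in_ch, 3, 3)

w = torch.randn(64, 64, 3, 3)
pruned = apply_patterns(w)
print("nonzeros in one kernel:", int(pruned[0, 0].count_nonzero()))
```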
arXiv Detail & Related papers (2020-01-01T04:52:07Z)