Joint Matrix Decomposition for Deep Convolutional Neural Networks Compression
- URL: http://arxiv.org/abs/2107.04386v2
- Date: Mon, 12 Jul 2021 03:06:42 GMT
- Title: Joint Matrix Decomposition for Deep Convolutional Neural Networks Compression
- Authors: Shaowu Chen, Jiahao Zhou, Weize Sun, Lei Huang
- Abstract summary: Deep convolutional neural networks (CNNs) with a large number of parameters require huge computational resources.
Decomposition-based methods, therefore, have been utilized to compress CNNs in recent years.
We propose to compress CNNs and alleviate performance degradation via joint matrix decomposition.
- Score: 5.083621265568845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep convolutional neural networks (CNNs) with a large number of parameters
require huge computational resources, which has limited the application of CNNs on
resource-constrained appliances. Decomposition-based methods, therefore, have been
utilized to compress CNNs in recent years. However, since the compression factor and
performance are negatively correlated, state-of-the-art works either suffer from severe
performance degradation or are limited to low compression factors. To overcome these
problems, unlike previous works that compress layers separately, we propose to compress
CNNs and alleviate performance degradation via joint matrix decomposition. The idea is
inspired by the fact that there are many repeated modules in CNNs; by projecting weights
with the same structures into the same subspace, networks can be further compressed and
even accelerated. In particular, three joint matrix decomposition schemes are developed,
and the corresponding optimization approaches based on Singular Value Decomposition are
proposed. Extensive experiments are conducted on three challenging compact CNNs and
three benchmark data sets to demonstrate the superior performance of our proposed
algorithms. As a result, our methods can compress ResNet-34 by 22x with less accuracy
degradation than several state-of-the-art methods.
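To make the shared-subspace idea concrete, below is a minimal NumPy sketch of joint low-rank factorization via SVD. It is not the paper's three decomposition schemes or their optimization procedures; the function name, the layer shapes, and the rank k are illustrative assumptions, and the coupling shown is the simplest possible one (two same-shaped weight matrices from repeated modules reusing a single left basis).

```python
import numpy as np

def joint_svd_compress(W1, W2, k):
    """Factor two same-shaped weight matrices over one shared left subspace.

    Storing the shared basis Uk (m x k) plus two coefficient matrices
    (k x n each) replaces the original 2*m*n parameters with k*(m + 2*n).
    """
    stacked = np.hstack([W1, W2])                 # (m, 2n): couple the repeated layers
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    Uk = U[:, :k]                                 # shared basis for both layers
    return Uk, Uk.T @ W1, Uk.T @ W2               # basis + per-layer coefficients

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((256, 64)), rng.standard_normal((256, 64))
Uk, C1, C2 = joint_svd_compress(W1, W2, k=32)
print(np.linalg.norm(W1 - Uk @ C1) / np.linalg.norm(W1))  # relative reconstruction error
```

Sharing the basis is what distinguishes this from per-layer truncated SVD: the two layers pay for one m x k factor instead of two, which is where the additional compression (and the need for joint optimization) comes from.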
Related papers
- Reduced storage direct tensor ring decomposition for convolutional neural networks compression [0.0]
We propose a novel low-rank CNN compression method based on reduced storage direct tensor ring decomposition (RSDTR).
The proposed method offers higher circular mode permutation flexibility and is characterized by large parameter and FLOPs compression rates.
Experiments, performed on the CIFAR-10 and ImageNet datasets, clearly demonstrate the efficiency of RSDTR in comparison to other state-of-the-art CNN compression approaches.
arXiv Detail & Related papers (2024-05-17T14:16:40Z)
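As a rough illustration of why tensor ring formats such as the one referenced above save storage, the sketch below rebuilds a small 3-way tensor from three ring cores and compares parameter counts. It does not implement RSDTR itself (neither its reduced-storage layout nor the circular mode permutation); the ranks and tensor sizes are illustrative assumptions.

```python
import numpy as np

# Tensor ring format: a 3-way tensor is represented by three small cores
# G1 (r, i, r), G2 (r, j, r), G3 (r, k, r); each entry of the full tensor is
# the trace of a product of matching core slices.
rng = np.random.default_rng(0)
i, j, k, r = 16, 16, 16, 3
G1, G2, G3 = (rng.standard_normal(s) for s in [(r, i, r), (r, j, r), (r, k, r)])

T = np.einsum('aib,bjc,cka->ijk', G1, G2, G3)  # reconstruct the full (16, 16, 16) tensor
full_params = i * j * k                        # 4096 values stored explicitly
core_params = r * r * (i + j + k)              # 432 values stored in the ring cores
print(T.shape, full_params, core_params)
```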
- Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
arXiv Detail & Related papers (2024-01-15T23:52:35Z)
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 with ImageNet2012, our reduced model can reach more than two times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
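The sketch below shows, in plain NumPy, the two lossy steps the entry above builds on: magnitude pruning followed by uniform symmetric quantization. It is not the paper's source-coding storage format; the sparsity level, bit width, and function name are illustrative assumptions.

```python
import numpy as np

def prune_and_quantize(W, sparsity=0.9, bits=8):
    """Zero out the smallest-magnitude weights, then quantize the survivors."""
    thresh = np.quantile(np.abs(W), sparsity)           # magnitude threshold
    W_pruned = np.where(np.abs(W) < thresh, 0.0, W)     # sparse weight matrix
    scale = np.abs(W_pruned).max() / (2 ** (bits - 1) - 1)
    codes = np.round(W_pruned / scale).astype(np.int8)  # integer codes
    return codes, scale                                 # codes + one float scale per tensor

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
codes, scale = prune_and_quantize(W)
W_hat = codes.astype(np.float32) * scale                # dequantized approximation
print((codes == 0).mean(), np.abs(W - W_hat).max())
```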
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly achieve real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which combines channel pruning and tensor decomposition to compress CNN models.
We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters on ResNet-50, with only a 0.56% Top-1 accuracy drop on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
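The observation above is easy to check numerically: permuting the output channels of one layer and applying the same permutation to the input channels of the next leaves the network function unchanged. The sketch below verifies this for a toy two-layer ReLU network with illustrative sizes; the paper's actual contribution, searching for permutations that make the weights easier to vector-quantize, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((128, 64))   # layer 1: 64 -> 128
W2 = rng.standard_normal((10, 128))   # layer 2: 128 -> 10
x = rng.standard_normal(64)

perm = rng.permutation(128)
W1_p = W1[perm, :]                    # permute layer-1 output channels
W2_p = W2[:, perm]                    # apply the same permutation to layer-2 inputs

y = W2 @ np.maximum(W1 @ x, 0)        # original two-layer ReLU network
y_p = W2_p @ np.maximum(W1_p @ x, 0)  # permuted network
print(np.allclose(y, y_p))            # True: same function, different weight layout
```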
- Compression strategies and space-conscious representations for deep neural networks [0.3670422696827526]
Recent advances in deep learning have made available powerful convolutional neural networks (CNNs) with state-of-the-art performance in several real-world applications.
However, CNNs have millions of parameters and thus are not deployable on resource-limited platforms.
In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization.
arXiv Detail & Related papers (2020-07-15T19:41:19Z)
- Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
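For readers unfamiliar with the channel shuffle mentioned in the last entry, the sketch below shows the standard fixed (ShuffleNet-style) reshape-transpose-reshape shuffle that a learnable shuffle generalizes. The paper's learnable mechanism and its structured sparsification are not reproduced; the group count and tensor shape are illustrative assumptions.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups for an (N, C, H, W) tensor."""
    n, c, h, w = x.shape
    assert c % groups == 0
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(8).reshape(1, 8, 1, 1)               # 8 channels in 2 groups of 4
print(channel_shuffle(x, groups=2)[0, :, 0, 0])    # [0 4 1 5 2 6 3 7]: groups interleaved
```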