Related papers: Convolutional Neural Network Compression Based on Low-Rank Decomposition

URL: http://arxiv.org/abs/2408.16289v1
Date: Thu, 29 Aug 2024 06:40:34 GMT
Title: Convolutional Neural Network Compression Based on Low-Rank Decomposition
Authors: Yaping He, Linhao Jiang, Di Wu
Abstract summary: This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization (VBMF) with orthogonal regularization. VBMF is employed to estimate the rank of the weight tensor at each layer. Experimental results show that, at both high and low compression ratios, the compressed model achieves strong performance.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks typically impose significant computational loads and memory consumption. Moreover, their large number of parameters constrains deployment on edge devices such as embedded systems. Tensor decomposition offers a clear advantage in compressing large-scale weight tensors. Nevertheless, applying low-rank decomposition directly typically leads to significant accuracy loss. This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization (VBMF) with orthogonal regularization. First, the model is over-parameterized and trained, with orthogonal regularization applied to improve its likelihood of matching the accuracy of the original model. Second, VBMF is employed to estimate the rank of the weight tensor at each layer. Our framework is general enough to apply to other convolutional neural networks and is easily adapted to other tensor decomposition methods. Experimental results show that, at both high and low compression ratios, our compressed model achieves strong performance.
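The two ingredients named in the abstract can be illustrated with a small NumPy sketch; it is not the authors' code. It assumes a soft orthogonality penalty of the common form ||W W^T - I||_F^2 on each flattened kernel (the abstract does not give the exact regularizer) and replaces VBMF with a simple energy threshold on the singular values, a swapped-in heuristic chosen only to keep the example short. The helper names (`orthogonality_penalty`, `energy_rank`, `low_rank_factorize`) and the 0.9 threshold are hypothetical.

```python
import numpy as np


def orthogonality_penalty(kernel: np.ndarray) -> float:
    """Soft orthogonality penalty ||W W^T - I||_F^2 (assumed form, not from the paper).

    A conv kernel of shape (out_ch, in_ch, kh, kw) is flattened to W of shape
    (out_ch, in_ch * kh * kw); the penalty is zero when the rows of W are orthonormal.
    """
    w = kernel.reshape(kernel.shape[0], -1)
    gram = w @ w.T
    return float(np.sum((gram - np.eye(w.shape[0])) ** 2))


def energy_rank(singular_values: np.ndarray, energy: float = 0.9) -> int:
    """Smallest rank whose singular values keep `energy` of the squared spectrum.

    Hypothetical stand-in for VBMF rank estimation.
    """
    cumulative = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return min(rank, len(singular_values))


def low_rank_factorize(kernel: np.ndarray, rank: int):
    """Split one conv layer into a (rank, in_ch, kh, kw) conv followed by a 1x1 conv.

    The flattened kernel W is approximated by U_r @ (S_r V_r^T) via a truncated SVD.
    """
    out_ch = kernel.shape[0]
    u, s, vt = np.linalg.svd(kernel.reshape(out_ch, -1), full_matrices=False)
    first = (s[:rank, None] * vt[:rank]).reshape(rank, *kernel.shape[1:])  # k x k conv
    second = u[:, :rank].reshape(out_ch, rank, 1, 1)  # 1x1 conv back to out_ch channels
    return first, second


# A synthetic 64x32x3x3 kernel that is close to rank 8, plus small noise.
rng = np.random.default_rng(0)
kernel = (rng.standard_normal((64, 8)) @ rng.standard_normal((8, 32 * 9))).reshape(64, 32, 3, 3)
kernel += 0.01 * rng.standard_normal(kernel.shape)

singular_values = np.linalg.svd(kernel.reshape(64, -1), compute_uv=False)
rank = energy_rank(singular_values, energy=0.9)
first, second = low_rank_factorize(kernel, rank)

# Recombine the two factors and measure the relative Frobenius error.
approx = second.reshape(64, rank) @ first.reshape(rank, -1)
rel_err = np.linalg.norm(approx - kernel.reshape(64, -1)) / np.linalg.norm(kernel)
print(f"rank={rank}, params {kernel.size} -> {first.size + second.size}, rel_err={rel_err:.3f}")
```

In the pipeline the abstract describes, a penalty like this would be added to the training loss of the over-parameterized model, and the rank would come from VBMF rather than from a fixed energy fraction; swapping `energy_rank` for a VBMF estimator leaves the factorization step unchanged.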

Related papers

tCURLoRA: Tensor CUR Decomposition Based Low-Rank Parameter Adaptation and Its Application in Medical Image Segmentation
Transfer learning, by leveraging knowledge from pre-trained models, has significantly enhanced the performance of target tasks. As deep neural networks scale up, full fine-tuning introduces substantial computational and storage challenges. We propose tCURLoRA, a novel fine-tuning method based on tensor CUR decomposition.
arXiv (2025-01-04T08:25:32Z)
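tCURLoRA builds on a tensor CUR decomposition. As a hedged, simplified illustration, the matrix version below expresses a weight matrix through a subset of its own columns C and rows R, linked by U = pinv(A[I, J]). Selecting columns and rows by norm is an assumed heuristic, and neither the tensor generalization nor how tCURLoRA turns the factors into trainable adapters is reproduced; `cur_decomposition` is a hypothetical name.

```python
import numpy as np


def cur_decomposition(a: np.ndarray, k: int):
    """Matrix CUR: A ~= C @ U @ R built from k actual columns and k actual rows of A.

    Columns and rows are chosen by largest Euclidean norm (a hypothetical heuristic);
    U is the pseudo-inverse of the k x k intersection block A[rows, cols].
    """
    cols = np.argsort(np.linalg.norm(a, axis=0))[::-1][:k]
    rows = np.argsort(np.linalg.norm(a, axis=1))[::-1][:k]
    c = a[:, cols]
    r = a[rows, :]
    u = np.linalg.pinv(a[np.ix_(rows, cols)])
    return c, u, r


rng = np.random.default_rng(0)
weight = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))  # exactly rank 5
c, u, r = cur_decomposition(weight, k=5)
rel_err = np.linalg.norm(weight - c @ u @ r) / np.linalg.norm(weight)
print(f"C {c.shape}, U {u.shape}, R {r.shape}, relative error {rel_err:.2e}")
```

When k matches the true rank and the intersection block is nonsingular, C U R reproduces A up to rounding; for smaller k it is an approximation whose quality depends on which rows and columns are selected.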
Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics
We present a novel approach for compressing overparameterized models. Our algorithm improves training efficiency by more than 2x without compromising generalization.
arXiv (2023-11-08T23:57:03Z)
Frequency-Aware Re-Parameterization for Over-Fitting Based Image Compression
Over-fitting-based image compression requires weight compactness for compression and fast convergence for practical use. This paper presents a simple re-parameterization method to train CNNs with reduced weight storage and accelerated convergence. The proposed method is verified with extensive experiments on over-fitting-based image restoration across various datasets, achieving up to -46.12% BD-rate relative to HEIF with only 200 iterations.
arXiv (2023-10-12T06:32:12Z)
Quantization Aware Factorization for Deep Neural Network Compression
Decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds the decomposed approximation directly with quantized factors.
arXiv (2023-08-08T21:38:02Z)
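The accuracy drop described here can be reproduced in miniature with the conventional baseline the abstract argues against, not with the proposed algorithm: a weight matrix is factorized by truncated SVD and each factor is then quantized independently. Symmetric per-tensor uniform quantization, the 4-bit width, the rank, and the helper names are assumptions for illustration.

```python
import numpy as np


def quantize_symmetric(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric per-tensor quantization, returned in dequantized (float) form.

    Assumes x is not all zeros.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale) * scale


def factorize_then_quantize(w: np.ndarray, rank: int, bits: int):
    """Conventional two-step baseline: truncated SVD first, then quantize each factor."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    root = np.sqrt(s[:rank])
    left = u[:, :rank] * root          # shape (m, rank); singular values split evenly
    right = root[:, None] * vt[:rank]  # shape (rank, n)
    return left, right, quantize_symmetric(left, bits), quantize_symmetric(right, bits)


rng = np.random.default_rng(0)
w = rng.standard_normal((128, 64))
left, right, q_left, q_right = factorize_then_quantize(w, rank=16, bits=4)

norm_w = np.linalg.norm(w)
err_float = np.linalg.norm(w - left @ right) / norm_w
err_quant = np.linalg.norm(w - q_left @ q_right) / norm_w
print(f"relative error: float factors {err_float:.3f}, 4-bit factors {err_quant:.3f}")
```

Because `q_left @ q_right` still has rank at most 16, the Eckart-Young theorem guarantees its error can never be lower than that of the unquantized truncated SVD; the abstract's proposal is to search for the factors with quantization taken into account instead of quantizing them afterwards.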
Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization
We propose a model reduction method that compresses pre-trained networks using low-rank tensor decomposition. A new regularization method, called the funnel function, is proposed to suppress unimportant factors during compression. For ResNet18 on ImageNet2012, our reduced model achieves more than a 2x speedup in terms of GMACs with only a 0.7% Top-1 accuracy drop.
arXiv (2021-12-07T13:41:51Z)
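The GMAC speedup quoted here is a multiply-accumulate count summed over the network's layers. The sketch below does that arithmetic for one layer, assuming a Tucker-2 factorization (1x1, k x k core, 1x1), a common form for decomposed convolutions; the abstract does not say this is the exact form used, and the layer size and ranks are illustrative choices, not values from the paper.

```python
def conv_macs(c_in: int, c_out: int, k: int, h: int, w: int) -> int:
    """MACs of a standard k x k convolution producing an h x w output."""
    return c_out * c_in * k * k * h * w


def tucker2_macs(c_in: int, c_out: int, k: int, h: int, w: int, r_in: int, r_out: int) -> int:
    """MACs of a Tucker-2 replacement: 1x1 (c_in->r_in), k x k (r_in->r_out), 1x1 (r_out->c_out).

    Assumes all three convs produce the same h x w output (stride 1, 'same' padding).
    """
    return h * w * (c_in * r_in + r_in * r_out * k * k + r_out * c_out)


# A 3x3, 256->256 layer on a 14x14 feature map, as in a ResNet-18 stage; ranks are arbitrary.
original = conv_macs(256, 256, 3, 14, 14)
decomposed = tucker2_macs(256, 256, 3, 14, 14, r_in=128, r_out=128)
speedup = original / decomposed
print(f"{original / 1e9:.4f} GMAC -> {decomposed / 1e9:.4f} GMAC ({speedup:.2f}x)")
```

Halving both ranks gives about 2.77x fewer MACs for this layer; the network-level speedup depends on the ranks kept in every layer.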
Compact representations of convolutional neural networks via weight pruning and quantization
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding that leverages both weight pruning and quantization. We reduce space occupancy to as little as 0.6% of the original size on fully connected layers and 5.44% on the whole network, while performing at least as well as the baseline.
arXiv (2021-08-28T20:39:54Z)
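Occupancy percentages like these can be read as compressed size over dense float32 size. The back-of-the-envelope sketch below computes that ratio for a pruned and quantized layer stored as a bitmask plus codebook indices; this is an assumed, deliberately simple format, not the source-coding scheme proposed in the paper, and the layer size, density, and bit width are hypothetical.

```python
def compressed_fraction(n_weights: int, density: float, bits: int) -> float:
    """Storage of a pruned + quantized layer as a fraction of dense float32 storage.

    Hypothetical simple format: a 1-bit mask marks surviving weights, and each survivor
    stores a `bits`-wide index into a codebook of 2**bits float32 values.
    """
    nonzeros = round(density * n_weights)
    mask_bits = n_weights
    value_bits = nonzeros * bits
    codebook_bits = (2 ** bits) * 32
    return (mask_bits + value_bits + codebook_bits) / (32 * n_weights)


# A 4096 x 4096 fully connected layer keeping 5% of its weights with 4-bit codes.
fraction = compressed_fraction(4096 * 4096, density=0.05, bits=4)
print(f"space occupancy: {100 * fraction:.2f}% of the dense float32 layer")
```

The 1-bit mask alone costs 1/32, about 3.1%, of the dense size, so figures well below that, such as 0.6%, require encoding the sparsity pattern more compactly than one bit per weight, which is the kind of gain source coding targets.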
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices. Previous unstructured or structured weight pruning methods can hardly achieve real inference acceleration. We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv (2021-06-15T17:22:59Z)
Substitutional Neural Image Compression
Substitutional Neural Image Compression (SNIC) is a general approach for enhancing any neural image compression model. It boosts compression performance toward a flexible distortion metric and enables bit-rate control using a single model instance.
arXiv (2021-05-16T20:53:31Z)
Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network
This paper is the first study of degeneracy in the tensor decomposition of convolutional kernels. We present a novel method that stabilizes the low-rank approximation of convolutional kernels and ensures efficient compression. We evaluate our approach on popular CNN architectures for image classification and show that our method results in much lower accuracy degradation and provides consistent performance.
arXiv (2020-08-12T17:10:12Z)
Learning Context-Based Non-local Entropy Modeling for Image Compression
In this paper, we propose a non-local operation for context modeling that exploits the global similarity within the context. The entropy model is further adopted as the rate loss in a joint rate-distortion optimization. Because the width of the transforms is essential for training low-distortion models, we finally introduce a U-Net block into the transforms to increase the width with manageable memory consumption and time complexity.
arXiv (2020-05-10T13:28:18Z)
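The joint rate-distortion objective referred to here usually takes the form L = R + λD, where the rate R is the estimated code length, the sum of -log2 p(y_i) over the quantized latents under the entropy model. In the minimal sketch below, a fixed toy probability table stands in for the paper's learned non-local context model, D is mean squared error, and λ and all values are hypothetical.

```python
import math


def rate_bits(symbols, probs):
    """Estimated code length in bits: sum of -log2 p(symbol) under the entropy model."""
    return sum(-math.log2(probs[s]) for s in symbols)


def mse(x, x_hat):
    """Mean squared error distortion between original and reconstructed samples."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_loss(symbols, probs, x, x_hat, lam):
    """Rate-distortion loss L = R + lambda * D."""
    return rate_bits(symbols, probs) + lam * mse(x, x_hat)


probs = {-1: 0.25, 0: 0.5, 1: 0.25}  # toy stand-in for a learned entropy model
symbols = [0, 0, 1, -1]              # quantized latents
x = [0.0, 1.0, 2.0, 3.0]             # original samples
x_hat = [0.0, 1.0, 2.0, 2.0]         # decoder reconstruction
loss = rd_loss(symbols, probs, x, x_hat, lam=4.0)
print(f"rate={rate_bits(symbols, probs)} bits, distortion={mse(x, x_hat)}, loss={loss}")
```

A larger λ penalizes distortion more and pushes the model toward higher-rate, lower-distortion operating points; in practice the rate is often normalized by the number of pixels (bits per pixel).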
Revisiting Initialization of Neural Networks
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv (2020-04-20T18:12:56Z)
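A standard way to estimate a Hessian norm without forming the Hessian is power iteration on Hessian-vector products. The sketch below shows that generic technique, not the estimator derived in the paper: the Hessian-vector product is approximated by central differences of a gradient function, and the quadratic test loss with a known spectrum is a hypothetical example. In a real network the gradient function would come from automatic differentiation.

```python
import numpy as np


def hvp(grad_fn, w, v, eps=1e-3):
    """Hessian-vector product by central differences of the gradient (assumed approach)."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)


def hessian_spectral_norm(grad_fn, w, iters=100, seed=0):
    """Power iteration on the Hessian: returns an estimate of its largest |eigenvalue|."""
    v = np.random.default_rng(seed).standard_normal(w.shape)
    v /= np.linalg.norm(v)
    estimate = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        estimate = np.linalg.norm(hv)
        v = hv / estimate
    return float(estimate)


# Quadratic loss 0.5 * w^T A w with known Hessian A (eigenvalues 10, 5, 1, ..., 1).
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
a = q @ np.diag([10.0, 5.0] + [1.0] * 18) @ q.T


def grad(w):
    """Gradient of the quadratic loss: A w."""
    return a @ w


estimate = hessian_spectral_norm(grad, rng.standard_normal(20))
print(f"estimated Hessian spectral norm: {estimate:.6f}")
```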
Understanding Generalization in Deep Learning via Tensor Methods
We advance the understanding of the relation between a network's architecture and its generalizability from the compression perspective. We propose a series of intuitive, data-dependent, and easily measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv (2020-01-14T22:26:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.