CNN Acceleration by Low-rank Approximation with Quantized Factors
- URL: http://arxiv.org/abs/2006.08878v1
- Date: Tue, 16 Jun 2020 02:28:05 GMT
- Title: CNN Acceleration by Low-rank Approximation with Quantized Factors
- Authors: Nikolay Kozyrskiy, Anh-Huy Phan
- Abstract summary: The modern convolutional neural networks although achieve great results in solving complex computer vision tasks still cannot be effectively used in mobile and embedded devices.
In order to solve this problem the novel approach combining two known methods, low-rank tensor approximation in Tucker format and quantization of weights and feature maps (activations) is proposed.
The efficiency of our method is demonstrated for ResNet18 and ResNet34 on CIFAR-10, CIFAR-100 and Imagenet classification tasks.
- Score: 9.654865591431593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The modern convolutional neural networks although achieve great results in
solving complex computer vision tasks still cannot be effectively used in
mobile and embedded devices due to the strict requirements for computational
complexity, memory and power consumption. The CNNs have to be compressed and
accelerated before deployment. In order to solve this problem the novel
approach combining two known methods, low-rank tensor approximation in Tucker
format and quantization of weights and feature maps (activations), is proposed.
The greedy one-step and multi-step algorithms for the task of multilinear rank
selection are proposed. The approach for quality restoration after applying
Tucker decomposition and quantization is developed. The efficiency of our
method is demonstrated for ResNet18 and ResNet34 on CIFAR-10, CIFAR-100 and
Imagenet classification tasks. As a result of comparative analysis performed
for other methods for compression and acceleration our approach showed its
promising features.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Dynamic Semantic Compression for CNN Inference in Multi-access Edge
Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN) for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z) - Convolutional Neural Network Compression via Dynamic Parameter Rank
Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
arXiv Detail & Related papers (2024-01-15T23:52:35Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight approximation.
In this paper, a holistic framework is proposed for model compression from a novel perspective of non optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z) - Complexity-Driven CNN Compression for Resource-constrained Edge AI [1.6114012813668934]
We propose a novel and computationally efficient pruning pipeline by exploiting the inherent layer-level complexities of CNNs.
We define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to introduce versatile compression of CNNs.
arXiv Detail & Related papers (2022-08-26T16:01:23Z) - Learning from Images: Proactive Caching with Parallel Convolutional
Neural Networks [94.85780721466816]
A novel framework for proactive caching is proposed in this paper.
It combines model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image.
Numerical results show that the proposed scheme can reduce 71.6% computation time with only 0.8% additional performance cost.
arXiv Detail & Related papers (2021-08-15T21:32:47Z) - Multi-objective Evolutionary Approach for Efficient Kernel Size and
Shape for CNN [12.697368516837718]
State-of-the-art development in CNN topology, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z) - Quantized Proximal Averaging Network for Analysis Sparse Coding [23.080395291046408]
We unfold an iterative algorithm into a trainable network that facilitates learning sparsity prior to quantization.
We demonstrate applications to compressed image recovery and magnetic resonance image reconstruction.
arXiv Detail & Related papers (2021-05-13T12:05:35Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - A Deep-Unfolded Reference-Based RPCA Network For Video
Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA)
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.