PENNI: Pruned Kernel Sharing for Efficient CNN Inference
- URL: http://arxiv.org/abs/2005.07133v2
- Date: Thu, 25 Jun 2020 02:28:00 GMT
- Title: PENNI: Pruned Kernel Sharing for Efficient CNN Inference
- Authors: Shiyu Li, Edward Hanson, Hai Li, Yiran Chen
- Abstract summary: State-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks.
Their high computation demand and massive number of parameters make it difficult to deploy these SOTA CNNs onto resource-constrained devices.
We propose PENNI, a CNN model compression framework that is able to achieve model compactness and hardware efficiency simultaneously.
- Score: 41.050335599000036
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although state-of-the-art (SOTA) CNNs achieve outstanding performance on
various tasks, their high computation demand and massive number of parameters
make it difficult to deploy these SOTA CNNs onto resource-constrained devices.
Previous works on CNN acceleration utilize low-rank approximation of the
original convolution layers to reduce computation cost. However, these methods
are very difficult to conduct upon sparse models, which limits execution
speedup since redundancies within the CNN model are not fully exploited. We
argue that kernel granularity decomposition can be conducted with low-rank
assumption while exploiting the redundancy within the remaining compact
coefficients. Based on this observation, we propose PENNI, a CNN model
compression framework that is able to achieve model compactness and hardware
efficiency simultaneously by (1) implementing kernel sharing in convolution
layers via a small number of basis kernels and (2) alternately adjusting bases
and coefficients with sparse constraints. Experiments show that we can prune
97% of the parameters and 92% of the FLOPs of ResNet18 on CIFAR10 with no accuracy loss, and
achieve 44% reduction in run-time memory consumption and a 53% reduction in
inference latency.
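The kernel-sharing idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes each 2-D kernel of a convolution layer is approximated as a linear combination of a few shared basis kernels obtained by truncated SVD. Simple magnitude pruning of the coefficients stands in for the sparsity-constrained alternating training that the abstract mentions. The function names (`decompose_kernels`, `sparsify`, `reconstruct`) and the layer sizes are hypothetical.

```python
import numpy as np

def decompose_kernels(weight, num_basis):
    """Approximate every 2-D kernel as a combination of shared basis kernels."""
    c_out, c_in, kh, kw = weight.shape
    flat = weight.reshape(c_out * c_in, kh * kw)      # one row per 2-D kernel
    # Rows of vt are orthonormal directions in kernel space (truncated SVD).
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:num_basis]                            # (num_basis, kh*kw)
    coeffs = flat @ basis.T                           # projection onto the basis
    return (basis.reshape(num_basis, kh, kw),
            coeffs.reshape(c_out, c_in, num_basis))

def sparsify(coeffs, keep_ratio):
    """Assumed stand-in for sparse training: keep the largest-magnitude coefficients."""
    k = max(1, int(round(keep_ratio * coeffs.size)))
    threshold = np.sort(np.abs(coeffs), axis=None)[-k]
    return np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

def reconstruct(basis, coeffs):
    """Rebuild a (c_out, c_in, kh, kw) weight tensor from basis and coefficients."""
    return np.einsum("oid,dhw->oihw", coeffs, basis)

# Hypothetical layer: 16 output channels, 8 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 8, 3, 3))

basis, coeffs = decompose_kernels(weight, num_basis=5)
sparse_coeffs = sparsify(coeffs, keep_ratio=0.3)
approx = reconstruct(basis, sparse_coeffs)

dense_params = weight.size
shared_params = basis.size + np.count_nonzero(sparse_coeffs)
print(f"dense parameters: {dense_params}, shared + sparse parameters: {shared_params}")
```

Random weights like these are not low-rank, so the approximation error here is large. The sketch only shows the bookkeeping: the layer stores a handful of basis kernels plus a sparse coefficient tensor instead of one dense kernel per input/output channel pair.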
Related papers
- Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
arXiv Detail & Related papers (2024-01-15T23:52:35Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs [0.0]
A CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations.
Several hardware architectures based on photonic microring resonators (MRRs) have been proposed to accelerate integer-quantized CNNs.
Existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size.
arXiv Detail & Related papers (2023-02-14T13:35:15Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as well as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN [12.697368516837718]
State-of-the-art CNN topologies, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive, involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource-limited devices.
Previous unstructured or structured weight pruning methods rarely deliver real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- ACP: Automatic Channel Pruning via Clustering and Swarm Intelligence Optimization for CNN [6.662639002101124]
Convolutional neural networks (CNNs) have become deeper and wider in recent years.
Existing magnitude-based pruning methods are efficient, but the performance of the compressed network is unpredictable.
We propose a novel automatic channel pruning method (ACP).
ACP is evaluated against several state-of-the-art CNNs on three different classification datasets.
arXiv Detail & Related papers (2021-01-16T08:56:38Z)
- Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain.
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs now maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)