Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks
- URL: http://arxiv.org/abs/2001.10710v2
- Date: Tue, 4 Feb 2020 19:26:39 GMT
- Title: Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks
- Authors: Souvik Kundu, Mahdi Nazemi, Massoud Pedram, Keith M. Chugg, Peter A.
Beerel
- Abstract summary: This work introduces convolutional layers with pre-defined sparse 2D kernels whose support sets repeat periodically within and across filters.
Due to the efficient storage of our periodic sparse kernels, the parameter savings can translate into considerable improvements in energy efficiency.
- Score: 9.409651543514615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The high energy cost of processing deep convolutional neural networks impedes
their ubiquitous deployment in energy-constrained platforms such as embedded
systems and IoT devices. This work introduces convolutional layers with
pre-defined sparse 2D kernels that have support sets that repeat periodically
within and across filters. Due to the efficient storage of our periodic sparse
kernels, the parameter savings can translate into considerable improvements in
energy efficiency due to reduced DRAM accesses, thus promising significant
improvements in the trade-off between energy consumption and accuracy for both
training and inference. To evaluate this approach, we performed experiments
with two widely accepted datasets, CIFAR-10 and Tiny ImageNet, using sparse
variants of the ResNet18 and VGG16 architectures. Compared to baseline models,
our proposed sparse variants require up to 82% fewer model parameters and
5.6x fewer FLOPs, with negligible loss in accuracy for ResNet18 on CIFAR-10.
For VGG16 trained on Tiny ImageNet, our approach requires 5.8x fewer FLOPs
and up to 83.3% fewer model parameters with a drop in top-5 (top-1) accuracy of
only 1.2% (2.1%). We also compared the performance of our proposed
architectures with that of ShuffleNet and MobileNetV2. Using similar
hyperparameters and FLOPs, our ResNet18 variants yield an average accuracy
improvement of 2.8%.
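To make the core idea concrete, the following is a minimal PyTorch sketch of a convolutional layer with a pre-defined kernel support set that repeats periodically across input channels and filters. The class name, the particular binary patterns, and the `period` value are illustrative assumptions made for this sketch; the paper's actual support patterns and storage scheme may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreDefinedSparseConv2d(nn.Module):
    """Sketch of a conv layer whose kernel support set is fixed before
    training and repeats periodically within and across filters."""

    def __init__(self, in_ch, out_ch, kernel_size=3, period=4):
        super().__init__()
        self.padding = kernel_size // 2
        self.weight = nn.Parameter(
            torch.empty(out_ch, in_ch, kernel_size, kernel_size))
        nn.init.kaiming_normal_(self.weight)

        # A small bank of `period` binary support patterns over the k*k taps;
        # the exact patterns here are arbitrary placeholders, not the paper's.
        taps = torch.arange(kernel_size * kernel_size)
        patterns = torch.stack(
            [((taps % period) == p).float().view(kernel_size, kernel_size)
             for p in range(period)])

        # Tile the patterns periodically over input channels and filters,
        # then store the result as a fixed, non-trainable mask.
        mask = torch.stack([patterns[(o + i) % period]
                            for o in range(out_ch)
                            for i in range(in_ch)]).view_as(self.weight)
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Masking on every forward pass keeps the pruned positions at zero and
        # gives them zero gradient, so the support set never changes.
        return F.conv2d(x, self.weight * self.mask, padding=self.padding)


layer = PreDefinedSparseConv2d(16, 32, period=4)
print(layer.mask.mean().item())  # ~0.25: roughly 1/period of the taps are active
```

Because the mask is fixed before training and built from only `period` distinct patterns, the support set itself is cheap to store, which is the property the abstract credits for reduced DRAM accesses and improved energy efficiency.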
Related papers
- Dual sparse training framework: inducing activation map sparsity via Transformed $\ell_1$ regularization [2.631955426232593]
This paper presents a method to induce sparsity in activation maps based on Transformed $\ell_1$ regularization.
Compared to previous methods, Transformed $\ell_1$ can achieve higher sparsity and adapts better to different network structures.
The dual sparse training framework can greatly reduce the computational load and can also reduce the storage required at runtime (an illustrative sketch of this penalty applied to activation maps appears after this list).
arXiv Detail & Related papers (2024-05-30T03:11:21Z)
- Learning Activation Functions for Sparse Neural Networks [12.234742322758418]
Sparse Neural Networks (SNNs) can potentially demonstrate similar performance to their dense counterparts.
However, the accuracy drop incurred by SNNs, especially at high pruning ratios, can be an issue in critical deployment conditions.
We focus on learning a novel way to tune activation functions for sparse networks.
arXiv Detail & Related papers (2023-05-18T13:30:29Z)
- Efficient CNN Architecture Design Guided by Visualization [13.074652653088584]
VGNetG-1.0MP achieves 67.7% top-1 accuracy with 0.99M parameters and 69.2% top-1 accuracy with 1.14M parameters on the ImageNet classification dataset.
Our VGNetF-1.5MP achieves 64.4% (-3.2%) top-1 accuracy and 66.2% (-1.4%) top-1 accuracy with additional Gaussian kernels.
arXiv Detail & Related papers (2022-07-21T06:22:15Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs of varying difficulty.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which adjust the number of filters in CNNs and multiple dimensions in both CNNs and transformers in an input-dependent manner.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Toward Compact Deep Neural Networks via Energy-Aware Pruning [2.578242050187029]
We propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using the nuclear norm (NN).
We achieve competitive results with ResNet-56/110 on CIFAR-10, reducing FLOPs by 40.4%/49.8% and parameters by 45.9%/52.9% while reaching 94.13%/94.61% top-1 accuracy.
arXiv Detail & Related papers (2021-03-19T15:33:16Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- MicroNet: Towards Image Recognition with Extremely Low FLOPs [117.96848315180407]
MicroNet is an efficient convolutional neural network with extremely low computational cost.
A family of MicroNets achieve a significant performance gain over the state-of-the-art in the low FLOP regime.
For instance, MicroNet-M1 achieves 61.1% top-1 accuracy on ImageNet classification with 12 MFLOPs, outperforming MobileNetV3 by 11.3%.
arXiv Detail & Related papers (2020-11-24T18:59:39Z)
- PENNI: Pruned Kernel Sharing for Efficient CNN Inference [41.050335599000036]
State-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks.
Their high computation demand and massive number of parameters make it difficult to deploy these SOTA CNNs onto resource-constrained devices.
We propose PENNI, a CNN model compression framework that is able to achieve model compactness and hardware efficiency simultaneously.
arXiv Detail & Related papers (2020-05-14T16:57:41Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% of their parameters (100k) on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
- End-to-End Multi-speaker Speech Recognition with Transformer [88.22355110349933]
We replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture.
We also modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation.
arXiv Detail & Related papers (2020-02-10T16:29:26Z)
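As noted in the first related-paper entry above, here is a minimal sketch of inducing activation-map sparsity with a Transformed $\ell_1$ penalty. The penalty form $\rho_a(x) = (a+1)|x|/(a+|x|)$ is the standard transformed $\ell_1$; the toy network, the coefficient `lam`, and the training wiring are placeholder assumptions rather than that paper's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def transformed_l1(x: torch.Tensor, a: float = 1.0) -> torch.Tensor:
    """Transformed l1 penalty rho_a(x) = (a + 1)|x| / (a + |x|), summed over x.
    Small `a` pushes the penalty toward an l0-like count of nonzeros; large `a`
    makes it behave like a scaled l1 norm."""
    ax = x.abs()
    return ((a + 1.0) * ax / (a + ax)).sum()


class TinyNet(nn.Module):
    # A placeholder two-layer network whose hidden activation map is regularized.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.head = nn.Linear(8 * 32 * 32, 10)

    def forward(self, x):
        act = F.relu(self.conv(x))              # activation map to sparsify
        return self.head(act.flatten(1)), act


model = TinyNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))

logits, act = model(x)
lam = 1e-4  # hypothetical regularization coefficient
loss = F.cross_entropy(logits, y) + lam * transformed_l1(act, a=1.0)
loss.backward()
opt.step()
```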
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.