Phantom: A High-Performance Computational Core for Sparse Convolutional
Neural Networks
- URL: http://arxiv.org/abs/2111.05002v1
- Date: Tue, 9 Nov 2021 08:43:03 GMT
- Title: Phantom: A High-Performance Computational Core for Sparse Convolutional
Neural Networks
- Authors: Mahmood Azhar Qureshi, Arslan Munir
- Abstract summary: Sparse convolutional neural networks (CNNs) have gained significant traction over the past few years.
They can drastically decrease the model size and computations, if exploited befittingly, as compared to their dense counterparts.
Recently proposed sparse accelerators like SCNN, Eyeriss v2, and SparTen, actively exploit the two-sided or full sparsity, that is, sparsity in both weights and activations, for performance gains.
These accelerators either have an inefficient micro-architecture, which limits their performance, lack support for non-unit stride convolutions and fully-connected layers, or suffer massively from systematic load imbalance.
- Score: 3.198144010381572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse convolutional neural networks (CNNs) have gained significant traction
over the past few years as sparse CNNs can drastically decrease the model size
and computations, if exploited befittingly, as compared to their dense
counterparts. Sparse CNNs often introduce variations in the layer shapes and
sizes, which can prevent dense accelerators from performing well on sparse CNN
models. Recently proposed sparse accelerators like SCNN, Eyeriss v2, and
SparTen, actively exploit the two-sided or full sparsity, that is, sparsity in
both weights and activations, for performance gains. These accelerators,
however, either have inefficient micro-architecture, which limits their
performance, have no support for non-unit stride convolutions and
fully-connected (FC) layers, or suffer massively from systematic load
imbalance. To circumvent these issues and support both sparse and dense models,
we propose Phantom, a multi-threaded, dynamic, and flexible neural
computational core. Phantom uses sparse binary mask representation to actively
lookahead into sparse computations, and dynamically schedule its computational
threads to maximize the thread utilization and throughput. We also generate a
two-dimensional (2D) mesh architecture of Phantom neural computational cores,
which we refer to as Phantom-2D accelerator, and propose a novel dataflow that
supports all layers of a CNN, including unit and non-unit stride convolutions,
and FC layers. In addition, Phantom-2D uses a two-level load balancing strategy
to minimize the computational idling, thereby, further improving the hardware
utilization. To show support for different types of layers, we evaluate the
performance of the Phantom architecture on VGG16 and MobileNet. Our simulations
show that the Phantom-2D accelerator attains a performance gain of 12x, 4.1x,
1.98x, and 2.36x, over dense architectures, SCNN, SparTen, and Eyeriss v2,
respectively.
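The mask-driven lookahead and dynamic thread scheduling described in the abstract can be pictured with a small software model. The following is a minimal NumPy sketch, not Phantom's actual microarchitecture: it ANDs the binary sparsity masks of a weight vector and an activation vector to find the multiply-accumulates that actually need to happen, then deals that work round-robin across a fixed number of computational threads. The function name, thread count, and round-robin policy are illustrative assumptions.

```python
import numpy as np

def schedule_sparse_macs(weights, activations, num_threads=4):
    """Toy model of mask-based lookahead scheduling (illustrative only).

    Binary masks mark the non-zero weights and activations; their AND
    gives the positions where a real MAC is needed. Those MACs are then
    dealt round-robin across `num_threads` compute threads so no thread
    idles while work remains.
    """
    w_mask = weights != 0
    a_mask = activations != 0
    valid = np.flatnonzero(w_mask & a_mask)   # lookahead: only real work

    # Round-robin assignment of the valid MACs to the threads.
    per_thread = [valid[t::num_threads] for t in range(num_threads)]
    partial_sums = [np.dot(weights[idx], activations[idx]) for idx in per_thread]
    return sum(partial_sums), [len(idx) for idx in per_thread]

# Example: roughly 75% sparse weight and activation vectors.
rng = np.random.default_rng(0)
w = rng.standard_normal(64) * (rng.random(64) > 0.75)
a = rng.standard_normal(64) * (rng.random(64) > 0.75)
result, work_per_thread = schedule_sparse_macs(w, a)
print(result, work_per_thread)  # work is spread evenly across the 4 threads
```

Even in this toy model, two-sided sparsity leaves only a small fraction of positions with real work, and spreading that work evenly is what keeps the threads utilized, which is the effect the Phantom-2D load-balancing strategy targets in hardware.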
Related papers
- Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation [13.681095158525514]
unzipFPGA is a novel CNN inference system that counteracts the limitations of existing CNN engines.
We introduce a weights generator module that enables the on-chip on-the-fly generation of weights.
We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair.
arXiv Detail & Related papers (2023-07-25T11:19:21Z)
- InceptionNeXt: When Inception Meets ConvNeXt [147.50287103414115]
We build a series of networks, namely InceptionNeXt, which not only enjoy high throughput but also maintain competitive performance.
InceptionNeXt achieves 1.6x higher training throughput than ConvNeXt-T and attains a 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z)
- EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators [12.223778147172107]
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs).
These kernels stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption.
We propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions.
arXiv Detail & Related papers (2022-02-04T18:48:36Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which adjust the filter numbers of CNNs and multiple dimensions in both CNNs and transformers in an input-dependent manner.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
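Dynamic weight slicing, as summarized in the entry above, activates only a leading slice of each layer's parameters, with the slice size chosen per input. The sketch below is a bare-bones model of that idea under assumptions of my own (a hard-coded variance-based gate on input "difficulty" and a plain dense layer); it is not the DS-Net/DS-Net++ architecture or its training scheme.

```python
import numpy as np

def sliced_linear(x, weight, bias, ratio):
    """Use only the first `ratio` fraction of output filters (illustrative).

    weight: (out_features, in_features); bias: (out_features,)
    Slicing the leading rows keeps memory access contiguous, which is what
    makes this form of dynamic inference hardware-friendly.
    """
    k = max(1, int(round(ratio * weight.shape[0])))
    return x @ weight[:k].T + bias[:k]

def difficulty_gate(x, easy_ratio=0.25, hard_ratio=1.0, thresh=1.0):
    """Toy gate: low-variance inputs are treated as 'easy' and get a thin slice."""
    return easy_ratio if x.var() < thresh else hard_ratio

rng = np.random.default_rng(0)
W, b = rng.standard_normal((64, 32)), rng.standard_normal(64)
x = rng.standard_normal(32) * 0.1           # low variance -> "easy" input
y = sliced_linear(x, W, b, difficulty_gate(x))
print(y.shape)                              # (16,): only a quarter of the filters ran
```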
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
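As a rough illustration of the content-aware idea in the entry above, the following NumPy sketch applies a cheap 1x1 kernel on "smooth" windows and the full KxK kernel elsewhere. The variance threshold used to decide smoothness, the single-channel setting, and unit stride with no padding are my own simplifications; this is not the CAC layer from the paper, only a sketch of the dispatch logic.

```python
import numpy as np

def content_aware_conv2d(x, w_full, w_1x1, smooth_thresh=1e-2):
    """Toy single-channel content-aware convolution (illustrative only).

    x      : (H, W) input
    w_full : (K, K) full kernel
    w_1x1  : scalar 1x1 kernel used on smooth windows
    A window counts as smooth when its variance is below `smooth_thresh`;
    smooth windows get the cheap 1x1 path, the rest get the full KxK kernel.
    """
    K = w_full.shape[0]
    H, W = x.shape
    out = np.zeros((H - K + 1, W - K + 1))
    center = K // 2
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = x[i:i + K, j:j + K]
            if window.var() < smooth_thresh:     # smooth region: 1x1 kernel
                out[i, j] = w_1x1 * window[center, center]
            else:                                # detailed region: full kernel
                out[i, j] = np.sum(window * w_full)
    return out
```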
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
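N:M fine-grained structured sparsity keeps at most N non-zero weights in every group of M consecutive weights (2:4 being the common hardware-supported case). The magnitude-based pruning step below is a minimal sketch of that constraint, not the from-scratch training recipe of the paper above; the group size and the magnitude criterion are the usual conventions, assumed here for illustration.

```python
import numpy as np

def prune_n_of_m(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m (illustrative).

    weights: 1-D array whose length is a multiple of m.
    Returns a pruned copy satisfying the n:m structured-sparsity constraint.
    """
    w = weights.reshape(-1, m).copy()
    # Indices of the (m - n) smallest-magnitude weights in each group.
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(16)
w_sparse = prune_n_of_m(w, n=2, m=4)
assert all(np.count_nonzero(g) <= 2 for g in w_sparse.reshape(-1, 4))
```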
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs.
We propose to use shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
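The ghost-feature idea in the entry above replaces part of the convolution with a much cheaper operation; in GhostSR that operation is a spatial shift. The snippet below only sketches that substitution under my own simplifications (a fixed list of shift offsets and NumPy's roll in place of a learned shift); it is not the paper's trained module.

```python
import numpy as np

def ghost_features_by_shift(intrinsic, shifts=((0, 1), (1, 0), (0, -1), (-1, 0))):
    """Generate 'ghost' feature maps by shifting intrinsic ones (illustrative).

    intrinsic: (C, H, W) feature maps produced by a (smaller) convolution.
    Each ghost map is a spatially shifted copy of an intrinsic map, which
    costs no multiplications, unlike generating it with another convolution.
    """
    ghosts = []
    for c in range(intrinsic.shape[0]):
        dy, dx = shifts[c % len(shifts)]
        ghosts.append(np.roll(intrinsic[c], shift=(dy, dx), axis=(0, 1)))
    return np.concatenate([intrinsic, np.stack(ghosts)], axis=0)  # (2C, H, W)

feats = np.random.randn(8, 32, 32)
out = ghost_features_by_shift(feats)
assert out.shape == (16, 32, 32)
```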
- When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity [0.0]
This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs).
Modern CNN models need megabytes of coefficients and millions of MAC operations to perform convolution.
We show when it is worth using a direct sparse operation to speed up the computation of convolution layers.
arXiv Detail & Related papers (2020-11-12T10:13:48Z)
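The trade-off described in the entry above, dense GEMM versus a direct sparse operation, can be sketched on the CPU with NumPy/SciPy. This is only an illustration of the arithmetic equivalence under assumed shapes and sparsity (im2col plus a CSR matrix-matrix product), not the GPU kernels studied in the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix

def im2col(x, kh, kw):
    """Unfold (C, H, W) into (C*kh*kw, out_h*out_w) columns; unit stride, no padding."""
    C, H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i + kh, j:j + kw].ravel()
    return cols, oh, ow

# Pruned filters: roughly 90% of the coefficients are zero (assumed).
K, C, kh, kw = 16, 8, 3, 3
w = np.random.randn(K, C, kh, kw) * (np.random.rand(K, C, kh, kw) > 0.9)
x = np.random.randn(C, 16, 16)

w_mat = w.reshape(K, -1)              # dense filter matrix for the GEMM path
w_csr = csr_matrix(w_mat)             # CSR keeps only the non-zero coefficients
cols, oh, ow = im2col(x, kh, kw)

y_dense = (w_mat @ cols).reshape(K, oh, ow)   # dense GEMM convolution
y_sparse = (w_csr @ cols).reshape(K, oh, ow)  # direct sparse convolution
assert np.allclose(y_dense, y_sparse)         # same result, fewer multiplies when sparse
```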
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T) [17.13246260883765]
Deep neural networks (DNNs) have shown remarkable success in a variety of machine learning applications.
In recent years, there is an increasing interest in deploying DNNs to resource-constrained devices with limited energy, memory, and computational budget.
We propose Entropy-Constrained Trained Ternarization (EC2T), a general framework to create sparse and ternary neural networks.
arXiv Detail & Related papers (2020-04-02T15:38:00Z)
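Sparse and ternary networks, as in the entry above, constrain every weight to the set {-a, 0, +a}. The snippet below is a generic threshold-based ternarization sketch (in the style of common ternary weight schemes), assumed here for illustration; it is not the entropy-constrained training procedure that EC2T itself proposes.

```python
import numpy as np

def ternarize(weights, delta_ratio=0.7):
    """Map weights to {-alpha, 0, +alpha} (generic sketch, not EC2T itself).

    delta_ratio scales the mean magnitude to get the zeroing threshold; the
    shared scale alpha is the mean magnitude of the surviving weights.
    """
    delta = delta_ratio * np.mean(np.abs(weights))
    mask = np.abs(weights) > delta                 # weights kept as +/- alpha
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(weights) * mask, alpha

w = np.random.randn(256)
w_t, alpha = ternarize(w)
print(np.unique(w_t), alpha)   # three values: -alpha, 0, +alpha (sparse by construction)
```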
- Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs [6.035819238203187]
We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance.
We also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM.
arXiv Detail & Related papers (2020-02-20T12:07:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.