S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2106.07894v1
- Date: Tue, 15 Jun 2021 06:08:37 GMT
- Title: S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural
Networks
- Authors: Jianlei Yang, Wenzhi Fu, Xingzhou Cheng, Xucheng Ye, Pengcheng Dai,
and Weisheng Zhao
- Abstract summary: S2Engine transmits compressed data internally and allows each processing element to dynamically select aligned data from the compressed dataflow in convolution.
Compared to the naive systolic array, S2Engine achieves about $3.2\times$ and $3.0\times$ improvements in speed and energy efficiency, respectively.
- Score: 5.417507302691321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have achieved great success in
performing cognitive tasks. However, execution of CNNs requires a large amount
of computing resources and generates heavy memory traffic, which imposes a
severe challenge on computing system design. Through optimizing parallel
execution and data reuse in convolution, systolic architectures demonstrate
great advantages in accelerating CNN computations. However, the regular
internal data transmission path of traditional systolic architectures prevents
them from fully leveraging the benefits introduced by neural network sparsity,
and deploying fine-grained sparsity on existing systolic architectures is
greatly hindered by the incurred computational overheads. In this work, we
propose S2Engine, a novel systolic architecture that can fully exploit the
sparsity in CNNs with maximized data reuse. S2Engine transmits compressed data
internally and allows each processing element to dynamically select aligned
data from the compressed dataflow in convolution. Compared to the naive
systolic array, S2Engine achieves about $3.2\times$ and $3.0\times$
improvements in speed and energy efficiency, respectively.
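As a rough illustration of the core idea, the sketch below shows, in plain Python, how a processing element might consume compressed (index, value) streams of activations and weights and multiply only pairs whose indices align, skipping zeros entirely. The function names and compression format are hypothetical; this is a software analogue of the concept, not S2Engine's actual microarchitecture.

```python
# Conceptual sketch only: a software analogue of a PE that consumes
# compressed (index, value) streams and multiplies only aligned non-zeros.

def compress(dense):
    """Keep (index, value) pairs of the non-zero entries."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]

def sparse_pe_mac(act_stream, wgt_stream):
    """Multiply-accumulate over two compressed streams, selecting aligned pairs."""
    acc = 0
    a, w = 0, 0
    while a < len(act_stream) and w < len(wgt_stream):
        ai, av = act_stream[a]
        wi, wv = wgt_stream[w]
        if ai == wi:            # indices align: a useful multiplication
            acc += av * wv
            a += 1
            w += 1
        elif ai < wi:           # activation has no matching weight: skip it
            a += 1
        else:                   # weight has no matching activation: skip it
            w += 1
    return acc

activations = [0, 3, 0, 0, 5, 2, 0, 1]
weights     = [2, 0, 0, 4, 1, 0, 0, 3]
assert sparse_pe_mac(compress(activations), compress(weights)) == \
       sum(a * w for a, w in zip(activations, weights))   # 5*1 + 1*3 = 8
```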
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
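For a sense of what the "TC" stream consumes, the snippet below turns a 1D signal into a 2D time-frequency tensor with the continuous wavelet transform using the PyWavelets library; the wavelet, scales, and sampling rate are illustrative assumptions, not TCCT-Net's actual preprocessing.

```python
# Minimal sketch: a 1D signal becomes a 2D time-frequency tensor via CWT.
import numpy as np
import pywt

fs = 128.0                                  # sampling rate (Hz), assumed
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

scales = np.arange(1, 65)                   # 64 scales -> 64 frequency rows
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

scalogram = np.abs(coeffs)                  # 2D tensor: (scales, time)
print(scalogram.shape)                      # (64, 512)
```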
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs [3.1445034800095413]
We address the challenges associated with deploying neural networks on CPUs.
Our novel approach is to use the dataflow of a neural network to explore data reuse opportunities.
Our results show that the dataflow that keeps outputs in SIMD registers consistently yields the best performance.
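Keeping outputs in SIMD registers is essentially an output-stationary dataflow. The NumPy sketch below mimics it in scalar form, with a local accumulator standing in for a register; it is an illustration of the dataflow, not code generated by YFlows.

```python
# Output-stationary sketch: each output accumulator stays "resident"
# (standing in for a SIMD register) while inputs and weights stream past it.
import numpy as np

def conv1d_output_stationary(x, w):
    out_len = len(x) - len(w) + 1
    y = np.zeros(out_len)
    for o in range(out_len):
        acc = 0.0                      # the "register-resident" output
        for k in range(len(w)):        # inputs/weights stream through
            acc += x[o + k] * w[k]
        y[o] = acc                     # written back to memory only once
    return y

x = np.arange(10, dtype=float)
w = np.array([1.0, -2.0, 1.0])
assert np.allclose(conv1d_output_stationary(x, w), np.convolve(x, w[::-1], "valid"))
```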
arXiv Detail & Related papers (2023-10-01T05:11:54Z)
- BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance [54.214426436283134]
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it to real-network accuracy.
Benefiting from its compact architecture and optimized hardware kernel, BiFSMNv2 achieves an impressive 25.1x speedup and 20.2x storage savings on edge hardware.
arXiv Detail & Related papers (2022-11-13T18:31:45Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
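A toy model of that selection rule is sketched below: pick the split point that minimizes edge compute plus transmission plus server compute for the current data rate. All per-layer times and tensor sizes are made-up assumptions, not numbers from the paper.

```python
# Toy sketch of dynamic split selection: choose the layer after which to
# offload, given the current channel data rate. Numbers are assumptions.

device_time = [0.020, 0.030, 0.040, 0.050, 0.060]   # per-layer edge time (s)
server_time = [0.002, 0.003, 0.004, 0.005, 0.006]   # per-layer server time (s)
activation_mbits = [8.0, 4.0, 2.0, 1.0, 0.5]        # intermediate size (Mbit)

def best_split(rate_mbps):
    """Return (split_index, latency): run layers [0..s) on device, rest on server."""
    candidates = []
    for s in range(len(device_time) + 1):
        tx_mbits = activation_mbits[s - 1] if s > 0 else 16.0  # raw input if s == 0
        latency = (sum(device_time[:s])          # edge compute
                   + tx_mbits / rate_mbps        # uplink transmission
                   + sum(server_time[s:]))       # server compute
        candidates.append((latency, s))
    latency, s = min(candidates)
    return s, latency

for rate in (1.0, 10.0, 100.0):   # Mbps
    print(rate, "Mbps ->", best_split(rate))
```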
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators [12.223778147172107]
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs).
These kernels stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption.
We propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions.
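As a quick reminder of the structure these dataflows target, the NumPy sketch below (unrelated to EcoFlow's actual mappings) verifies that a dilated convolution equals an ordinary convolution whose kernel has zeros inserted between taps, i.e. structured sparsity that a dataflow can skip instead of compute.

```python
# NumPy sketch (not EcoFlow itself): dilated convolution == dense convolution
# with a zero-inserted kernel.
import numpy as np

def correlate1d(x, w):
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(len(x) - len(w) + 1)])

def dilated_correlate1d(x, w, d):
    """Cross-correlation with dilation d: taps are spaced d samples apart."""
    span = (len(w) - 1) * d + 1
    return np.array([sum(w[k] * x[i + k * d] for k in range(len(w)))
                     for i in range(len(x) - span + 1)])

x = np.random.rand(16)
w = np.array([1.0, 2.0, 3.0])
d = 2
w_zeros = np.zeros((len(w) - 1) * d + 1)    # [1, 0, 2, 0, 3]
w_zeros[::d] = w
assert np.allclose(dilated_correlate1d(x, w, d), correlate1d(x, w_zeros))
```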
arXiv Detail & Related papers (2022-02-04T18:48:36Z)
- Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge [3.3767251810292955]
FuSeConv is a drop-in replacement for depthwise separable convolutions.
FuSeConv factorizes convolutions fully along their spatial and depth dimensions.
Neural Operator Scaffolding scaffolds the training of FuSeConv by distilling knowledge from depthwise separable convolutions.
arXiv Detail & Related papers (2021-08-25T19:22:25Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
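A minimal NumPy sketch of the general idea, using my own illustrative decomposition rather than the paper's exact encoding: a k-bit unsigned-quantized weight matrix can be rewritten as a weighted sum of {-1, +1} matrices plus a constant, so the matrix product splits into binary branches.

```python
# Illustrative decomposition (an assumption, not the paper's exact scheme):
# q = sum_i b_i * 2^i with bits b_i in {0, 1}; substituting b_i = (B_i + 1) / 2
# with B_i in {-1, +1} gives q = sum_i 2^(i-1) * B_i + (2^k - 1) / 2,
# so X @ Wq decomposes into k binary {-1,+1} branches plus a constant term.
import numpy as np

k = 4
rng = np.random.default_rng(0)
Wq = rng.integers(0, 2 ** k, size=(8, 5)).astype(float)     # quantized weights
X = rng.standard_normal((3, 8))

bits = [(Wq.astype(int) >> i) & 1 for i in range(k)]        # b_i in {0, 1}
branches = [2 * b - 1 for b in bits]                        # B_i in {-1, +1}

Y_binary = sum(2.0 ** (i - 1) * (X @ B) for i, B in enumerate(branches))
Y_const = (2 ** k - 1) / 2 * X.sum(axis=1, keepdims=True) * np.ones((1, Wq.shape[1]))
assert np.allclose(X @ Wq, Y_binary + Y_const)
```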
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- FuSeConv: Fully Separable Convolutions for Fast Inference on Systolic Arrays [2.8583189395674653]
We propose FuSeConv as a drop-in replacement for depth-wise separable convolution.
FuSeConv generalizes the decomposition of convolutions fully to separable 1D convolutions along spatial and depth dimensions.
We achieve a significant speed-up of 3x-7x with the MobileNet family of networks on a systolic array of size 64x64, with comparable accuracy on the ImageNet dataset.
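To make "fully separable" concrete, here is a small PyTorch sketch that replaces a k x k depthwise kernel with 1 x k and k x 1 depthwise convolutions plus a 1 x 1 pointwise convolution. The layer sizes are illustrative and the block is only an approximation of full separability, not FuSeConv's exact operator definition.

```python
# Sketch of spatially separable depthwise convolution in PyTorch: 1D kernels
# along width and height replace a full k x k depthwise kernel.
import torch
import torch.nn as nn

class SeparableDWBlock(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        self.row = nn.Conv2d(channels, channels, kernel_size=(1, k),
                             padding=(0, k // 2), groups=channels)   # 1D along width
        self.col = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                             padding=(k // 2, 0), groups=channels)   # 1D along height
        self.point = nn.Conv2d(channels, channels, kernel_size=1)    # mix channels

    def forward(self, x):
        return self.point(self.col(self.row(x)))

x = torch.randn(1, 16, 32, 32)
block = SeparableDWBlock(16)
print(block(x).shape)       # torch.Size([1, 16, 32, 32])
```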
arXiv Detail & Related papers (2021-05-27T20:19:39Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
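For reference, N:M fine-grained structured sparsity means at most N non-zeros in every group of M consecutive weights. The NumPy sketch below applies a generic magnitude-based 2:4 mask to show the pattern; it is not the paper's method, which trains such sparse networks from scratch rather than pruning a dense one.

```python
# Generic N:M (here 2:4) magnitude pruning mask, shown only to illustrate
# the sparsity pattern.
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m consecutive weights."""
    flat = weights.reshape(-1, m)                       # assumes size divisible by m
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]     # indices of the n largest
    mask = np.zeros_like(flat)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return (flat * mask).reshape(weights.shape)

w = np.random.randn(4, 8)
w_sparse = nm_prune(w)
# every group of 4 consecutive weights now has at most 2 non-zeros
assert (np.count_nonzero(w_sparse.reshape(-1, 4), axis=1) <= 2).all()
```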
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
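One standard binarization strategy in this space is XNOR-Net-style sign-plus-scale, sketched below as a generic example rather than this paper's specific design: the real-valued weights of a GNN's linear transform are replaced by a sign matrix and a per-output scale.

```python
# Generic sign-and-scale weight binarization (XNOR-Net style); one of the
# strategies such work evaluates, not necessarily this paper's design.
import numpy as np

def binarize(W):
    """Approximate W by alpha * sign(W), with one scale alpha per output row."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    B = np.where(W >= 0, 1.0, -1.0)
    return alpha, B

# toy one-layer GCN-style propagation: H' = A_hat @ H @ W^T
rng = np.random.default_rng(1)
A_hat = rng.random((5, 5))
A_hat /= A_hat.sum(axis=1, keepdims=True)   # row-normalized adjacency
H = rng.standard_normal((5, 8))
W = rng.standard_normal((16, 8))

alpha, B = binarize(W)
H_full = A_hat @ H @ W.T
H_bin = A_hat @ H @ (alpha * B).T           # binary weights + cheap per-row scaling
print(np.abs(H_full - H_bin).mean())        # approximation error of the binary layer
```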
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training [34.657942518465575]
Training Convolutional Neural Networks (CNNs) usually requires a large number of computational resources.
In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting the sparsity.
We have built a simple compiler to map CNNs onto SparseTrain, and a cycle-accurate architecture simulator to evaluate the performance and efficiency.
arXiv Detail & Related papers (2020-07-21T11:01:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.