Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA
Accelerator Architecture for Accelerating Convolutional Neural Network
Inference in Cloud/Edge Computing
- URL: http://arxiv.org/abs/2012.03177v1
- Date: Sun, 6 Dec 2020 03:53:11 GMT
- Authors: Akshay Dua, Yixing Li, Fengbo Ren
- Abstract summary: Systolic-CNN is an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture.
Systolic-CNN is optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Systolic-CNN, an OpenCL-defined scalable,
run-time-flexible FPGA accelerator architecture, optimized for accelerating the
inference of various convolutional neural networks (CNNs) in multi-tenancy
cloud/edge computing. Existing OpenCL-defined FPGA accelerators for CNN
inference fall short in two ways: limited flexibility for supporting multiple
CNN models at run time, and poor scalability that results in underutilized FPGA
resources and limited computational parallelism. Systolic-CNN adopts a highly
pipelined and paralleled 1-D systolic array architecture, which efficiently
explores both spatial and temporal parallelism for accelerating CNN inference
on FPGAs. Systolic-CNN is highly scalable and parameterized and can be
easily adapted by users to achieve up to 100% utilization of the coarse-grained
computation resources (i.e., DSP blocks) for a given FPGA. Systolic-CNN is also
run-time-flexible in the context of multi-tenancy cloud/edge computing: it
can be time-shared to accelerate a variety of CNN models at run time without
the need to recompile the FPGA kernel hardware or reprogram the FPGA.
The experiment results based on an Intel Arria/Stratix 10 GX FPGA Development
board show that the optimized single-precision implementation of Systolic-CNN
can achieve an average inference latency of 7ms/2ms, 84ms/33ms, 202ms/73ms,
1615ms/873ms, and 900ms/498ms per image for accelerating AlexNet, ResNet-50,
ResNet-152, RetinaNet, and Light-weight RetinaNet, respectively. Codes are
available at https://github.com/PSCLab-ASU/Systolic-CNN.
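The 1-D systolic array idea in the abstract can be sketched in plain Python as a cycle-level simulation (an illustration of the general technique only, not the authors' OpenCL kernel; the function name `systolic_conv1d` and the broadcast-input, shifting-sum design are assumptions for clarity). Weights stay resident, one per processing element (PE); each cycle a new input is broadcast to all PEs, partial sums shift one PE to the right, and a finished sliding dot product (CNN-style convolution) leaves the last PE every cycle once the pipeline fills:

```python
def systolic_conv1d(weights, xs):
    """'Valid' sliding dot product of `weights` over `xs`, simulated as a
    weight-stationary 1-D systolic array with one PE per weight."""
    n = len(weights)
    s = [0.0] * n  # partial-sum register inside each PE
    out = []
    for t, x in enumerate(xs):
        # One clock cycle: every PE fires at once (spatial parallelism).
        # PE i adds weights[i] * x to the sum arriving from PE i-1, so
        # consecutive outputs overlap in the pipeline (temporal parallelism).
        s = [(s[i - 1] if i > 0 else 0.0) + weights[i] * x for i in range(n)]
        if t >= n - 1:  # pipeline full: the last PE emits a valid result
            out.append(s[-1])
    return out

# Example: 3 PEs, 4 streamed inputs -> 2 valid outputs
print(systolic_conv1d([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]))  # [14.0, 20.0]
```

In a real FPGA implementation the array width (here `n`) would be the scalability parameter: the abstract's claim of up to 100% DSP-block utilization corresponds to sizing the PE count to the DSP resources of the target device.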
Related papers
- Spiker+: a framework for the generation of efficient Spiking Neural
Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge.
Spiker+ is tested on two benchmark datasets, the MNIST and the Spiking Heidelberg Digits (SHD)
arXiv Detail & Related papers (2024-01-02T10:42:42Z) - Reconfigurable Distributed FPGA Cluster Design for Deep Learning
Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z) - End-to-end codesign of Hessian-aware quantized neural networks for FPGAs
and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs)
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC)
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z) - HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on
FPGA Devices [71.45672882756001]
This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs.
The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics.
The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
arXiv Detail & Related papers (2023-03-30T08:25:27Z) - Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z) - FFCNN: Fast FPGA based Acceleration for Convolution neural network
inference [0.0]
We present Fast Inference on FPGAs for Convolution Neural Network (FFCNN)
FFCNN is based on a deeply pipelined OpenCL kernels architecture.
Data reuse and task mapping techniques are also presented to improve design efficiency.
arXiv Detail & Related papers (2022-08-28T16:55:25Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network [0.0]
This thesis explores the potential of FPGA-based CNN acceleration.
It demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip.
arXiv Detail & Related papers (2020-05-14T11:54:04Z) - CNN2Gate: Toward Designing a General Framework for Implementation of
Convolutional Neural Networks on FPGA [0.3655021726150368]
This paper introduces an integrated framework that supports compilation of a CNN model for an FPGA target.
CNN2Gate exploits the OpenCL synthesis workflow for FPGAs offered by commercial vendors.
This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms.
arXiv Detail & Related papers (2020-04-06T01:57:53Z) - Evolutionary Bin Packing for Memory-Efficient Dataflow Inference
Acceleration on FPGA [2.3395728784538767]
Convolutional neural network (CNN) dataflow inference accelerators implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency.
However, the complex shapes of CNN parameter memories do not typically map well to FPGA on-chip memories (OCM).
We present a design methodology that improves the mapping efficiency of CNN parameters to FPGA OCM.
arXiv Detail & Related papers (2020-03-24T09:55:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.