CNN2Gate: Toward Designing a General Framework for Implementation of
Convolutional Neural Networks on FPGA
- URL: http://arxiv.org/abs/2004.04641v2
- Date: Fri, 10 Apr 2020 00:59:40 GMT
- Authors: Alireza Ghaffari, Yvon Savaria
- Abstract summary: This paper introduces an integrated framework that supports compilation of a CNN model for an FPGA target.
CNN2Gate exploits the OpenCL synthesis workflow for FPGAs offered by commercial vendors.
This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms.
- Score: 0.3655021726150368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) have a major impact on our society
because of the numerous services they provide. On the other hand, they require
considerable computing power. Graphics processing units (GPUs) can satisfy these
requirements; however, their high power consumption and limited external I/Os
constrain their usability and suitability in industrial and mission-critical
scenarios. Recently, the number of studies that use FPGAs to implement CNNs has
been increasing rapidly, owing to the lower power consumption and easy
reconfigurability these platforms offer. Alongside the research effort put into
topics such as architecture, synthesis, and optimization, new challenges are
arising in integrating such hardware solutions with high-level machine learning
software libraries. This paper introduces an integrated framework (CNN2Gate)
that supports compilation of a CNN model for an FPGA target. CNN2Gate exploits
the OpenCL synthesis workflow for FPGAs offered by commercial vendors. It can
parse CNN models from several popular high-level machine learning libraries
such as Keras, PyTorch, and Caffe2. CNN2Gate extracts the computation flow of
the layers, along with their weights and biases, and applies a "given"
fixed-point quantization. It then writes this information in the proper format
for the OpenCL synthesis tools that are used to build and run the project on
the FPGA. CNN2Gate performs design-space exploration using a reinforcement
learning agent and automatically fits the design on different FPGAs with
limited logic resources. This paper reports results of automatic synthesis and
design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms.
CNN2Gate achieves a latency of 205 ms for VGG-16 and 18 ms for AlexNet on the
FPGA.
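The "given" fixed-point quantization mentioned in the abstract can be illustrated with a minimal sketch. This is a hypothetical helper, not CNN2Gate's actual code: it scales float weights onto a signed Qm.n fixed-point grid (here, assumed 8-bit with 6 fractional bits), rounds, and clips to the representable range.

```python
import numpy as np

def to_fixed_point(w, total_bits=8, frac_bits=6):
    """Quantize float weights to signed fixed-point with `frac_bits` fractional bits.

    Hypothetical illustration; CNN2Gate applies a quantization scheme supplied
    with the model (the "given" quantization), not necessarily this routine.
    """
    scale = 1 << frac_bits                        # 2**frac_bits
    qmin = -(1 << (total_bits - 1))               # most negative integer code
    qmax = (1 << (total_bits - 1)) - 1            # most positive integer code
    codes = np.clip(np.round(w * scale), qmin, qmax).astype(np.int32)
    # return both the integer codes (what the hardware stores)
    # and the dequantized values (what the arithmetic approximates)
    return codes, codes.astype(np.float64) / scale

codes, approx = to_fixed_point(np.array([0.5, -0.123, 1.9]))
```

With 6 fractional bits the quantization step is 1/64, so values outside roughly [-2, 2) saturate; picking `total_bits`/`frac_bits` per layer is exactly the kind of trade-off a design-space exploration agent can tune.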
Related papers
- FPGA-based Acceleration of Neural Network for Image Classification using Vitis AI [0.0]
We accelerate a CNN for image classification with the CIFAR-10 dataset using Vitis-AI on Xilinx Zynq UltraScale+ MPSoC ZCU104 FPGA evaluation board.
The work achieves 3.33-5.82x higher throughput and 3.39-6.30x higher energy efficiency than CPU and GPU baselines.
arXiv Detail & Related papers (2024-12-30T14:26:17Z)
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to non-experts in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices [71.45672882756001]
This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs.
The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics.
The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
arXiv Detail & Related papers (2023-03-30T08:25:27Z)
- Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z)
- FFCNN: Fast FPGA based Acceleration for Convolution neural network inference [0.0]
We present Fast Inference on FPGAs for Convolution Neural Networks (FFCNN).
FFCNN is based on a deeply pipelined OpenCL kernels architecture.
Data reuse and task mapping techniques are also presented to improve design efficiency.
arXiv Detail & Related papers (2022-08-28T16:55:25Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
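The decomposition idea in this summary can be sketched minimally (this is an illustration of the encoding principle, not the paper's actual algorithm): an integer weight code whose magnitude and parity match the branch count M can be written as a sum of M binary tensors with entries in {-1, +1}, each of which admits cheap binary arithmetic.

```python
import numpy as np

def decompose_pm1(q, branches):
    """Split integer codes q (values in {-M, -M+2, ..., M} for M = branches)
    into `branches` tensors with entries in {-1, +1} that sum back to q.

    Illustrative sketch only; the paper's encoding scheme may differ in detail.
    """
    residual = np.asarray(q, dtype=int).copy()
    binary_branches = []
    for _ in range(branches):
        b = np.where(residual > 0, 1, -1)   # greedily peel off a +/-1 layer
        residual -= b
        binary_branches.append(b)
    assert not residual.any(), "q is not representable with this many branches"
    return binary_branches

# e.g. codes in {-3, -1, 1, 3} (a 2-bit-style quantization) split into 3 branches
parts = decompose_pm1(np.array([3, -1, 1, -3]), branches=3)
```

Each branch can then be executed as a binary convolution, and the outputs summed, which is what makes the multi-branch form hardware-friendly.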
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
In practice, the whole AdderNet achieves a 16% speed improvement.
We conclude that AdderNet is able to surpass all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
- Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing [8.826181951806928]
Systolic-CNN is an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture.
Systolic-CNN is optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing.
arXiv Detail & Related papers (2020-12-06T03:53:11Z)
- ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network [0.0]
This thesis explores the potential of FPGA-based CNN acceleration.
It demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip.
arXiv Detail & Related papers (2020-05-14T11:54:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.