Optimization of FPGA-based CNN Accelerators Using Metaheuristics
- URL: http://arxiv.org/abs/2209.11272v1
- Date: Thu, 22 Sep 2022 18:57:49 GMT
- Title: Optimization of FPGA-based CNN Accelerators Using Metaheuristics
- Authors: Sadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri, and Ahmad Shawahna
- Abstract summary: Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
- Score: 1.854931308524932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, convolutional neural networks (CNNs) have demonstrated their
ability to solve problems in many fields and with accuracy that was not
possible before. However, this comes with extensive computational requirements,
which have made general-purpose CPUs unable to deliver the desired real-time performance. At
the same time, FPGAs have seen a surge in interest for accelerating CNN
inference. This is due to their ability to create custom designs with different
levels of parallelism. Furthermore, FPGAs provide better performance per watt
compared to GPUs. The current trend in FPGA-based CNN accelerators is to
implement multiple convolutional layer processors (CLPs), each of which is
tailored for a subset of layers. However, the growing complexity of CNN
architectures makes optimizing the resources available on the target FPGA
device to deliver optimal performance more challenging. In this paper, we
present a CNN accelerator and an accompanying automated design methodology that
employs metaheuristics for partitioning available FPGA resources to design a
Multi-CLP accelerator. Specifically, the proposed design tool adopts simulated
annealing (SA) and tabu search (TS) algorithms to find the number of CLPs
required and their respective configurations to achieve optimal performance on
a given target FPGA device. Here, the focus is on the key specifications and
hardware resources, including digital signal processors, block RAMs, and
off-chip memory bandwidth. Experimental results and comparisons using four
well-known benchmark CNNs are presented, demonstrating that the proposed
acceleration framework is both encouraging and promising. The SA-/TS-based
Multi-CLP achieves 1.31x - 2.37x higher throughput than the state-of-the-art
Single-/Multi-CLP approaches in accelerating AlexNet, SqueezeNet 1.1, VGGNet,
and GoogLeNet architectures on the Xilinx VC707 and VC709 FPGA boards.
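The SA-based partitioning described above can be illustrated with a minimal sketch: the search state couples a layer-to-CLP assignment with per-CLP unroll factors, the cost is the makespan of the slowest CLP (since CLPs run concurrently), and partitions exceeding the DSP budget are rejected. All layer shapes, DSP costs, and cooling parameters below are illustrative assumptions; the paper's actual model also accounts for block RAMs and off-chip memory bandwidth.

```python
import math
import random

# Toy layer workload for a 5-layer CNN: (M output maps, N input maps, MAC ops).
# All numbers below are illustrative assumptions, not values from the paper.
LAYERS = [(96, 3, 105e6), (256, 96, 448e6), (384, 256, 299e6),
          (384, 384, 224e6), (256, 384, 150e6)]
DSP_BUDGET = 2800    # hypothetical DSP count for a VC707-class device
DSP_PER_MAC = 5      # assumed DSP cost of one multiply-accumulate unit

def clp_cycles(layers, tm, tn):
    """Rough cycle count for one CLP processing its layers with unroll (tm, tn)."""
    total = 0.0
    for m, n, ops in layers:
        # tm*tn MACs per cycle; ceil terms model partially filled tiles.
        total += math.ceil(m / tm) * math.ceil(n / tn) * ops / (m * n)
    return total

def cost(assign, tiles):
    """Makespan: CLPs run concurrently, so the slowest one bounds throughput."""
    if sum(tm * tn * DSP_PER_MAC for tm, tn in tiles) > DSP_BUDGET:
        return float("inf")  # partition does not fit on the device
    groups = [[LAYERS[i] for i, a in enumerate(assign) if a == k]
              for k in range(len(tiles))]
    return max(clp_cycles(g, *t) for g, t in zip(groups, tiles) if g)

def neighbour(assign, tiles):
    """Random move: reassign one layer, or nudge one CLP's unroll factor."""
    assign, tiles = list(assign), [list(t) for t in tiles]
    if random.random() < 0.5:
        assign[random.randrange(len(assign))] = random.randrange(len(tiles))
    else:
        t = random.choice(tiles)
        i = random.randrange(2)
        t[i] = max(1, t[i] + random.choice((-1, 1)))
    return assign, [tuple(t) for t in tiles]

def anneal(n_clps=2, iters=20000, t0=1e7, alpha=0.9995):
    """Simulated annealing over (layer assignment, per-CLP unroll factors)."""
    assign = [random.randrange(n_clps) for _ in LAYERS]
    tiles = [(4, 4)] * n_clps
    cur = best = cost(assign, tiles)
    best_state, temp = (assign, tiles), t0
    for _ in range(iters):
        cand = neighbour(assign, tiles)
        c = cost(*cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c < cur or random.random() < math.exp(min(0.0, (cur - c) / temp)):
            (assign, tiles), cur = cand, c
            if c < best:
                best, best_state = c, cand
        temp *= alpha
    return best, best_state
```

A tabu search variant of the same loop would replace the Boltzmann acceptance with a greedy pick of the best neighbour not blocked by a tabu list of recently reversed moves.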
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z) - HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on
FPGA Devices [71.45672882756001]
This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs.
The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics.
The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
arXiv Detail & Related papers (2023-03-30T08:25:27Z) - Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and
Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z) - FFCNN: Fast FPGA based Acceleration for Convolution neural network
inference [0.0]
We present Fast Inference on FPGAs for Convolution Neural Network (FFCNN).
FFCNN is based on a deeply pipelined OpenCL kernels architecture.
Data reuse and task mapping techniques are also presented to improve design efficiency.
arXiv Detail & Related papers (2022-08-28T16:55:25Z) - An FPGA-based Solution for Convolution Operation Acceleration [0.0]
This paper proposes an FPGA-based architecture to accelerate the convolution operation.
The project's purpose is to produce an FPGA IP core that can process a convolutional layer at a time.
arXiv Detail & Related papers (2022-06-09T14:12:30Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a co-inference scheme can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN
Accelerators for Edge Inference [0.0]
We propose SECDA, a new hardware/software co-design methodology to reduce design time of optimized Deep Neural Networks (DNN) inference accelerators on edge devices with FPGAs.
We use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA.
We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5x with a 2.9x reduction in energy consumption over CPU-only inference.
arXiv Detail & Related papers (2021-10-01T15:20:29Z) - Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA
Accelerator Architecture for Accelerating Convolutional Neural Network
Inference in Cloud/Edge Computing [8.826181951806928]
Systolic-CNN is an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture.
Systolic-CNN is optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing.
arXiv Detail & Related papers (2020-12-06T03:53:11Z) - CNN2Gate: Toward Designing a General Framework for Implementation of
Convolutional Neural Networks on FPGA [0.3655021726150368]
This paper introduces an integrated framework that supports compilation of a CNN model for an FPGA target.
CNN2Gate exploits the OpenCL synthesis workflow for FPGAs offered by commercial vendors.
This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms.
arXiv Detail & Related papers (2020-04-06T01:57:53Z)