Design optimization for high-performance computing using FPGA
- URL: http://arxiv.org/abs/2304.12474v1
- Date: Mon, 24 Apr 2023 22:20:42 GMT
- Title: Design optimization for high-performance computing using FPGA
- Authors: Murat Isik, Kayode Inadagbo, Hakan Aktas
- Abstract summary: We optimize Tensil AI's open-source inference accelerator for maximum performance using ResNet20 trained on CIFAR.
Running the CIFAR test data set shows very little accuracy drop when rounding down from the original 32-bit floating point.
The proposed accelerator achieves a throughput of 21.12 Giga-Operations Per Second (GOP/s) with a 5.21 W on-chip power consumption at 100 MHz.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconfigurable architectures like Field Programmable Gate Arrays (FPGAs) have
been used for accelerating computations in several domains because of their
unique combination of flexibility, performance, and power efficiency. However,
FPGAs have not been widely used for high-performance computing, primarily
because of their programming complexity and difficulties in optimizing
performance. To gain insight into the use of FPGAs for high-performance
computing, we optimize Tensil AI's open-source inference accelerator for
maximum performance using ResNet20 trained on CIFAR. We show how improving the
hardware design, using Xilinx Ultra RAM, and applying advanced compiler
strategies can improve inference performance. We also demonstrate that running
the CIFAR test data set shows very little accuracy drop when rounding down
from the original 32-bit floating point. The heterogeneous computing model of
our platform allows us to achieve a frame rate of 293.58 frames per second
(FPS) and 90% accuracy on a ResNet20 trained on CIFAR. The experimental
results show that the proposed accelerator achieves a throughput of 21.12
Giga-Operations Per Second (GOP/s) with a 5.21 W on-chip power consumption at
100 MHz. Comparisons with off-the-shelf devices and recent state-of-the-art
implementations show that the proposed accelerator has clear advantages in
energy efficiency.
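The reported figures imply an energy efficiency of roughly 21.12 GOP/s / 5.21 W ≈ 4.05 GOP/s per watt. Below is a minimal sketch of that arithmetic plus the kind of "round down from 32-bit floating point" quantization the abstract describes; the 16-bit Q8.8 format and floor rounding are illustrative assumptions, since the abstract does not state which fixed-point format Tensil uses:

```python
import numpy as np

# Energy efficiency implied by the reported figures (GOP/s per watt).
throughput_gops, power_w = 21.12, 5.21
print(f"energy efficiency: {throughput_gops / power_w:.2f} GOP/s/W")  # ~4.05

def to_fixed_point(x, frac_bits=8, total_bits=16):
    """Round FP32 values down to an assumed Q8.8 fixed-point grid."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = np.clip(np.floor(x * scale), lo, hi)  # floor = "round down"
    return q / scale

weights = np.random.default_rng(0).standard_normal(10_000).astype(np.float32)
error = np.abs(weights - to_fixed_point(weights))
print(f"max rounding error: {error.max():.6f}")  # below 2**-8 ~= 0.0039
```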
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy on computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs [3.302913401404089]
Sliding window-based static sparse attention mitigates the quadratic cost of full attention over long inputs by limiting the attention scope of the input tokens.
We propose a dataflow-aware FPGA-based accelerator design, SWAT, that efficiently leverages the sparsity to achieve scalable performance for long input.
arXiv Detail & Related papers (2024-05-27T10:25:08Z)
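A minimal sketch of the static sliding-window pattern the SWAT entry above relies on: each token attends only to neighbors within a fixed window, so the attended positions per row are bounded by the window size rather than the sequence length. The window size and the symmetric (non-causal) shape are assumptions for illustration:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean mask: token i may attend to token j only if |i - j| <= window."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

mask = sliding_window_mask(seq_len=8, window=2)
# At most 2*window + 1 positions per row, so masked attention costs
# O(seq_len * window) rather than O(seq_len**2) for the dense pattern.
print(mask.sum(axis=1))  # [3 4 5 5 5 5 4 3]
```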
- Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference [11.614722231006695]
Large language models (LLMs) boasting billions of parameters have generated a significant demand for efficient deployment in inference workloads.
This paper investigates the feasibility and potential of model-specific spatial acceleration for LLM inference on FPGAs.
arXiv Detail & Related papers (2023-12-23T04:27:06Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
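A toy sketch of the allocation idea in the entry above: give each pipeline stage resources in proportion to its compute load so that no single stage dominates the pipeline's latency. The layer profile and the size of the resource pool are invented for illustration:

```python
# Invented layer profile (multiply-accumulates) and resource pool; see caveats above.
layer_macs = {"conv1": 90e6, "conv2": 240e6, "conv3": 60e6, "fc": 10e6}
total_units = 64  # hypothetical pool of processing elements across the cluster

total = sum(layer_macs.values())
alloc = {name: max(1, round(total_units * macs / total))
         for name, macs in layer_macs.items()}
# A pipeline runs at the speed of its slowest stage, so proportional
# allocation tries to equalize the per-stage latency estimated below.
latency = {name: layer_macs[name] / alloc[name] for name in layer_macs}
print(alloc)  # {'conv1': 14, 'conv2': 38, 'conv3': 10, 'fc': 2}
print(f"bottleneck stage cost: {max(latency.values()):.3g}")
```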
- HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices [71.45672882756001]
This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs.
The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics.
The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
arXiv Detail & Related papers (2023-03-30T08:25:27Z)
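The HARFLOW3D entry above describes a toolflow driven by two inputs, an ONNX model and an FPGA description. A rough sketch of what consuming those inputs could look like; the file name and the device-description schema are placeholders, not the toolflow's actual format:

```python
import onnx  # pip install onnx

# "har_3dcnn.onnx" and the dictionary below are hypothetical stand-ins for
# the toolflow's two inputs: a 3D CNN in ONNX format and an FPGA description.
model = onnx.load("har_3dcnn.onnx")
fpga = {"name": "zcu104", "dsp": 1728, "bram_kb": 11000, "clock_mhz": 200}

print(model.graph.name, "->", fpga["name"])
print("operator types:", {node.op_type for node in model.graph.node})
```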
- An Energy-Efficient Reconfigurable Autoencoder Implementation on FPGA [5.457842083043013]
We look at the different autoencoders available and use the convolutional autoencoder in both FPGA and GPU-based implementations to process noisy static MNIST images.
The evaluation of the proposed design achieved 80% accuracy, and our experimental results show that the proposed accelerator achieves a throughput of 21.12 Giga-Operations Per Second (GOP/s) with a 5.93 W on-chip power consumption at 100 MHz.
arXiv Detail & Related papers (2023-01-17T18:04:05Z)
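A minimal convolutional autoencoder of the kind the entry above evaluates, shrinking a noisy 28x28 image to a small latent volume and reconstructing it. The exact layer layout is an assumption, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Illustrative layer layout; the paper's actual architecture is not given here.
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),  # 28 -> 14
            nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),                                            # 7 -> 14
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),                                         # 14 -> 28
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

noisy = torch.rand(4, 1, 28, 28)  # stand-in for a batch of noisy MNIST images
denoised = ConvAutoencoder()(noisy)
assert denoised.shape == noisy.shape
```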
- Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z)
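Simulated annealing is one representative metaheuristic for the mapping problem the entry above targets: assigning CNN layers to multiple CLPs so that the most heavily loaded CLP, which bounds pipeline throughput, does as little work as possible. A toy sketch with invented layer costs (the paper's actual metaheuristics and cost model may differ):

```python
import math, random

# Invented per-layer compute costs and CLP count; see caveats in the lead-in.
layer_cost = [90, 240, 60, 120, 30, 75]
n_clps = 3

def bottleneck(assign):
    """Load of the busiest CLP under a layer -> CLP assignment."""
    load = [0] * n_clps
    for layer, clp in enumerate(assign):
        load[clp] += layer_cost[layer]
    return max(load)

random.seed(0)
assign = [random.randrange(n_clps) for _ in layer_cost]
best, temp = bottleneck(assign), 100.0
while temp > 0.1:
    cand = assign[:]
    cand[random.randrange(len(cand))] = random.randrange(n_clps)  # local move
    delta = bottleneck(cand) - bottleneck(assign)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        assign = cand  # accept improvements, and some regressions early on
    best = min(best, bottleneck(assign))
    temp *= 0.95  # geometric cooling schedule
print(assign, best)  # optimum here is 240: the largest layer caps the bottleneck
```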
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
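The butterfly sparsity named above replaces a dense weight matrix with a product of log2(n) sparse factors, each mixing index pairs that differ in one bit, the same structure as an FFT. A rough numerical sketch of that factorization, not the paper's exact construction:

```python
import numpy as np

def butterfly_factor(n, stride, rng):
    """Sparse factor mixing each index pair (i, i ^ stride) with a 2x2 block."""
    B = np.zeros((n, n))
    for i in range(n):
        j = i ^ stride
        if i < j:
            B[i, i], B[i, j], B[j, i], B[j, j] = rng.standard_normal(4)
    return B

n = 8
rng = np.random.default_rng(0)
# log2(n) factors with 2n nonzeros each, versus n**2 for a dense matrix.
factors = [butterfly_factor(n, 1 << s, rng) for s in range(int(np.log2(n)))]
W = np.linalg.multi_dot(factors)  # dense-looking n x n product of sparse factors
nnz = sum(int((f != 0).sum()) for f in factors)
print(f"dense params: {n * n}, butterfly params: {nnz}")  # 64 vs 48; gap widens with n
```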
- Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization [78.18328503396057]
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks.
This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization.
arXiv Detail & Related papers (2022-08-10T05:54:46Z)
- FTRANS: Energy-Efficient Acceleration of Transformers using FPGA [11.032972017827248]
We propose an efficient acceleration framework, Ftrans, for transformer-based large-scale language representations.
Our framework significantly reduces the model size of NLP models by up to 16 times.
Our FPGA design achieves 27.07x and 81x improvement in performance and energy efficiency compared to CPU, and up to 8.80x improvement in energy efficiency compared to GPU.
arXiv Detail & Related papers (2020-07-16T18:58:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.