Related papers: FPGA or GPU? Analyzing comparative research for application-specific guidance

FPGA or GPU? Analyzing comparative research for application-specific guidance

URL: http://arxiv.org/abs/2511.06565v1
Date: Sun, 09 Nov 2025 22:54:28 GMT
Title: FPGA or GPU? Analyzing comparative research for application-specific guidance
Authors: Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal,
Abstract summary: This paper synthesizes insights from various research articles to guide users in selecting the appropriate accelerator for domain-specific applications.<n>By categorizing the reviewed studies and analyzing key performance metrics, this work highlights the strengths, limitations, and ideal use cases for FPGAs and GPU.<n>The findings offer actionable recommendations, helping researchers and practitioners navigate trade-offs in performance, energy efficiency, and programmability.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have emerged as prominent solutions, each excelling in specific domains. Although there is substantial research comparing FPGAs and GPUs, most of the work focuses primarily on performance metrics, offering limited insight into the specific types of applications that each accelerator benefits the most. This paper aims to bridge this gap by synthesizing insights from various research articles to guide users in selecting the appropriate accelerator for domain-specific applications. By categorizing the reviewed studies and analyzing key performance metrics, this work highlights the strengths, limitations, and ideal use cases for FPGAs and GPUs. The findings offer actionable recommendations, helping researchers and practitioners navigate trade-offs in performance, energy efficiency, and programmability.

Related papers

GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions [54.570944939061555]
We present a comprehensive study of GPU-accelerated graph-based vector search algorithms.<n>We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units.<n>Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems.
arXiv Detail & Related papers (2026-02-10T16:18:04Z)
FPGA-based Acceleration for Convolutional Neural Networks: A Comprehensive Review [3.7810245817090906]
Convolutional Neural Networks (CNNs) are fundamental to deep learning, driving applications across various domains.<n>This paper provides a comprehensive review of FPGA-based hardware accelerators specifically designed for CNNs.
arXiv Detail & Related papers (2025-05-04T04:03:37Z)
Real-Time Semantic Segmentation of Aerial Images Using an Embedded U-Net: A Comparison of CPU, GPU, and FPGA Workflows [0.0]
This study introduces a lightweight U-Net model optimized for real-time semantic segmentation of aerial images.<n>We maintain the accuracy of the U-Net on a real-world dataset while significantly reducing the model's parameters and Multiply-Accumulate (MAC) operations by a factor of 16.
arXiv Detail & Related papers (2025-03-07T08:33:28Z)
Navigating Intelligence: A Survey of Google OR-Tools and Machine Learning for Global Path Planning in Autonomous Vehicles [49.1574468325115]
Global path planning is essential for an autonomous mining sampling robot named ROMIE.<n>Q-Learning stands out as the optimal strategy, demonstrating superior efficiency by deviating only 1.2% on average from the optimal solutions across our datasets.
arXiv Detail & Related papers (2025-03-05T10:12:22Z)
Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs. At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads. At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms [1.0636475069923585]
There is merit to exploring the use of multiple FPGAs for large machine learning applications. There is no commonly-accepted flow for developing and deploying multi-FPGA applications. We develop a scalable multi-FPGA platform and some tools to map large applications to the platform.
arXiv Detail & Related papers (2024-04-24T19:25:58Z)
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey [18.00772798876708]
Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain. We present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead.
arXiv Detail & Related papers (2024-03-21T17:55:50Z)
Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies [0.0]
The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology. We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system.
arXiv Detail & Related papers (2023-10-24T22:09:03Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
Design optimization for high-performance computing using FPGA [0.0]
We optimize Tensil AI's open-source inference accelerator for maximum performance using ResNet20 trained on CIFAR. Running the CIFAR test data set shows very little accuracy drop when rounding down from the original 32-bit floating point. The proposed accelerator achieves a throughput of 21.12 Giga-Operations Per Second (GOP/s) with a 5.21 W on-chip power consumption at 100 MHz.
arXiv Detail & Related papers (2023-04-24T22:20:42Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Semantic Scene Segmentation for Robotics Applications [51.66271681532262]
We investigate the behavior of the most successful semantic scene segmentation models, in terms of deployment (inference) speed, under various setups. The target of this work is to provide a comparative study of current state-of-the-art segmentation models so as to select the most compliant with the robotics applications requirements.
arXiv Detail & Related papers (2021-08-25T08:55:20Z)
JUWELS Booster -- A Supercomputer for Large-Scale AI Research [79.02246047353273]
We present JUWELS Booster, a recently commissioned high-performance computing system at the J"ulich Supercomputing Center. We detail its system architecture, parallel, distributed model training, and benchmarks indicating its outstanding performance.
arXiv Detail & Related papers (2021-06-30T21:37:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.