Related papers: Hardware-Efficient Deconvolution-Based GAN for Edge Computing

Hardware-Efficient Deconvolution-Based GAN for Edge Computing

URL: http://arxiv.org/abs/2201.06878v1
Date: Tue, 18 Jan 2022 11:16:59 GMT
Title: Hardware-Efficient Deconvolution-Based GAN for Edge Computing
Authors: Azzam Alhussain and Mingjie Lin
Abstract summary: Generative Adversarial Networks (GAN) are cutting-edge algorithms for generating new data samples based on the learned data distribution. We proposed an HW/SW co-design approach for training quantized deconvolution GAN (QDCGAN) implemented on FPGA using a scalable streaming dataflow architecture. Various precisions, datasets, and network scalability were analyzed for low-power inference on resource-constrained platforms.
Score: 1.5229257192293197
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative Adversarial Networks (GAN) are cutting-edge algorithms for generating new data samples based on the learned data distribution. However, its performance comes at a significant cost in terms of computation and memory requirements. In this paper, we proposed an HW/SW co-design approach for training quantized deconvolution GAN (QDCGAN) implemented on FPGA using a scalable streaming dataflow architecture capable of achieving higher throughput versus resource utilization trade-off. The developed accelerator is based on an efficient deconvolution engine that offers high parallelism with respect to scaling factors for GAN-based edge computing. Furthermore, various precisions, datasets, and network scalability were analyzed for low-power inference on resource-constrained platforms. Lastly, an end-to-end open-source framework is provided for training, implementation, state-space exploration, and scaling the inference using Vivado high-level synthesis for Xilinx SoC-FPGAs, and a comparison testbed with Jetson Nano.

Related papers

Implémentation Efficiente de Fonctions de Convolution sur FPGA à l'Aide de Blocs Paramétrables et d'Approximations Polynomiales [0.3966519779235704]
Implementing convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) has emerged as a promising alternative to GPUs.<n>This paper proposes a library of convolution Blocks designed to optimize FPGA implementation and adapt to available resources.<n>It also presents a methodological framework for developing mathematical models that predict FPGA resources utilization.
arXiv Detail & Related papers (2025-10-03T15:58:20Z)
Performance Analysis of Convolutional Neural Network By Applying Unconstrained Binary Quadratic Programming [0.0]
Convolutional Neural Networks (CNNs) are pivotal in computer vision and Big Data analytics but demand significant computational resources when trained on large-scale datasets.<n>We propose a hybrid optimization method that combines Unconstrained Binary Quadratic Programming (UBQP) with Gradient Descent (SGD) to accelerate CNN training.<n>Our approach achieves a 10--15% accuracy improvement over a standard BP-CNN baseline while maintaining similar execution times.
arXiv Detail & Related papers (2025-05-30T21:25:31Z)
FPGA-based Acceleration for Convolutional Neural Networks: A Comprehensive Review [3.7810245817090906]
Convolutional Neural Networks (CNNs) are fundamental to deep learning, driving applications across various domains.<n>This paper provides a comprehensive review of FPGA-based hardware accelerators specifically designed for CNNs.
arXiv Detail & Related papers (2025-05-04T04:03:37Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs) We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness. The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
arXiv Detail & Related papers (2023-07-16T01:20:17Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization [82.12796238714589]
We propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation. We characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions. We establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed.
arXiv Detail & Related papers (2023-05-04T09:26:03Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores [26.38812379700231]
We develop an FPGA-based heterogeneous computing system for neural network acceleration. The proposed accelerator consists of DSP- and LUT-based GEneral Matrix-Multiplication (GEMM) computing cores. Our design outperforms the state-of-the-art Mix&Match design with latency reduced by 1.12-1.32x with higher inference accuracy.
arXiv Detail & Related papers (2021-12-15T15:12:00Z)
A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis is a solution for fast prototyping application-specific hardware. We propose HLS, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs. We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z)
Fully-parallel Convolutional Neural Network Hardware [0.7829352305480285]
We propose a new power-and-area-efficient architecture for implementing Articial Neural Networks (ANNs) in hardware. For the first time, a fully-parallel CNN as LENET-5 is embedded and tested in a single FPGA.
arXiv Detail & Related papers (2020-06-22T17:19:09Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. It is challenging to accelerate training of GCNs due to substantial and irregular data communication. We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
arXiv Detail & Related papers (2019-12-31T21:19:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.