Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
- URL: http://arxiv.org/abs/2305.18332v1
- Date: Wed, 24 May 2023 16:08:55 GMT
- Title: Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
- Authors: Hans Johnson, Tianyang Fang, Alejandro Perez-Vicente, and Jafar Saniie
- Abstract summary: We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the computation graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
- Score: 59.11160990637615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a distributed system based on low-power embedded FPGAs
designed for edge computing applications, focused on exploring distributed
scheduling optimizations for Deep Learning (DL) workloads to obtain the best
performance in terms of latency and power efficiency. Our cluster remained
modular throughout the experiments, with implementations consisting of up to
12 Zynq-7020-based boards and 5 UltraScale+ MPSoC FPGA boards connected
through an Ethernet switch, and the cluster evaluates the configurable
Versatile Tensor Accelerator (VTA) as its Deep Learning Accelerator (DLA).
This adaptable distributed architecture is distinguished by its capacity to
evaluate and manage neural network workloads in numerous configurations,
enabling users to conduct multiple experiments tailored to their specific
application needs.
The proposed system can simultaneously execute diverse Neural Network (NN)
models, arrange the computation graph in a pipeline structure, and manually
allocate greater resources to the most computationally intensive layers of the
NN graph.
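To make the pipeline arrangement concrete, here is a minimal sketch (our
illustration, not code from the paper) of how an NN computation graph might
be partitioned into pipeline stages across cluster nodes, assigning more
devices to the heaviest layers. The layer names, MAC counts, and allocation
heuristic are all hypothetical.

```python
# Hypothetical sketch: partition an NN graph into pipeline stages across
# FPGA nodes, giving the most compute-intensive layers more devices.
# Layer names and MAC counts are illustrative, not from the paper.

layers = [
    ("conv1", 118_013_952),   # estimated multiply-accumulate operations
    ("conv2", 415_236_096),
    ("conv3", 149_520_384),
    ("fc1",    37_748_736),
    ("fc2",    16_777_216),
]
num_nodes = 12  # e.g., twelve Zynq-7020 boards

total = sum(cost for _, cost in layers)

# Allocate nodes proportionally to each layer's share of the total work,
# guaranteeing at least one node per pipeline stage.
allocation = {}
remaining = num_nodes
for i, (name, cost) in enumerate(layers):
    if i == len(layers) - 1:
        allocation[name] = remaining  # last stage takes what is left
    else:
        share = max(1, round(num_nodes * cost / total))
        # Never take so many nodes that later stages get none.
        share = min(share, remaining - (len(layers) - 1 - i))
        allocation[name] = share
        remaining -= share

for name, nodes in allocation.items():
    print(f"stage {name}: {nodes} node(s)")
```

In this toy run the bottleneck layer (conv2) receives 7 of the 12 nodes,
mirroring the abstract's idea of manually steering resources toward the most
computationally intensive layers.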
Related papers
- LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference [25.342107763021147]
This paper introduces LUTMUL, which harnesses the potential of look-up tables (LUTs) for performing multiplications.
By exploiting this advantage of LUTs, our method demonstrates a potential boost in the performance of FPGA-based neural network accelerators.
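As a rough illustration of the LUT idea (not LUTMUL's actual FPGA
architecture), a small-bit-width multiplier can be replaced by a precomputed
table; the 4-bit operand width below is an assumption for the example.

```python
# Illustrative sketch of table-based multiplication for quantized inference:
# precompute all products of two 4-bit unsigned operands once, then replace
# runtime multiplies with lookups. This mirrors the general LUT idea, not
# LUTMUL's actual hardware mapping.

BITS = 4
N = 1 << BITS

# 16x16 product table, flattened to one list indexed by (a << BITS) | b.
mul_table = [a * b for a in range(N) for b in range(N)]

def lut_mul(a: int, b: int) -> int:
    """Multiply two 4-bit values via table lookup instead of a multiplier."""
    return mul_table[(a << BITS) | b]

def lut_dot(xs, ws):
    """Dot product of quantized vectors using only lookups and additions."""
    return sum(lut_mul(x, w) for x, w in zip(xs, ws))

assert lut_mul(7, 9) == 63
print(lut_dot([1, 2, 3], [4, 5, 6]))  # prints 32
```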
arXiv Detail & Related papers (2024-11-01T02:54:11Z)
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
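FusionLLM's adaptive compression scheme is not detailed in the summary; as a
generic stand-in, the sketch below shows top-k gradient sparsification, one
common way to cut communication volume in geo-distributed training. The
`k_ratio` parameter and function names are our assumptions.

```python
import numpy as np

# Generic top-k gradient sparsification sketch (a stand-in for adaptive
# compression; FusionLLM's actual scheme may differ). Only the largest-
# magnitude fraction of gradient entries is transmitted between workers.

def compress_topk(grad: np.ndarray, k_ratio: float = 0.01):
    """Keep the top k_ratio fraction of entries by magnitude."""
    flat = grad.ravel()
    k = max(1, int(flat.size * k_ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], grad.shape  # indices, values, original shape

def decompress_topk(idx, values, shape):
    """Rebuild a dense gradient with zeros everywhere else."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

grad = np.random.randn(1024, 1024).astype(np.float32)
idx, vals, shape = compress_topk(grad, k_ratio=0.01)
restored = decompress_topk(idx, vals, shape)
print(f"sent {idx.size} of {grad.size} entries "
      f"({100 * idx.size / grad.size:.1f}% of the original volume)")
```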
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Harnessing FPGA Technology for Enhanced Biomedical Computation [0.0]
This research delves into sophisticated neural network frameworks such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Deep Belief Networks (DBNs).
By evaluating performance indicators like latency and throughput, we showcase the efficacy of FPGAs in advanced biomedical computing.
arXiv Detail & Related papers (2023-11-21T08:51:58Z)
- Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs).
We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness.
The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
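The Gaussian-noise augmentation mentioned above is straightforward; a minimal
sketch follows, where the target SNR and the synthetic test signal are our
assumptions rather than details from the paper (the MIT-BIH recordings are
sampled at 360 Hz).

```python
import numpy as np

# Minimal sketch of Gaussian-noise augmentation for ECG training data, as
# described for robustness. The target signal-to-noise ratio is an assumed
# hyperparameter, not a value from the paper.

def add_gaussian_noise(ecg, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(ecg ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=ecg.shape)
    return ecg + noise

# Example: a synthetic one-second "beat" at the MIT-BIH 360 Hz sampling rate.
t = np.linspace(0, 1, 360)
clean = np.sin(2 * np.pi * t) * np.exp(-((t - 0.5) ** 2) / 0.01)
noisy = add_gaussian_noise(clean, snr_db=20.0)
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(f"output SNR ~ {achieved:.1f} dB")
```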
arXiv Detail & Related papers (2023-07-16T01:20:17Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
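A toy cost model (our illustration, not the paper's method) shows why
maximum parallelisation can be sub-optimal: per-step compute shrinks with
more workers, but communication cost grows, so throughput peaks at an
interior worker count. All constants below are made up.

```python
import math

# Toy model: per-step compute shrinks as 1/p while communication cost grows
# linearly with the number of workers p. Throughput therefore peaks at an
# interior p, so maximum parallelisation is sub-optimal in this model.

TOTAL_COMPUTE_MS = 100.0  # assumed compute time per step on one worker
COMM_MS_PER_WORKER = 1.0  # assumed communication cost added per worker

def step_time_ms(p: int) -> float:
    return TOTAL_COMPUTE_MS / p + COMM_MS_PER_WORKER * p

def throughput(p: int) -> float:
    return 1000.0 / step_time_ms(p)  # steps per second

for p in (1, 4, 10, 16, 32):
    print(f"p={p:2d}: {throughput(p):5.1f} steps/s")

# Analytic optimum of this model: d/dp (C/p + k*p) = 0  =>  p* = sqrt(C/k)
print("optimal p ~", round(math.sqrt(TOTAL_COMPUTE_MS / COMM_MS_PER_WORKER)))
```

In this model 32 workers deliver lower throughput than 10, which is the
flavour of trade-off the paper's learned partitioner is meant to navigate.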
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores [26.38812379700231]
We develop an FPGA-based heterogeneous computing system for neural network acceleration.
The proposed accelerator consists of DSP- and LUT-based GEneral Matrix-Multiplication (GEMM) computing cores.
Our design outperforms the state-of-the-art Mix&Match design, reducing latency by 1.12-1.32x while achieving higher inference accuracy.
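To illustrate the heterogeneous-core idea (a sketch under our own
assumptions, not N3H-Core's actual scheduler), a GEMM can be split
column-wise between two compute cores in proportion to their relative
throughput; the 3:1 DSP-to-LUT ratio below is made up.

```python
import numpy as np

# Illustrative sketch of splitting one GEMM across two heterogeneous compute
# cores (e.g., a DSP-based and a LUT-based core), in proportion to an assumed
# 3:1 throughput ratio. In hardware both slices would run concurrently; here
# we compute them separately and stitch the results together.

def heterogeneous_gemm(a: np.ndarray, b: np.ndarray,
                       dsp_share: float = 0.75) -> np.ndarray:
    n_cols = b.shape[1]
    split = int(n_cols * dsp_share)     # columns handled by the faster core
    c_dsp = a @ b[:, :split]            # "DSP core" slice
    c_lut = a @ b[:, split:]            # "LUT core" slice
    return np.hstack([c_dsp, c_lut])

a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 96).astype(np.float32)
assert np.allclose(heterogeneous_gemm(a, b), a @ b, atol=1e-4)
```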
arXiv Detail & Related papers (2021-12-15T15:12:00Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.