Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
- URL: http://arxiv.org/abs/2305.18332v1
- Date: Wed, 24 May 2023 16:08:55 GMT
- Title: Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
- Authors: Hans Johnson, Tianyang Fang, Alejandro Perez-Vicente, and Jafar Saniie
- Abstract summary: We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the computation graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
- Score: 59.11160990637615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a distributed system based on low-power embedded FPGAs
designed for edge computing applications, focused on exploring distributed
scheduling optimizations for Deep Learning (DL) workloads to obtain the best
performance in terms of latency and power efficiency. Our cluster remained
modular throughout the experiments, with implementations consisting of up to
12 Zynq-7020-based boards and 5 UltraScale+ MPSoC FPGA boards connected
through an Ethernet switch, and the cluster evaluates the configurable
Versatile Tensor Accelerator (VTA) as its Deep Learning Accelerator (DLA).
This adaptable distributed architecture is distinguished by its capacity to
evaluate and manage neural network workloads in numerous configurations,
enabling users to conduct multiple experiments tailored to their specific
application needs.
The proposed system can simultaneously execute diverse Neural Network (NN)
models, arrange the computation graph in a pipeline structure, and manually
allocate greater resources to the most computationally intensive layers of the
NN graph.
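To make the pipeline arrangement concrete, here is a minimal sketch (our
illustration, not code from the paper) of how an NN computation graph might
be partitioned into pipeline stages across cluster nodes, assigning more
devices to the heaviest layers. The layer names, MAC counts, and allocation
heuristic are all hypothetical.

```python
# Hypothetical sketch: partition an NN graph into pipeline stages across
# FPGA nodes, giving the most compute-intensive layers more devices.
# Layer names and MAC counts are illustrative, not from the paper.

layers = [
    ("conv1", 118_013_952),   # estimated multiply-accumulate operations
    ("conv2", 415_236_096),
    ("conv3", 149_520_384),
    ("fc1",    37_748_736),
    ("fc2",    16_777_216),
]
num_nodes = 12  # e.g., twelve Zynq-7020 boards

total = sum(cost for _, cost in layers)

# Allocate nodes proportionally to each layer's share of the total work,
# guaranteeing at least one node per pipeline stage.
allocation = {}
remaining = num_nodes
for i, (name, cost) in enumerate(layers):
    if i == len(layers) - 1:
        allocation[name] = remaining  # last stage takes what is left
    else:
        share = max(1, round(num_nodes * cost / total))
        # Never take so many nodes that later stages get none.
        share = min(share, remaining - (len(layers) - 1 - i))
        allocation[name] = share
        remaining -= share

for name, nodes in allocation.items():
    print(f"stage {name}: {nodes} node(s)")
```

In this toy run the bottleneck layer (conv2) receives 7 of the 12 nodes,
mirroring the abstract's idea of manually steering resources toward the most
computationally intensive layers.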
Related papers
- LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference [25.342107763021147]
This paper introduces LUTMUL, which harnesses the potential of look-up tables (LUTs) for performing multiplications.
By exploiting this advantage of LUTs, our method demonstrates a potential boost in the performance of FPGA-based neural network accelerators.
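As a rough illustration of the LUT idea (not LUTMUL's actual FPGA
architecture), a small-bit-width multiplier can be replaced by a precomputed
table; the 4-bit operand width below is an assumption for the example.

```python
# Illustrative sketch of table-based multiplication for quantized inference:
# precompute all products of two 4-bit unsigned operands once, then replace
# runtime multiplies with lookups. This mirrors the general LUT idea, not
# LUTMUL's actual hardware mapping.

BITS = 4
N = 1 << BITS

# 16x16 product table, flattened to one list indexed by (a << BITS) | b.
mul_table = [a * b for a in range(N) for b in range(N)]

def lut_mul(a: int, b: int) -> int:
    """Multiply two 4-bit values via table lookup instead of a multiplier."""
    return mul_table[(a << BITS) | b]

def lut_dot(xs, ws):
    """Dot product of quantized vectors using only lookups and additions."""
    return sum(lut_mul(x, w) for x, w in zip(xs, ws))

assert lut_mul(7, 9) == 63
print(lut_dot([1, 2, 3], [4, 5, 6]))  # prints 32
```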
arXiv Detail & Related papers (2024-11-01T02:54:11Z)
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
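FusionLLM's adaptive compression scheme is not detailed in the summary; as a
generic stand-in, the sketch below shows top-k gradient sparsification, one
common way to cut communication volume in geo-distributed training. The
`k_ratio` parameter and function names are our assumptions.

```python
import numpy as np

# Generic top-k gradient sparsification sketch (a stand-in for adaptive
# compression; FusionLLM's actual scheme may differ). Only the largest-
# magnitude fraction of gradient entries is transmitted between workers.

def compress_topk(grad: np.ndarray, k_ratio: float = 0.01):
    """Keep the top k_ratio fraction of entries by magnitude."""
    flat = grad.ravel()
    k = max(1, int(flat.size * k_ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], grad.shape  # indices, values, original shape

def decompress_topk(idx, values, shape):
    """Rebuild a dense gradient with zeros everywhere else."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

grad = np.random.randn(1024, 1024).astype(np.float32)
idx, vals, shape = compress_topk(grad, k_ratio=0.01)
restored = decompress_topk(idx, vals, shape)
print(f"sent {idx.size} of {grad.size} entries "
      f"({100 * idx.size / grad.size:.1f}% of the original volume)")
```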
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Harnessing FPGA Technology for Enhanced Biomedical Computation [0.0]
This research delves into sophisticated neural network frameworks such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Deep Belief Networks (DBNs).
By evaluating performance indicators like latency and throughput, we showcase the efficacy of FPGAs in advanced biomedical computing.
arXiv Detail & Related papers (2023-11-21T08:51:58Z)
- Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs).
We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness.
The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
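The Gaussian-noise augmentation mentioned above is straightforward; a minimal
sketch follows, where the target SNR and the synthetic test signal are our
assumptions rather than details from the paper (the MIT-BIH recordings are
sampled at 360 Hz).

```python
import numpy as np

# Minimal sketch of Gaussian-noise augmentation for ECG training data, as
# described for robustness. The target signal-to-noise ratio is an assumed
# hyperparameter, not a value from the paper.

def add_gaussian_noise(ecg, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(ecg ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=ecg.shape)
    return ecg + noise

# Example: a synthetic one-second "beat" at the MIT-BIH 360 Hz sampling rate.
t = np.linspace(0, 1, 360)
clean = np.sin(2 * np.pi * t) * np.exp(-((t - 0.5) ** 2) / 0.01)
noisy = add_gaussian_noise(clean, snr_db=20.0)
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(f"output SNR ~ {achieved:.1f} dB")
```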
arXiv Detail & Related papers (2023-07-16T01:20:17Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
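A toy cost model (our illustration, not the paper's method) shows why
maximum parallelisation can be sub-optimal: per-step compute shrinks with
more workers, but communication cost grows, so throughput peaks at an
interior worker count. All constants below are made up.

```python
import math

# Toy model: per-step compute shrinks as 1/p while communication cost grows
# linearly with the number of workers p. Throughput therefore peaks at an
# interior p, so maximum parallelisation is sub-optimal in this model.

TOTAL_COMPUTE_MS = 100.0  # assumed compute time per step on one worker
COMM_MS_PER_WORKER = 1.0  # assumed communication cost added per worker

def step_time_ms(p: int) -> float:
    return TOTAL_COMPUTE_MS / p + COMM_MS_PER_WORKER * p

def throughput(p: int) -> float:
    return 1000.0 / step_time_ms(p)  # steps per second

for p in (1, 4, 10, 16, 32):
    print(f"p={p:2d}: {throughput(p):5.1f} steps/s")

# Analytic optimum of this model: d/dp (C/p + k*p) = 0  =>  p* = sqrt(C/k)
print("optimal p ~", round(math.sqrt(TOTAL_COMPUTE_MS / COMM_MS_PER_WORKER)))
```

In this model 32 workers deliver lower throughput than 10, which is the
flavour of trade-off the paper's learned partitioner is meant to navigate.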
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores [26.38812379700231]
We develop an FPGA-based heterogeneous computing system for neural network acceleration.
The proposed accelerator consists of DSP- and LUT-based GEneral Matrix-Multiplication (GEMM) computing cores.
Our design outperforms the state-of-the-art Mix&Match design, reducing latency by 1.12-1.32x while achieving higher inference accuracy.
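To illustrate the heterogeneous-core idea (a sketch under our own
assumptions, not N3H-Core's actual scheduler), a GEMM can be split
column-wise between two compute cores in proportion to their relative
throughput; the 3:1 DSP-to-LUT ratio below is made up.

```python
import numpy as np

# Illustrative sketch of splitting one GEMM across two heterogeneous compute
# cores (e.g., a DSP-based and a LUT-based core), in proportion to an assumed
# 3:1 throughput ratio. In hardware both slices would run concurrently; here
# we compute them separately and stitch the results together.

def heterogeneous_gemm(a: np.ndarray, b: np.ndarray,
                       dsp_share: float = 0.75) -> np.ndarray:
    n_cols = b.shape[1]
    split = int(n_cols * dsp_share)     # columns handled by the faster core
    c_dsp = a @ b[:, :split]            # "DSP core" slice
    c_lut = a @ b[:, split:]            # "LUT core" slice
    return np.hstack([c_dsp, c_lut])

a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 96).astype(np.float32)
assert np.allclose(heterogeneous_gemm(a, b), a @ b, atol=1e-4)
```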
arXiv Detail & Related papers (2021-12-15T15:12:00Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.