An FPGA Accelerated Method for Training Feed-forward Neural Networks
Using Alternating Direction Method of Multipliers and LSMR
- URL: http://arxiv.org/abs/2009.02784v1
- Date: Sun, 6 Sep 2020 17:33:03 GMT
- Title: An FPGA Accelerated Method for Training Feed-forward Neural Networks
Using Alternating Direction Method of Multipliers and LSMR
- Authors: Seyedeh Niusha Alavi Foumani, Ce Guo, Wayne Luk
- Abstract summary: We have successfully designed, implemented, deployed and tested a novel FPGA-accelerated algorithm for neural network training.
The training method is based on the Alternating Direction Method of Multipliers (ADMM), which has strong parallel characteristics.
We devised an FPGA-accelerated version of the algorithm using the Intel FPGA SDK for OpenCL, performed extensive optimisation, and successfully deployed the program on an Intel Arria 10 GX FPGA.
- Score: 2.8747398859585376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this project, we have successfully designed, implemented, deployed and
tested a novel FPGA-accelerated algorithm for neural network training. The
algorithm itself was developed in an independent study option. This training
method is based on the Alternating Direction Method of Multipliers (ADMM), which
has strong parallel characteristics and, by employing LSMR, avoids procedures
such as matrix inversion that are problematic in hardware designs. As an
intermediate stage, we fully implemented the ADMM-LSMR method in the C language for
feed-forward neural networks with a flexible number of layers and hidden sizes.
We demonstrated that the method can operate with fixed-point arithmetic without
compromising accuracy. Next, we devised an FPGA-accelerated version of the
algorithm using the Intel FPGA SDK for OpenCL and performed extensive optimisation
stages, followed by successful deployment of the program on an Intel Arria 10 GX
FPGA. The FPGA-accelerated program showed up to a 6x speed-up compared to an
equivalent CPU implementation while achieving promising accuracy.
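The abstract's key computational idea is to solve each ADMM weight-update subproblem, a least-squares problem of the form min ||Ax - b||, with an iterative solver that needs only matrix-vector products, so no explicit matrix inversion is required in hardware. The sketch below is not taken from the paper: it uses CGLS (conjugate gradients on the normal equations) as a simpler stand-in for LSMR, plain double-precision arithmetic rather than the paper's fixed-point format, and toy dimensions and data chosen purely for illustration.

```c
/*
 * Minimal sketch (not from the paper): an iterative least-squares solver of
 * the kind an ADMM-based weight update can use instead of a matrix inverse.
 * CGLS stands in for LSMR here; both rely only on products with A and A^T.
 */
#include <stdio.h>
#include <math.h>
#include <string.h>

#define M 4   /* rows of A (toy batch size)         */
#define N 3   /* cols of A (toy layer input width)  */

/* y = A x, A stored row-major as m x n */
static void matvec(const double *A, const double *x, double *y, int m, int n) {
    for (int i = 0; i < m; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < n; ++j) y[i] += A[i * n + j] * x[j];
    }
}

/* y = A^T x */
static void matvec_t(const double *A, const double *x, double *y, int m, int n) {
    for (int j = 0; j < n; ++j) {
        y[j] = 0.0;
        for (int i = 0; i < m; ++i) y[j] += A[i * n + j] * x[i];
    }
}

static double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* CGLS: solve min_x ||A x - b||_2 using only matrix-vector products.
 * Work buffers are statically sized for the toy problem below. */
static void cgls(const double *A, const double *b, double *x,
                 int m, int n, int max_iter, double tol) {
    double r[M], s[N], p[N], q[M];
    memset(x, 0, n * sizeof(double));
    memcpy(r, b, m * sizeof(double));      /* r = b - A*0 = b */
    matvec_t(A, r, s, m, n);               /* s = A^T r       */
    memcpy(p, s, n * sizeof(double));
    double gamma = dot(s, s, n);
    for (int k = 0; k < max_iter && gamma > tol * tol; ++k) {
        matvec(A, p, q, m, n);             /* q = A p         */
        double alpha = gamma / dot(q, q, m);
        for (int i = 0; i < n; ++i) x[i] += alpha * p[i];
        for (int i = 0; i < m; ++i) r[i] -= alpha * q[i];
        matvec_t(A, r, s, m, n);           /* s = A^T r       */
        double gamma_new = dot(s, s, n);
        double beta = gamma_new / gamma;
        for (int i = 0; i < n; ++i) p[i] = s[i] + beta * p[i];
        gamma = gamma_new;
    }
}

int main(void) {
    /* Toy overdetermined system standing in for one weight-update subproblem. */
    double A[M * N] = { 1, 0, 1,
                        0, 2, 1,
                        1, 1, 0,
                        2, 0, 1 };
    double b[M] = { 3, 5, 2, 4 };
    double x[N];
    cgls(A, b, x, M, N, 50, 1e-10);
    printf("x = [%f, %f, %f]\n", x[0], x[1], x[2]);
    return 0;
}
```

In ADMM-based training of feed-forward networks, the per-layer weight updates typically reduce to least-squares subproblems of this form; because the inner solver needs only matrix-vector products, it maps naturally onto the parallel multipliers and deep pipelines of an FPGA, which is the property the abstract alludes to when it says the method avoids procedures such as matrix inversion.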
Related papers
- Progressive Mixed-Precision Decoding for Efficient LLM Inference [49.05448842542558]
We introduce Progressive Mixed-Precision Decoding (PMPD) to address the memory-boundedness of decoding.
PMPD achieves a 1.4-12.2x speedup in matrix-vector multiplications over fp16 models.
Our approach delivers a throughput gain of 3.8-8.0x over fp16 models and up to 1.54x over uniform quantization approaches.
arXiv Detail & Related papers (2024-10-17T11:46:33Z) - Many-body computing on Field Programmable Gate Arrays [5.612626580467746]
We leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations.
This has resulted in a remarkable tenfold speedup compared to CPU-based computation.
arXiv Detail & Related papers (2024-02-09T14:01:02Z) - Exploiting FPGA Capabilities for Accelerated Biomedical Computing [0.0]
This study presents advanced neural network architectures for enhanced ECG signal analysis using Field Programmable Gate Arrays (FPGAs).
We utilize the MIT-BIH Arrhythmia Database for training and validation, introducing Gaussian noise to improve robustness.
The study ultimately offers a guide for optimizing neural network performance on FPGAs for various applications.
arXiv Detail & Related papers (2023-07-16T01:20:17Z) - Reconfigurable Distributed FPGA Cluster Design for Deep Learning
Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z) - FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z) - Accelerated Charged Particle Tracking with Graph Neural Networks on
FPGAs [0.0]
We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks.
We find a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing.
arXiv Detail & Related papers (2020-11-30T18:17:43Z) - AIN: Fast and Accurate Sequence Labeling with Approximate Inference
Network [75.44925576268052]
The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches.
Exact probabilistic inference algorithms are typically applied in training and prediction stages of the CRF model.
We propose to employ a parallelizable approximate variational inference algorithm for the CRF model.
arXiv Detail & Related papers (2020-09-17T12:18:43Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - An FPGA-Based On-Device Reinforcement Learning Approach using Online
Sequential Learning [2.99321624683618]
We propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices.
It exploits a recently proposed neural-network based on-device learning approach that does not rely on the backpropagation method.
The proposed reinforcement learning approach is designed for PYNQ-Z1 board as a low-cost FPGA platform.
arXiv Detail & Related papers (2020-05-10T12:37:26Z) - Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M=3,5,7,9, and 11.
A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers.
arXiv Detail & Related papers (2020-04-12T13:18:25Z) - A Supervised Learning Algorithm for Multilayer Spiking Neural Networks
Based on Temporal Coding Toward Energy-Efficient VLSI Processor Design [2.6872737601772956]
Spiking neural networks (SNNs) are brain-inspired mathematical models with the ability to process information in the form of spikes.
We propose a novel supervised learning algorithm for SNNs based on temporal coding.
arXiv Detail & Related papers (2020-01-08T03:37:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.