High-Performance FPGA-based Accelerator for Bayesian Neural Networks
- URL: http://arxiv.org/abs/2105.09163v1
- Date: Wed, 12 May 2021 06:20:44 GMT
- Title: High-Performance FPGA-based Accelerator for Bayesian Neural Networks
- Authors: Hongxiang Fan, Martin Ferianc, Miguel Rodrigues, Hongyu Zhou, Xinyu
Niu and Wayne Luk
- Abstract summary: This work proposes a novel FPGA-based hardware architecture to accelerate BNNs inferred through Monte Carlo Dropout.
Compared with other state-of-the-art BNN accelerators, the proposed accelerator can achieve up to 4 times higher energy efficiency and 9 times better compute efficiency.
- Score: 5.86877988129171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks (NNs) have demonstrated their potential in a wide range of
applications such as image recognition, decision making or recommendation
systems. However, standard NNs are unable to capture their model uncertainty,
which is crucial for many safety-critical applications including healthcare and
autonomous vehicles. In comparison, Bayesian neural networks (BNNs) are able to
express uncertainty in their prediction via a mathematical grounding.
Nevertheless, BNNs have not been as widely used in industrial practice, mainly
because of their expensive computational cost and limited hardware performance.
This work proposes a novel FPGA-based hardware architecture to accelerate BNNs
inferred through Monte Carlo Dropout. Compared with other state-of-the-art BNN
accelerators, the proposed accelerator can achieve up to 4 times higher energy
efficiency and 9 times better compute efficiency. Considering partial Bayesian
inference, an automatic framework is proposed, which explores the trade-off
between hardware and algorithmic performance. Extensive experiments are
conducted to demonstrate that our proposed framework can effectively find the
optimal points in the design space.
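The Monte Carlo Dropout scheme the paper accelerates is straightforward to state in software: dropout is kept active at inference time, the network is run several times on the same input, and the spread of the sampled outputs serves as the uncertainty estimate. Below is a minimal PyTorch sketch of that sampling loop; the model, layer sizes, and sample count are illustrative assumptions, not the paper's accelerated design.

```python
# Minimal Monte Carlo Dropout inference sketch (illustrative model and
# sample count; the paper's FPGA design accelerates this sampling loop).
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, in_dim=64, hidden=128, classes=10, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.drop = nn.Dropout(p)   # stays active at inference time
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        return self.fc2(self.drop(torch.relu(self.fc1(x))))

@torch.no_grad()
def mc_dropout_predict(model, x, num_samples=20):
    model.train()  # keep dropout on so each pass samples a new mask
    probs = torch.stack([torch.softmax(model(x), dim=-1)
                         for _ in range(num_samples)])
    return probs.mean(dim=0), probs.std(dim=0)  # prediction, uncertainty

model = SmallNet()
x = torch.randn(4, 64)
mean, std = mc_dropout_predict(model, x)
print(mean.shape, std.shape)  # torch.Size([4, 10]) torch.Size([4, 10])
```

Each of the `num_samples` forward passes draws a different dropout mask, which is what makes the repeated runs a Monte Carlo approximation of the Bayesian predictive distribution; the paper's contribution is performing these repeated passes efficiently in hardware.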
Related papers
- Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs.
At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads.
At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
- When Monte-Carlo Dropout Meets Multi-Exit: Optimizing Bayesian Neural Networks on FPGA [11.648544516949533]
We propose a novel multi-exit Monte-Carlo Dropout (MCD)-based BayesNN that achieves well-calibrated predictions with low algorithmic complexity.
Our experiments demonstrate that our auto-generated accelerator achieves higher energy efficiency than CPU, GPU, and other state-of-the-art hardware implementations.
arXiv Detail & Related papers (2023-08-13T21:42:31Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators [0.9556128246747769]
The role of Deep Neural Networks (DNNs) in safety-critical applications is expanding.
At the same time, DNNs are experiencing massive growth in computational demands, which raises the necessity of improving the reliability of DNN accelerators.
arXiv Detail & Related papers (2023-03-14T20:42:38Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs (a minimal interval-propagation sketch appears after this list).
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs [6.047137174639418]
End-to-end framework E3NE automates the generation of efficient SNN inference logic for FPGAs.
E3NE uses less than 50% of hardware resources and 20% less power, while reducing the latency by an order of magnitude.
arXiv Detail & Related papers (2021-11-19T04:01:19Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme that uses {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks (a small decomposition sketch follows this list).
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- High-Performance FPGA-based Accelerator for Bayesian Recurrent Neural Networks [2.0631735969348064]
We propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs.
Compared with GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency.
arXiv Detail & Related papers (2021-06-04T14:30:39Z)
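For the interval reachability entry above, the basic primitive is pushing an input box through a layer: an affine map transforms the box's center and radius exactly, and a monotone activation maps interval endpoints to endpoints. The NumPy sketch below shows one such step with made-up weights and a made-up perturbation radius; the paper's analysis additionally handles implicit fixed-point layers, which this sketch does not.

```python
# Interval bound propagation through one affine + ReLU layer, the basic
# primitive behind interval reachability analysis. Weights and the input
# box are made-up illustrative values.
import numpy as np

def affine_interval(W, b, lo, hi):
    """Propagate the box [lo, hi] through y = W @ x + b."""
    center = (lo + hi) / 2.0
    radius = (hi - lo) / 2.0
    y_center = W @ center + b
    y_radius = np.abs(W) @ radius   # worst case per output coordinate
    return y_center - y_radius, y_center + y_radius

def relu_interval(lo, hi):
    # ReLU is monotone, so it maps interval endpoints to endpoints.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.1, -0.1])
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])  # input box
lo, hi = relu_interval(*affine_interval(W, b, lo, hi))
print(lo, hi)  # sound bounds on the layer's reachable outputs
```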
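For the {-1, +1} encoding-decomposition entry, the arithmetic identity at work is that a k-bit unsigned integer q = sum_i 2^i b_i with bits b_i in {0, 1} can be rewritten, via s_i = 2 b_i - 1, as q = (sum_i 2^i s_i + 2^k - 1) / 2, so one quantized matrix product splits into k binary {-1, +1} products plus a cheap correction term. A small NumPy sketch under that reading follows; the paper's exact scheme may differ in its details.

```python
# Decomposing a k-bit quantized weight matrix into binary {-1, +1}
# branches, so q @ x becomes k binary-weight products. Illustrative
# sketch of the identity q = (sum_i 2**i * s_i + 2**k - 1) / 2.
import numpy as np

def decompose_pm1(q, bits):
    """Return sign matrices s_i in {-1, +1}, one per bit of q."""
    return [2 * ((q >> i) & 1) - 1 for i in range(bits)]

bits = 4
rng = np.random.default_rng(0)
q = rng.integers(0, 2**bits, size=(3, 3))       # quantized weights
signs = decompose_pm1(q, bits)

# Reconstruct the integer weights from the binary branches.
recon = (sum((2**i) * s for i, s in enumerate(signs)) + 2**bits - 1) / 2
assert np.array_equal(recon, q)

# A matrix-vector product then splits into binary-weight products.
x = rng.standard_normal(3)
y = (sum((2**i) * (s @ x) for i, s in enumerate(signs))
     + (2**bits - 1) * x.sum()) / 2
assert np.allclose(y, q @ x)
```

Each branch s @ x uses only {-1, +1} weights, which is what makes the decomposed network amenable to the same additions-only datapaths used for binary neural networks.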
This list is automatically generated from the titles and abstracts of the papers on this site.