HALF: Holistic Auto Machine Learning for FPGAs
- URL: http://arxiv.org/abs/2106.14771v1
- Date: Mon, 28 Jun 2021 14:45:47 GMT
- Title: HALF: Holistic Auto Machine Learning for FPGAs
- Authors: Jonas Ney, Dominik Loroch, Vladimir Rybalkin, Nico Weber, Jens Krüger, Norbert Wehn
- Abstract summary: Deep Neural Networks (DNNs) are capable of solving complex problems in domains related to embedded systems, such as image and natural language processing.
To efficiently implement DNNs on a specific FPGA platform for a given cost criterion, e.g. energy efficiency, an enormous amount of design parameters has to be considered.
An automatic, holistic design approach can improve the quality of DNN implementations on FPGA significantly.
- Score: 1.9146960682777232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are capable of solving complex problems in
domains related to embedded systems, such as image and natural language
processing. To efficiently implement DNNs on a specific FPGA platform for a
given cost criterion, e.g. energy efficiency, an enormous amount of design
parameters has to be considered from the topology down to the final hardware
implementation. Interdependencies between the different design layers have to
be taken into account and explored efficiently, making it hardly possible to
find optimized solutions manually. An automatic, holistic design approach can
improve the quality of DNN implementations on FPGA significantly. To this end,
we present a cross-layer design space exploration methodology. It comprises
optimizations starting from a hardware-aware topology search for DNNs down to
the final optimized implementation for a given FPGA platform. The methodology
is implemented in our Holistic Auto machine Learning for FPGAs (HALF)
framework, which combines an evolutionary search algorithm, various
optimization steps and a library of parametrizable hardware DNN modules. HALF
automates both the exploration process and the implementation of optimized
solutions on a target FPGA platform for various applications. We demonstrate
the performance of HALF on a medical use case for arrhythmia detection for
three different design goals, i.e. low-energy, low-power and high-throughput
respectively. Our FPGA implementation outperforms a TensorRT optimized model on
an Nvidia Jetson platform in both throughput and energy consumption.
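The abstract describes an evolutionary search over a cross-layer design space, where topology and hardware parameters are explored together against a single cost criterion. The sketch below illustrates that general idea only. It is not the HALF algorithm: the design space `SPACE`, the parameter names, and the `cost` function are hypothetical stand-ins invented for illustration, and a real flow would presumably replace `cost` with trained-model accuracy and measured or estimated FPGA metrics.

```python
import random

# Hypothetical joint design space (assumption, not from the paper):
# two topology parameters and two hardware parameters searched together.
SPACE = {
    "layers": [2, 3, 4, 5, 6],
    "channels": [8, 16, 32, 64],
    "bits": [2, 4, 8],
    "parallelism": [1, 2, 4, 8, 16],
}


def random_candidate(rng):
    """Draw one value per parameter to form a candidate design."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def mutate(candidate, rng):
    """Copy a candidate and resample exactly one of its parameters."""
    child = dict(candidate)
    name = rng.choice(list(SPACE))
    child[name] = rng.choice(SPACE[name])
    return child


def cost(candidate):
    """Toy cost (lower is better): an energy proxy plus a capacity penalty.

    Both terms are made up for this sketch; they only create a trade-off
    between small, cheap designs and large, expressive ones.
    """
    ops = candidate["layers"] * candidate["channels"] ** 2
    latency = ops / candidate["parallelism"]
    power = 1.0 + 0.05 * candidate["parallelism"] * candidate["bits"]
    capacity = candidate["layers"] * candidate["channels"] * candidate["bits"]
    return latency * power + 1e4 / capacity


def evolve(generations=30, population=16, survivors=4, seed=0):
    """Elitist evolutionary loop: keep the best designs, mutate them."""
    rng = random.Random(seed)
    pop = [random_candidate(rng) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[:survivors]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population - survivors)
        ]
        pop = parents + children
    return min(pop, key=cost)


if __name__ == "__main__":
    best = evolve()
    print(best, cost(best))
```

Because the surviving parents are carried over unchanged in every generation, the best cost in the population never gets worse. Changing the design goal (for example low power instead of low energy) would amount to swapping the `cost` function in this sketch.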
Related papers
- Hardware-Aware Neural Dropout Search for Reliable Uncertainty Prediction on FPGA [11.123116470454079]
Dropout-based Bayesian Neural Networks (BayesNNs) are prominent in this field, offering reliable uncertainty estimates.
Existing dropout-based BayesNNs typically employ a uniform dropout design across different layers, leading to suboptimal performance.
This paper proposes a novel neural dropout search framework that automatically optimizes both the dropout-based BayesNNs and their hardware implementations on FPGA.
arXiv Detail & Related papers (2024-06-23T19:33:19Z) - Reconfigurable Distributed FPGA Cluster Design for Deep Learning
Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z) - End-to-end codesign of Hessian-aware quantized neural networks for FPGAs
and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z) - HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on
FPGA Devices [71.45672882756001]
This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs.
The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics.
The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
arXiv Detail & Related papers (2023-03-30T08:25:27Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z) - Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms.
We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs.
The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
arXiv Detail & Related papers (2022-06-23T15:57:17Z) - Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time
Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z) - SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN
Accelerators for Edge Inference [0.0]
We propose SECDA, a new hardware/software co-design methodology to reduce design time of optimized Deep Neural Networks (DNN) inference accelerators on edge devices with FPGAs.
We use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA.
We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5$\times$ with a 2.9$\times$ reduction in energy consumption over CPU-only inference.
arXiv Detail & Related papers (2021-10-01T15:20:29Z) - NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function
Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate Scale Quantum devices.
We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)