DNN-Chip Predictor: An Analytical Performance Predictor for DNN
Accelerators with Various Dataflows and Hardware Architectures
- URL: http://arxiv.org/abs/2002.11270v2
- Date: Fri, 16 Apr 2021 02:52:32 GMT
- Title: DNN-Chip Predictor: An Analytical Performance Predictor for DNN
Accelerators with Various Dataflows and Hardware Architectures
- Authors: Yang Zhao, Chaojian Li, Yue Wang, Pengfei Xu, Yongan Zhang, and
Yingyan Lin
- Abstract summary: The recent breakthroughs in deep neural networks (DNNs) have spurred a tremendously increased demand for DNN accelerators.
DNN-Chip Predictor is an analytical performance predictor which can accurately predict DNN accelerators' energy, throughput, and latency prior to their actual implementation.
- Score: 30.689015188050405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent breakthroughs in deep neural networks (DNNs) have spurred a
tremendously increased demand for DNN accelerators. However, designing DNN
accelerators is non-trivial as it often takes months/years and requires
cross-disciplinary knowledge. To enable fast and effective DNN accelerator
development, we propose DNN-Chip Predictor, an analytical performance predictor
which can accurately predict DNN accelerators' energy, throughput, and latency
prior to their actual implementation. Our Predictor features two highlights:
(1) its analytical performance formulation of DNN ASIC/FPGA accelerators
facilitates fast design space exploration and optimization; and (2) it supports
DNN accelerators with different algorithm-to-hardware mapping methods (i.e.,
dataflows) and hardware architectures. Experimental results based on 2 DNN
models and 3 different ASIC/FPGA implementations show that the DNN-Chip
Predictor's predictions differ from chip measurements of the FPGA/ASIC
implementations by no more than 17.66% across different DNN models, hardware
architectures, and dataflows. We will release code upon acceptance.
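To make the idea of an analytical predictor concrete, below is a minimal,
hypothetical sketch of how such a model can estimate latency and energy for one
convolutional layer from MAC counts, PE-array parameters, and per-access memory
energies. All class names, constants, and the data-reuse assumptions are
illustrative placeholders, not values or formulas from the paper.
```python
from dataclasses import dataclass, field

@dataclass
class ConvLayer:
    h_out: int   # output height
    w_out: int   # output width
    c_in: int    # input channels
    c_out: int   # output channels
    k: int       # square kernel size

@dataclass
class Accelerator:
    num_pes: int = 256          # number of processing elements (placeholder)
    freq_hz: float = 200e6      # clock frequency (placeholder)
    utilization: float = 0.8    # assumed average PE utilization for the dataflow
    e_mac_pj: float = 1.0       # energy per MAC in pJ (placeholder)
    e_access_pj: dict = field(default_factory=lambda: {
        "rf": 1.0, "buffer": 6.0, "dram": 200.0})  # per-access energy in pJ

def predict(layer: ConvLayer, hw: Accelerator):
    macs = layer.h_out * layer.w_out * layer.c_out * layer.c_in * layer.k ** 2
    # Latency: total MACs spread across the PE array at the achieved utilization.
    latency_s = macs / (hw.num_pes * hw.utilization * hw.freq_hz)
    # Energy: compute energy plus data movement. Access counts per memory level
    # depend on the dataflow; here we crudely assume weights and outputs touch
    # DRAM once, while every MAC reads its operands from the register file.
    dram_accesses = (layer.c_in * layer.k ** 2 * layer.c_out     # weights
                     + layer.h_out * layer.w_out * layer.c_out)  # outputs
    energy_pj = (macs * hw.e_mac_pj
                 + macs * hw.e_access_pj["rf"]
                 + dram_accesses * hw.e_access_pj["dram"])
    return latency_s, energy_pj

lat, en = predict(ConvLayer(h_out=56, w_out=56, c_in=64, c_out=64, k=3),
                  Accelerator())
print(f"latency ~ {lat * 1e3:.3f} ms, energy ~ {en / 1e6:.1f} uJ")
```
Because the model is closed-form, sweeping it over thousands of candidate
dataflows or hardware configurations takes seconds, which is what makes this
style of predictor useful for design space exploration.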
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor whose gradients are used to efficiently update the dataflow codes toward minimizing various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
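As a rough illustration of gradient-guided dataflow search (not DCP's actual
code encoding or training procedure), the sketch below relaxes a dataflow code
to continuous values and descends the gradient of a stand-in differentiable
cost predictor; the predictor here is untrained and purely illustrative.
```python
import torch

# Continuous relaxation of a dataflow code (e.g., tiling/ordering knobs).
code = torch.rand(16, requires_grad=True)

# Untrained stand-in for a learned, differentiable cost predictor.
predictor = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam([code], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    cost = predictor(code).sum()  # predicted latency/energy/EDP objective
    cost.backward()               # gradients flow back into the dataflow code
    opt.step()
    code.data.clamp_(0.0, 1.0)    # keep the knobs in a valid range

print("optimized dataflow code:", code.detach())
```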
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- FireFly v2: Advancing Hardware Support for High-Performance Spiking Neural Network with a Spatiotemporal FPGA Accelerator [8.0611988136866]
Spiking Neural Networks (SNNs) are expected to be a promising alternative to Artificial Neural Networks (ANNs).
Specialized SNN hardware offers clear advantages over general-purpose devices in terms of power and performance.
FireFly v2, an FPGA SNN accelerator, addresses the issue of non-spike operations in current state-of-the-art SNN algorithms.
arXiv Detail & Related papers (2023-09-28T04:17:02Z)
- DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference [2.2721856484014373]
We propose DGNN-Booster, a novel Field-Programmable Gate Array (FPGA) accelerator framework for real-time DGNN inference.
We show that DGNN-Booster achieves speedups of up to 5.6x over the CPU baseline (6226R), 8.4x over the GPU baseline (A6000), and 2.1x over the FPGA baseline.
arXiv Detail & Related papers (2023-04-13T21:50:23Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to non-experts in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
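The Hessian-aware part of such a co-design flow typically assigns more bits to
layers whose loss is more sensitive to weight perturbation. The following
hedged sketch shows the idea with made-up per-layer sensitivities and a simple
rank-based assignment; it is not the paper's exact procedure.
```python
# Illustrative, made-up per-layer Hessian-trace sensitivities.
hessian_trace = {"conv1": 120.0, "conv2": 35.0, "dense1": 8.0, "dense2": 1.5}
bit_choices = [8, 6, 4, 2]  # available precisions, widest first

def assign_bits(traces: dict, choices: list) -> dict:
    # Rank layers from most to least sensitive and give the most sensitive
    # layers the widest datatypes.
    ranked = sorted(traces, key=traces.get, reverse=True)
    bits = {}
    for i, layer in enumerate(ranked):
        idx = min(i * len(choices) // len(ranked), len(choices) - 1)
        bits[layer] = choices[idx]
    return bits

print(assign_bits(hessian_trace, bit_choices))
# -> {'conv1': 8, 'conv2': 6, 'dense1': 4, 'dense2': 2}
```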
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators [0.9556128246747769]
The role of Deep Neural Networks (DNNs) in safety-critical applications is expanding.
At the same time, DNNs' computational demands are growing massively, which raises the need to improve the reliability of DNN accelerators.
arXiv Detail & Related papers (2023-03-14T20:42:38Z)
- SNN2ANN: A Fast and Memory-Efficient Training Framework for Spiking Neural Networks [117.56823277328803]
Spiking neural networks are efficient computation models for low-power environments.
We propose a SNN-to-ANN (SNN2ANN) framework to train the SNN in a fast and memory-efficient way.
Experimental results show that our SNN2ANN-based models perform well on the benchmark datasets.
arXiv Detail & Related papers (2022-06-19T16:52:56Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning-scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform state-of-the-art DNN optimization frameworks. A hypothetical sketch of such a pruning pattern follows.
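As one hypothetical instance of fine-grained structured pruning (not
necessarily the paper's exact scheme), the sketch below zeroes small
fixed-size blocks of consecutive weights scored by L2 norm, a regularity that
compilers can exploit for vectorization; the block size and sparsity target
are illustrative assumptions.
```python
import numpy as np

def block_prune(weight: np.ndarray, block: int = 4,
                sparsity: float = 0.5) -> np.ndarray:
    """Zero the blocks of `block` consecutive weights (per row) with the
    smallest L2 norm until the target block sparsity is reached."""
    rows, cols = weight.shape
    assert cols % block == 0
    blocks = weight.reshape(rows, cols // block, block)
    norms = np.linalg.norm(blocks, axis=2)      # score each block
    k = int(norms.size * sparsity)              # number of blocks to drop
    thresh = np.partition(norms.ravel(), k)[k]  # k-th smallest block norm
    mask = (norms >= thresh)[..., None]         # keep blocks above threshold
    return (blocks * mask).reshape(rows, cols)

w = np.random.randn(8, 16)
print("weight sparsity:", (block_prune(w) == 0).mean())
```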
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid Precoding [94.40747235081466]
We propose an end-to-end deep learning-based joint transceiver design algorithm for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems.
We develop a DNN architecture that maps the received pilots into feedback bits at the receiver, and then further maps the feedback bits into the hybrid precoder at the transmitter.
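A hedged sketch of that two-stage mapping (pilots to feedback bits at the
receiver, bits to a precoder at the transmitter) is below; the dimensions,
layer sizes, and hard binarization are illustrative stand-ins, not the paper's
architecture.
```python
import torch

N_PILOT, N_BITS, N_ANT = 64, 16, 32   # illustrative dimensions

# Receiver-side DNN: received pilots -> feedback bits.
rx_net = torch.nn.Sequential(
    torch.nn.Linear(N_PILOT, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, N_BITS))
# Transmitter-side DNN: feedback bits -> hybrid precoder parameters.
tx_net = torch.nn.Sequential(
    torch.nn.Linear(N_BITS, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, N_ANT))

pilots = torch.randn(1, N_PILOT)   # real-valued stand-in for received pilots
bits = torch.sign(rx_net(pilots))  # hard binarization; training would use a
                                   # straight-through or sigmoid relaxation
precoder = tx_net(bits)            # precoder parameters for N_ANT antennas
print(precoder.shape)              # torch.Size([1, 32])
```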
arXiv Detail & Related papers (2021-10-22T20:49:02Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference [0.0]
We propose SECDA, a new hardware/software co-design methodology to reduce design time of optimized Deep Neural Networks (DNN) inference accelerators on edge devices with FPGAs.
We use SECDA to efficiently develop two different DNN accelerator designs on a PYNQ-Z1 board, a platform that includes an edge FPGA.
We evaluate the two accelerator designs with four common DNN models, achieving an average performance speedup across models of up to 3.5x, with a 2.9x reduction in energy consumption over CPU-only inference.
arXiv Detail & Related papers (2021-10-01T15:20:29Z)
- AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs [36.490296335959485]
AutoDNNchip is a chip generator that can automatically generate both FPGA- and ASIC-based DNN chip implementation for a designated application and dataset.
Our Chip Predictor's predicted performance differs from real-measured results by less than 10% when validated.
Accelerators generated by AutoDNNchip achieve up to 3.86X better performance than expert-crafted state-of-the-art accelerators.
arXiv Detail & Related papers (2020-01-06T05:32:15Z)