Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
- URL: http://arxiv.org/abs/2204.12418v1
- Date: Tue, 26 Apr 2022 16:22:24 GMT
- Title: Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
- Authors: Axel Stjerngren, Perry Gibson, José Cano
- Abstract summary: Bifrost is an end-to-end framework for the evaluation and optimization of reconfigurable inference accelerators.
We discuss Bifrost's advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reconfigurable accelerators for deep neural networks (DNNs) promise to
improve performance metrics such as inference latency. STONNE is the first
cycle-accurate simulator for reconfigurable DNN inference accelerators, and it
enables exploration of the accelerator design and configuration space.
However, preparing models for evaluation and exploring the configuration space in
STONNE is a manual, time-consuming process for developers, which is a barrier to
research. This paper introduces Bifrost, an end-to-end framework for the
evaluation and optimization of reconfigurable DNN inference accelerators.
Bifrost operates as a frontend for STONNE and leverages the TVM deep learning
compiler stack to parse models and automate offloading of accelerated
computations. We discuss Bifrost's advantages over STONNE and other tools, and
evaluate the MAERI and SIGMA architectures using Bifrost. Additionally, Bifrost
introduces a module leveraging AutoTVM to efficiently explore accelerator
designs and dataflow mapping space to optimize performance. This is
demonstrated by tuning the MAERI architecture and generating efficient dataflow
mappings for AlexNet, obtaining an average speedup of $50\times$ for the
convolutional layers and $11\times$ for the fully connected layers. Our code is
available at www.github.com/gicLAB/bifrost.
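To make the workflow concrete, below is a minimal sketch of the TVM-based flow the abstract describes: parse a model into Relay, tune the extracted tasks with AutoTVM, and build with the best configurations found. It uses only the standard TVM Python API and a generic "llvm" target; the ONNX file name and input shape are assumptions for illustration, and Bifrost's actual interface for offloading layers to the STONNE simulator is not reproduced here (see the repository above for the real API).

```python
# Hedged sketch of the TVM + AutoTVM flow described in the abstract.
# Standard TVM APIs only; Bifrost's STONNE integration is not shown.
import onnx
import tvm
from tvm import relay, autotvm

# 1. Parse a model into Relay IR (Bifrost relies on TVM frontends like this).
onnx_model = onnx.load("model.onnx")          # assumed example model file
shape_dict = {"input": (1, 3, 224, 224)}      # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# 2. Pick a target. Bifrost would route offloaded layers to the STONNE
#    simulator instead; "llvm" is used here so the sketch runs anywhere.
target = "llvm"

# 3. Extract tunable tasks and search their schedule/mapping space with AutoTVM.
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=5),
)
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=min(64, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tuning.json")],
    )

# 4. Build the model using the best configurations found during tuning.
with autotvm.apply_history_best("tuning.json"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```

An analogous AutoTVM-driven tuning loop is what the abstract refers to when it describes exploring accelerator designs and dataflow mappings, e.g. the MAERI tuning and AlexNet mappings reported above.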
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method, which performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators [33.18173790144853]
We present an automated generation approach for fast performance models to accurately estimate the latency of Deep Neural Networks (DNNs).
We modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array.
We evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions, achieving a significant speedup.
arXiv Detail & Related papers (2024-09-13T07:27:55Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks [24.95135135092478]
We propose a Neural Architecture Search and Acceleration framework dubbed NASA.
It enables automated multiplication-reduced DNN development and integrates a dedicated multiplication-reduced accelerator.
Experiments consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency tradeoffs.
arXiv Detail & Related papers (2022-10-24T16:03:42Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
With a latency- and accuracy-aware reward design, the computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- FLASH: Fast Neural Architecture Search with Hardware Optimization [7.263481020106725]
Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs).
This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform.
arXiv Detail & Related papers (2021-08-01T23:46:48Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch (a minimal sketch of the N:M masking pattern appears after this list).
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$\times$ less energy and offers up to 4$\times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z)
- STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators [5.326345912766044]
STONNE is a cycle-accurate, highly-modular and highly-extensible simulation framework.
We show how it can closely approach the performance results of the publicly available BSV-coded MAERI implementation.
arXiv Detail & Related papers (2020-06-10T19:20:52Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
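As referenced in the N:M structured sparsity entry above, the following is a minimal NumPy sketch of the N:M masking pattern itself: within every group of M consecutive weights, only the N largest-magnitude entries are kept. It is a generic illustration of the pattern, not the from-scratch training method proposed in that paper; the function name and example shapes are assumptions.

```python
# Generic N:M fine-grained structured sparsity mask (e.g. the common 2:4 pattern).
import numpy as np

def nm_prune(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Zero all but the n largest-magnitude weights in each group of m consecutive weights."""
    # Assumes weights.size is divisible by m.
    w = weights.reshape(-1, m)                      # group weights into blocks of m
    keep = np.argsort(np.abs(w), axis=1)[:, -n:]    # indices of the n largest magnitudes per block
    mask = np.zeros_like(w, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)     # mark positions to keep
    return (w * mask).reshape(weights.shape)

# Example: prune a 4x8 weight matrix to the 2:4 pattern.
rng = np.random.default_rng(0)
dense = rng.standard_normal((4, 8)).astype(np.float32)
sparse = nm_prune(dense, n=2, m=4)
assert (sparse.reshape(-1, 4) != 0).sum(axis=1).max() <= 2  # at most 2 nonzeros per group of 4
```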