Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN
Accelerators
- URL: http://arxiv.org/abs/2204.12418v1
- Date: Tue, 26 Apr 2022 16:22:24 GMT
- Title: Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN
Accelerators
- Authors: Axel Stjerngren, Perry Gibson, José Cano
- Abstract summary: Bifrost is an end-to-end framework for the evaluation and optimization of reconfigurable inference accelerators.
We discuss Bifrost's advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reconfigurable accelerators for deep neural networks (DNNs) promise to
improve performance metrics such as inference latency. STONNE is the first
cycle-accurate simulator for reconfigurable DNN inference accelerators which
allows for the exploration of accelerator designs and configuration space.
However, preparing models for evaluation and exploring the configuration space
in STONNE is a manual, time-consuming process for developers, which is a
barrier to research. This paper introduces Bifrost, an end-to-end framework for the
evaluation and optimization of reconfigurable DNN inference accelerators.
Bifrost operates as a frontend for STONNE and leverages the TVM deep learning
compiler stack to parse models and automate offloading of accelerated
computations. We discuss Bifrost's advantages over STONNE and other tools, and
evaluate the MAERI and SIGMA architectures using Bifrost. Additionally, Bifrost
introduces a module leveraging AutoTVM to efficiently explore accelerator
designs and dataflow mapping space to optimize performance. This is
demonstrated by tuning the MAERI architecture and generating efficient dataflow
mappings for AlexNet, obtaining an average speedup of $50\times$ for the
convolutional layers and $11\times$ for the fully connected layers. Our code is
available at www.github.com/gicLAB/bifrost.
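To make the abstract's tuning step concrete, here is a minimal, purely illustrative sketch of the kind of dataflow-mapping search that Bifrost's AutoTVM-based module automates. All function names, parameters, and the analytical cost model below are invented for illustration; the real framework scores candidate mappings with STONNE's cycle-accurate simulation rather than a closed-form formula.

```python
# Illustrative sketch (NOT Bifrost's actual API): grid-search a small
# dataflow-mapping space for one convolutional layer, scoring each
# candidate with a toy analytical cycle model that stands in for a
# cycle-accurate simulator such as STONNE.
from itertools import product

def toy_cycle_model(tile_k, tile_c, num_multipliers=64, K=96, C=3, spatial=3025):
    """Stand-in cost function: estimated cycles for one conv layer
    under a (tile_k, tile_c) mapping on a fixed-size multiplier array."""
    pes_used = tile_k * tile_c
    if pes_used > num_multipliers:
        return float("inf")  # mapping does not fit the array
    # Ceil-division: passes needed over output and input channels.
    passes = -(-K // tile_k) * -(-C // tile_c)
    utilization_penalty = num_multipliers / pes_used
    return passes * spatial * utilization_penalty

def best_mapping(tile_ks, tile_cs):
    """Exhaustively search the mapping space, returning the cheapest mapping."""
    return min(product(tile_ks, tile_cs), key=lambda m: toy_cycle_model(*m))

print(best_mapping([1, 2, 4, 8, 16, 32], [1, 3]))  # → (16, 3)
```

In the real tool chain, the cost of each candidate comes from simulation, so AutoTVM's guided search (rather than exhaustive enumeration) is what makes exploring large mapping spaces tractable.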
Related papers
- GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures [1.3401966602181168]
We introduce a GPU-based implementation of the Reconfigurable Architecture for Neuromorphic Computing (RANC).
We demonstrate up to 780$\times$ speedup compared to the serial version of the RANC simulator on a 512-core neuromorphic MNIST inference application.
arXiv Detail & Related papers (2024-04-24T21:08:21Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks [24.95135135092478]
We propose a Neural Architecture Search and Acceleration framework dubbed NASA.
It enables automated multiplication-reduced DNN development and integrates a dedicated multiplication-reduced accelerator.
Experiments consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency tradeoffs.
arXiv Detail & Related papers (2022-10-24T16:03:42Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments, such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- FLASH: Fast Neural Architecture Search with Hardware Optimization [7.263481020106725]
Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs).
This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform.
arXiv Detail & Related papers (2021-08-01T23:46:48Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$\times$ less energy and offers up to 4$\times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z)
- STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators [5.326345912766044]
STONNE is a cycle-accurate, highly-modular and highly-extensible simulation framework.
We show how it can closely approach the performance results of the publicly available BSV-coded MAERI implementation.
arXiv Detail & Related papers (2020-06-10T19:20:52Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.