DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on
Systolic Accelerator
- URL: http://arxiv.org/abs/2006.15103v1
- Date: Fri, 26 Jun 2020 17:06:41 GMT
- Title: DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on
Systolic Accelerator
- Authors: Nandan Kumar Jha, Shreyas Ravishankar, Sparsh Mittal, Arvind Kaushik,
Dipan Mandal, Mahesh Chandra
- Abstract summary: We propose data reuse aware co-optimization (DRACO).
DRACO improves the PE utilization of memory-bound DNNs without requiring dataflow or micro-architecture modifications.
Unlike previous co-optimization methods, DRACO not only maximizes performance and energy efficiency but also improves the predictive performance of DNNs.
- Score: 5.65116500037191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The number of processing elements (PEs) in a fixed-sized systolic accelerator
is well matched for large and compute-bound DNNs, whereas memory-bound DNNs
suffer from PE underutilization and fail to achieve peak performance and energy
efficiency. To mitigate this, specialized dataflow and/or micro-architectural
techniques have been proposed. However, due to the longer development cycle and
the rapid pace of evolution in the deep learning fields, these hardware-based
solutions can be obsolete and ineffective in dealing with PE underutilization
for state-of-the-art DNNs. In this work, we address the challenge of PE
underutilization at the algorithm level and propose data reuse aware
co-optimization (DRACO). This improves the PE utilization of memory-bound DNNs
without requiring dataflow or micro-architecture modifications.
Furthermore, unlike previous co-optimization methods, DRACO not only
maximizes performance and energy efficiency but also improves the predictive
performance of DNNs. To the best of our knowledge, DRACO is the first work that
resolves the resource underutilization challenge at the algorithm level and
demonstrates a trade-off between computational efficiency, PE utilization, and
predictive performance of DNNs. Compared to the state-of-the-art row-stationary
dataflow, DRACO achieves 41.8% and 42.6% improvements in average PE utilization
and inference latency, respectively, with negligible loss in predictive
performance for MobileNetV1 on a $64\times64$ systolic array. DRACO provides
seminal insights for utilization-aware DNN design methodologies that can fully
leverage the computation power of systolic array-based hardware accelerators.
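To make the PE-underutilization problem concrete, the back-of-the-envelope Python sketch below estimates average PE utilization under a simple assumed weight-stationary-style mapping (input channels to array rows, output channels to array columns). It illustrates the gap DRACO targets; it is not DRACO's actual cost model, and the layer shapes are only illustrative.

    import math

    def pe_utilization(c_in, c_out, rows=64, cols=64):
        """Fraction of busy PEs when the reduction (input-channel) dimension
        maps to rows and the output-channel dimension maps to columns."""
        row_tiles = math.ceil(c_in / rows)
        col_tiles = math.ceil(c_out / cols)
        mapped_macs = c_in * c_out                      # useful MACs per spatial position
        capacity = row_tiles * col_tiles * rows * cols  # PE slots reserved by the tiling
        return mapped_macs / capacity

    # A pointwise convolution with many channels keeps every PE busy ...
    print(f"64->128 pointwise conv: {pe_utilization(64, 128):.1%}")  # 100.0%
    # ... while a depthwise convolution reduces over one channel per filter,
    # leaving most of the 64x64 array idle: the memory-bound case.
    print(f"depthwise conv:         {pe_utilization(1, 64):.1%}")    # 1.6%

Under this simplified model, a MobileNetV1-style depthwise layer keeps under 2% of a $64\times64$ array busy, which is precisely the kind of gap an algorithm-level approach such as DRACO aims to close.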
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute Systems [0.32634122554914]
A one-size-fits-all approach to object detection using deep neural networks (DNNs) leads to inefficient utilization of computational resources.
We propose SHIFT, which continuously selects from a variety of DNN-based OD models depending on the dynamically changing contextual information and computational constraints.
Our proposed methodology results in improvements of up to 7.5x in energy usage and 2.8x in latency compared to state-of-the-art GPU-based single model OD approaches.
arXiv Detail & Related papers (2024-02-12T05:38:11Z)
- Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization [1.0235078178220354]
We propose an automated framework to compress Deep Neural Networks (DNNs) in a hardware-aware manner by jointly employing pruning and quantization.
Our framework achieves 39% average energy reduction with 1.7% average accuracy loss, and significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2023-12-23T18:50:13Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a sensitivity-informed deep neural network (SIDNN) to solve the AC optimal power flow (AC-OPF) problem.
The proposed SIDNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated into other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
- FSpiNN: An Optimization Framework for Memory- and Energy-Efficient Spiking Neural Networks [14.916996986290902]
Spiking Neural Networks (SNNs) offer unsupervised learning capability due to the spike-timing-dependent plasticity (STDP) rule.
However, state-of-the-art SNNs require a large memory footprint to achieve high accuracy.
We propose FSpiNN, an optimization framework for obtaining memory- and energy-efficient SNNs for training and inference processing.
arXiv Detail & Related papers (2020-07-17T09:40:26Z)
- ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for Deep Learning [1.2019888796331233]
Matrix-vector multiplication (MVM) and vector-vector outer product (VVOP) are the two most expensive operations associated with training deep neural networks (DNNs).
We introduce efficient techniques to extend stochastic computing (SC) to weight updates in DNNs with the activation functions required by many state-of-the-art networks.
Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations with bit-shift scaling.
Hardware design of ESSOP at the 14nm technology node shows that, compared to a highly pipelined FP16 multiplier, ESSOP is 82.2% and 93.7% better in energy and area efficiency, respectively.
arXiv Detail & Related papers (2020-03-25T07:54:42Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates a deep neural network (DNN) with finite element method (FEM) calculations.
Our algorithm was tested on four types of problems: compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency; a minimal sketch of the pattern idea appears after this list.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
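As a concrete illustration of the pattern-based pruning idea summarized in the PatDNN entry above, the following sketch assigns each 3x3 convolution kernel whichever of a few fixed masks preserves the most weight magnitude. The four patterns here are hypothetical placeholders, not PatDNN's actual pattern library; the point is only the mechanism of regular, compiler-friendly fine-grained sparsity.

    import numpy as np

    # Hypothetical fixed patterns, each keeping 4 of the 9 entries of a 3x3 kernel.
    PATTERNS = [
        np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=np.float32),
        np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]], dtype=np.float32),
        np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]], dtype=np.float32),
        np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]], dtype=np.float32),
    ]

    def pattern_prune(weights):
        """Mask each 3x3 kernel with the pattern that preserves the most
        absolute weight; all kernels then share a small set of shapes."""
        pruned = np.empty_like(weights)
        for o in range(weights.shape[0]):       # output channels
            for i in range(weights.shape[1]):   # input channels
                kernel = weights[o, i]
                scores = [np.abs(kernel * p).sum() for p in PATTERNS]
                pruned[o, i] = kernel * PATTERNS[int(np.argmax(scores))]
        return pruned

    w = np.random.randn(8, 4, 3, 3).astype(np.float32)  # (out, in, kh, kw)
    print(f"kept {np.count_nonzero(pattern_prune(w)) / w.size:.0%} of weights")  # ~44%

Because every kernel ends up in one of a handful of regular shapes, a compiler can generate specialized code per pattern, which is how such approaches recover hardware efficiency despite fine-grained sparsity.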
This list is automatically generated from the titles and abstracts of the papers on this site.