ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators
using Reinforcement Learning
- URL: http://arxiv.org/abs/2009.02010v1
- Date: Fri, 4 Sep 2020 04:59:26 GMT
- Title: ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators
using Reinforcement Learning
- Authors: Sheng-Chun Kao, Geonhwa Jeong, Tushar Krishna
- Abstract summary: We propose an autonomous strategy called ConfuciuX to find optimized HW resource assignments for a given model and dataflow style.
It converges to the optimized hardware configuration 4.7 to 24 times faster than alternate techniques.
- Score: 5.251940442946459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DNN accelerators provide efficiency by leveraging reuse of
activations/weights/outputs during the DNN computations to reduce data movement
from DRAM to the chip. The reuse is captured by the accelerator's dataflow.
While there has been significant prior work in exploring and comparing various
dataflows, the strategy for assigning on-chip hardware resources (i.e., compute
and memory) given a dataflow that can optimize for performance/energy while
meeting platform constraints of area/power for DNN(s) of interest is still
relatively unexplored. The design-space of choices for balancing compute and
memory explodes combinatorially, as we show in this work (e.g., as large as
O(10^72) choices for running MobileNet-V2), making it infeasible to do
manual-tuning via exhaustive searches. It is also difficult to come up with a
specific heuristic given that different DNNs and layer types exhibit different
amounts of reuse.
In this paper, we propose an autonomous strategy called ConfuciuX to find
optimized HW resource assignments for a given model and dataflow style.
ConfuciuX leverages a reinforcement learning method, REINFORCE, to guide the
search process, leveraging a detailed HW performance cost model within the
training loop to estimate rewards. We also augment the RL approach with a
genetic algorithm for further fine-tuning. ConfuciuX demonstrates the highest
sample-efficiency for training compared to other techniques such as Bayesian
optimization, genetic algorithm, simulated annealing, and other RL methods. It
converges to the optimized hardware configuration 4.7 to 24 times faster than
alternate techniques.
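To make the search loop concrete, here is a minimal, hypothetical sketch of a REINFORCE-driven resource assignment in the spirit of the abstract: a policy samples a (PE count, buffer size) choice per layer, a stub stands in for the detailed HW cost model that supplies the reward, and the policy is updated with the score-function gradient. The action grid, toy cost model, and network sizes below are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of a ConfuciuX-style search loop (illustrative only).
import torch
import torch.nn as nn

PE_CHOICES  = [16, 32, 64, 128]           # candidate compute resources
BUF_CHOICES = [2048, 4096, 8192, 16384]   # candidate buffer sizes (bytes)
NUM_LAYERS  = 4                           # layers of the target DNN

class Policy(nn.Module):
    """Per-layer categorical policy over (PE, buffer) assignments."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(NUM_LAYERS, 64), nn.Tanh())
        self.pe_head  = nn.Linear(64, len(PE_CHOICES))
        self.buf_head = nn.Linear(64, len(BUF_CHOICES))

    def forward(self, layer_onehot):
        h = self.body(layer_onehot)
        return (torch.distributions.Categorical(logits=self.pe_head(h)),
                torch.distributions.Categorical(logits=self.buf_head(h)))

def cost_model_reward(config):
    """Stand-in for the detailed HW cost model in the training loop; the
    paper estimates latency/energy, this toy just rewards balanced sizing."""
    return -sum(abs(pe * 100 - buf) / 1e4 + 0.01 * pe for pe, buf in config)

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(200):
    log_probs, config = [], []
    for layer in range(NUM_LAYERS):
        pe_dist, buf_dist = policy(torch.eye(NUM_LAYERS)[layer])
        pe_a, buf_a = pe_dist.sample(), buf_dist.sample()
        log_probs.append(pe_dist.log_prob(pe_a) + buf_dist.log_prob(buf_a))
        config.append((PE_CHOICES[pe_a], BUF_CHOICES[buf_a]))
    reward = cost_model_reward(config)
    loss = -torch.stack(log_probs).sum() * reward   # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
# Per the abstract, the RL solution would then be fine-tuned by a GA.
```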
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
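As a rough illustration of the propagation idea, the sketch below (all names and sizes hypothetical) treats the dataflow code as a continuous vector and descends the gradient of a learned differentiable cost predictor with respect to the code itself; DCP's actual encoding, predictor, and objectives differ.

```python
# Hypothetical sketch of gradient-guided dataflow-code updates.
import torch
import torch.nn as nn

CODE_DIM = 8  # length of the (relaxed) dataflow encoding

predictor = nn.Sequential(      # stands in for a trained cost predictor
    nn.Linear(CODE_DIM, 32), nn.ReLU(), nn.Linear(32, 1))

code = torch.randn(CODE_DIM, requires_grad=True)  # initial dataflow code
opt = torch.optim.SGD([code], lr=0.1)

for step in range(100):
    cost = predictor(code).sum()  # predicted latency/energy for this code
    opt.zero_grad()
    cost.backward()               # gradient of cost w.r.t. the code itself
    opt.step()                    # move the code in the descent direction
# The final (rounded/projected) code would then be validated against a
# real cost model or hardware measurement.
```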
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator [47.66463010685586]
We propose a novel approach to exploit unstructured weights and activations sparsity for dataflow accelerators, using software and hardware co-optimization.
We achieve an efficiency improvement ranging from 1.3× to 4.2× compared to existing sparse designs.
arXiv Detail & Related papers (2024-06-05T09:25:18Z)
- RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs [12.952987240366781]
This work presents a reinforcement learning (RL) based scheduling framework, which learns the behaviors of optimal optimization algorithms.
The RL scheduler generates near-optimal scheduling results with only a short solving runtime.
Our framework has demonstrated up to ~2.5× real-world on-chip runtime inference speedups over the commercial compiler.
arXiv Detail & Related papers (2023-04-10T17:22:12Z)
- A Theory of I/O-Efficient Sparse Neural Network Inference [17.862408781750126]
Machine learning models are gaining accuracy at a rapid rate, and their demand for energy and compute resources is growing with it.
At a low level, most of these resources are consumed by data movement between different memory units.
We provide a rigorous theoretical analysis of the I/Os needed in sparse feedforward neural network (FFNN) inference.
arXiv Detail & Related papers (2023-01-03T11:23:46Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
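snnTorch is a real, publicly available package; as a minimal usage sketch (ordinary CPU/GPU execution; the IPU-optimized release is not shown here), a leaky integrate-and-fire layer can be unrolled over time like this, with all input sizes as arbitrary placeholders:

```python
# Minimal snnTorch example: drive a leaky integrate-and-fire (LIF) neuron
# layer with a linear projection over 25 time steps.
import torch
import snntorch as snn

fc  = torch.nn.Linear(784, 10)     # current injection from input features
lif = snn.Leaky(beta=0.9)          # LIF neuron with membrane decay beta
mem = lif.init_leaky()             # initialize the membrane potential

x = torch.rand(25, 784)            # 25 time steps of (toy) input
spikes = []
for t in range(25):
    spk, mem = lif(fc(x[t]), mem)  # integrate current, emit spikes
    spikes.append(spk)
print(torch.stack(spikes).sum(0))  # spike counts per output neuron
```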
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization [0.0]
We investigate nonlinear prediction in an online setting and introduce a hybrid model that sidesteps hand-designed features and manual model selection.
We employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression.
We demonstrate the learning behavior of our algorithm on synthetic data and significant performance improvements over conventional methods on various real-life datasets.
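To show the division of labor only, here is a hypothetical sketch: an LSTM compresses each input sequence into its final hidden state, and a gradient-boosted regressor maps that feature vector to the prediction. Note the paper trains both parts jointly end-to-end through a soft (differentiable) GBDT, whereas this sketch fits a standard scikit-learn GBDT on frozen LSTM features.

```python
# Illustrative two-stage stand-in for the LSTM + soft GBDT hybrid.
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingRegressor

lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)

def extract_features(seqs):
    """Map (batch, time, 1) sequences to their last LSTM hidden state."""
    with torch.no_grad():
        _, (h_n, _) = lstm(seqs)
    return h_n[-1].numpy()  # shape: (batch, hidden_size)

# Toy data: predict the next value of a noisy sine wave from a window of 20.
t = torch.linspace(0, 50, 500)
series = torch.sin(t) + 0.1 * torch.randn(500)
X = torch.stack([series[i:i + 20] for i in range(470)]).unsqueeze(-1)
y = series[20:490].numpy()

gbdt = GradientBoostingRegressor(n_estimators=100)
gbdt.fit(extract_features(X), y)  # regress on the LSTM-extracted features
```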
arXiv Detail & Related papers (2022-03-25T17:13:08Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced computational time by 2 to 5 orders of magnitude compared with directly using conventional methods, and outperformed all state-of-the-art algorithms tested in our experiments.
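A minimal sketch of such a self-directed loop, under purely illustrative assumptions (a cheap stand-in objective instead of an FEM solver, toy sizes): fit a DNN surrogate to all evaluations so far, rank many cheap candidates with the surrogate, and spend expensive solver calls only on the most promising ones.

```python
# Hypothetical self-directed online learning loop (illustrative only).
import torch
import torch.nn as nn

def expensive_solver(x):
    """Stand-in for an FEM evaluation of a design vector x (to minimize)."""
    return ((x - 0.3) ** 2).sum(dim=-1, keepdim=True)

surrogate = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)

X = torch.rand(8, 5)            # initial random designs
Y = expensive_solver(X)         # their true objective values

for round_ in range(20):
    # 1) Fit the surrogate to all solver evaluations collected so far.
    for _ in range(100):
        loss = ((surrogate(X) - Y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    # 2) Rank many cheap candidates with the surrogate; keep the best few.
    with torch.no_grad():
        cand = torch.rand(256, 5)
        best = cand[surrogate(cand).squeeze(-1).argsort()[:4]]
    # 3) Evaluate only those with the expensive solver; grow the dataset.
    X = torch.cat([X, best]); Y = torch.cat([Y, expensive_solver(best)])

print("best design found:", X[Y.squeeze(-1).argmin()])
```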
arXiv Detail & Related papers (2020-02-04T20:00:28Z)