Optimizing Memory-Access Patterns for Deep Learning Accelerators
- URL: http://arxiv.org/abs/2002.12798v1
- Date: Thu, 27 Feb 2020 05:06:19 GMT
- Title: Optimizing Memory-Access Patterns for Deep Learning Accelerators
- Authors: Hongbin Zheng, Sejong Oh, Huiqing Wang, Preston Briggs, Jiading Gai,
Animesh Jain, Yizhi Liu, Rich Heaton, Randy Huang, Yida Wang
- Abstract summary: Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost.
Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads.
It is challenging to make full use of the compute power of an accelerator since the data must be properly staged in a software-managed scratchpad memory.
This paper proposes a systematic approach which leverages the polyhedral model to analyze all operators of a DL model together to minimize the number of memory accesses.
- Score: 6.931196464448543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) workloads are moving towards accelerators for faster
processing and lower cost. Modern DL accelerators are good at handling the
large-scale multiply-accumulate operations that dominate DL workloads; however,
it is challenging to make full use of the compute power of an accelerator since
the data must be properly staged in a software-managed scratchpad memory.
Failing to do so can result in significant performance loss. This paper
proposes a systematic approach which leverages the polyhedral model to analyze
all operators of a DL model together to minimize the number of memory accesses.
Experiments show that our approach can substantially reduce the impact of
memory accesses required by common neural-network models on a homegrown AWS
machine-learning inference chip named Inferentia, which is available through
Amazon EC2 Inf1 instances.
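The trade-off the abstract describes can be made concrete with a toy traffic count. The Python sketch below (with made-up tensor and scratchpad sizes, not numbers from the paper) compares staging each operator separately against scheduling a chain of operators over tiles so intermediates never leave the scratchpad, which is the kind of reduction the paper's joint, polyhedral-model-based analysis targets.

```python
# A minimal, self-contained sketch of the staging trade-off described in the
# abstract: per-operator staging forces every intermediate tensor through
# DRAM, while scheduling a chain of operators together lets each tile stay in
# the software-managed scratchpad. Tensor size, scratchpad capacity, and the
# purely element-wise operator chain are illustrative assumptions, not the
# paper's cost model or its polyhedral formulation.

def dram_traffic_per_operator(num_elems: int, num_ops: int) -> int:
    """Each operator loads its whole input from DRAM and stores its whole
    output back, so every intermediate tensor round-trips through DRAM."""
    return num_ops * 2 * num_elems


def dram_traffic_fused_tiles(num_elems: int, num_ops: int,
                             tile: int, scratchpad_capacity: int) -> int:
    """All operators are applied to a tile while it stays resident in the
    scratchpad; only the original input and the final output touch DRAM."""
    assert tile <= scratchpad_capacity, "tile must fit in the scratchpad"
    num_tiles = -(-num_elems // tile)   # ceiling division
    return num_tiles * 2 * tile         # one DRAM load + one DRAM store per tile


if __name__ == "__main__":
    N, OPS, TILE, SPAD = 1_000_000, 4, 16_384, 32_768   # illustrative sizes
    print("per-operator staging:", dram_traffic_per_operator(N, OPS))
    print("fused tiled staging :", dram_traffic_fused_tiles(N, OPS, TILE, SPAD))
```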
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators [11.496631244103773]
"Tiny Shared Block (TSB)" integrates a small shared 1x1 convolution block into the Deep Neural Network architecture.
TSB narrows the inference accuracy gap by over 20x, speeds up training by over 5x, and reduces the weights-to-device mapping cost.
arXiv Detail & Related papers (2024-05-08T20:53:38Z)
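As a rough, hypothetical reading of the TSB summary above, the PyTorch sketch below shares one tiny 1x1 convolution block across several stages of a toy backbone; the channel counts and the places where the block is inserted are assumptions for illustration, not the paper's architecture.

```python
# One tiny 1x1 convolution whose weights are reused at several points of a
# network, so only that small block (rather than every layer's weights) would
# need careful mapping onto a device. Toy shapes; not the TSB paper's design.
import torch
import torch.nn as nn


class SharedTinyBlock(nn.Module):
    """A single 1x1 conv whose parameters are shared wherever it is applied."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x):
        return torch.relu(self.conv1x1(x))


class ToyBackbone(nn.Module):
    def __init__(self, channels: int = 16, num_stages: int = 3):
        super().__init__()
        self.stages = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1)
             for _ in range(num_stages)]
        )
        # One shared instance: its weights appear only once in the state_dict.
        self.shared = SharedTinyBlock(channels)

    def forward(self, x):
        for stage in self.stages:
            x = torch.relu(stage(x))
            x = self.shared(x)   # the same tiny weights reused after every stage
        return x


if __name__ == "__main__":
    model = ToyBackbone()
    out = model(torch.randn(1, 16, 8, 8))
    print(out.shape)             # torch.Size([1, 16, 8, 8])
```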
- CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning [8.339901980070616]
Training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs).
We propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data.
We present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory.
arXiv Detail & Related papers (2023-05-04T20:57:01Z)
- ATTACC the Quadratic Bottleneck of Attention Layers [3.2741800634280245]
This paper introduces a new attention-tailored dataflow, termed FLAT, for deep neural network (DNN) accelerators.
It increases the effective memory bandwidth by efficiently utilizing the high-bandwidth, low-capacity on-chip buffer.
In our evaluation, ATTACC achieves 1.94x and 1.76x speedups and 49% and 42% energy reductions compared with state-of-the-art edge and cloud accelerators, respectively.
arXiv Detail & Related papers (2021-07-13T22:23:40Z)
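To make the "attention-tailored dataflow" idea above concrete, here is a minimal numpy sketch that processes queries in blocks so only a small slice of the score matrix exists at any time; the block size and shapes are assumptions, and this is not the FLAT dataflow itself.

```python
# Instead of materializing the full N x N attention score matrix, queries are
# processed in blocks so only a (block x N) slice of the scores would need to
# live in a fast on-chip buffer at a time. Illustrative shapes only.
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_full(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # N x N intermediate
    return softmax(scores) @ V


def attention_query_tiled(Q, K, V, block=64):
    out = np.empty((Q.shape[0], V.shape[1]))
    for start in range(0, Q.shape[0], block):
        q = Q[start:start + block]               # only a small query tile
        scores = q @ K.T / np.sqrt(Q.shape[-1])  # block x N slice of the scores
        out[start:start + block] = softmax(scores) @ V
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = [rng.standard_normal((256, 32)) for _ in range(3)]
    assert np.allclose(attention_full(Q, K, V), attention_query_tiled(Q, K, V))
    print("tiled attention matches the full computation")
```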
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
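A minimal sketch of the stored-embeddings idea, under the assumption that a toy frozen encoder and a simple list-based buffer stand in for the real agent: after a chosen freeze step, the buffer keeps compact embeddings instead of raw frames, which is where the computation and memory savings come from.

```python
# Once the lower layers of an image encoder are frozen, a replay buffer can
# hold their compact embeddings instead of raw frames. The toy encoder, frame
# size, and "freeze step" are illustrative assumptions, not SEER's algorithm.
import numpy as np


class EmbeddingReplayBuffer:
    def __init__(self, encoder, freeze_step: int):
        self.encoder = encoder          # callable: raw observation -> small embedding
        self.freeze_step = freeze_step
        self.raw, self.embedded = [], []
        self.step = 0

    def add(self, obs: np.ndarray) -> None:
        if self.step < self.freeze_step:
            self.raw.append(obs)                     # early training: keep raw frames
        else:
            self.embedded.append(self.encoder(obs))  # after freezing: compact embeddings
        self.step += 1

    def stored_floats(self) -> int:
        return sum(o.size for o in self.raw) + sum(e.size for e in self.embedded)


if __name__ == "__main__":
    encoder = lambda obs: obs.reshape(-1)[:64]       # stand-in for a frozen conv encoder
    buf = EmbeddingReplayBuffer(encoder, freeze_step=10)
    for _ in range(20):
        buf.add(np.zeros((84, 84, 3), dtype=np.float32))
    print("floats stored:", buf.stored_floats())     # far fewer after the freeze
```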
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, which forces their weights into external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
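One hedged way to picture "trading higher-cost memory storage/access for lower-cost computation" is a factored weight layout that is rebuilt on the fly; the sketch below uses a synthetic sparse-coefficients-times-small-basis layout and is not SmartDeal's actual decomposition.

```python
# Keep a weight matrix as a sparse coefficient matrix times a small basis and
# rebuild the dense weights on the fly at inference time: fewer values fetched
# from memory, a little extra compute. Shapes, sparsity, and the factorization
# are illustrative assumptions, not the algorithm described in the paper.
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim, basis = 256, 256, 16

# Synthetic factored weights: sparse coefficients (~90% zeros) x tiny basis.
C = rng.standard_normal((out_dim, basis)) * (rng.random((out_dim, basis)) > 0.9)
B = rng.standard_normal((basis, in_dim))


def stored_values_dense() -> int:
    return out_dim * in_dim                      # every dense weight is stored


def stored_values_factored() -> int:
    return int(np.count_nonzero(C)) + B.size     # sparse coefficients + small basis


def forward(x: np.ndarray) -> np.ndarray:
    W = C @ B                                    # rebuilt on the fly: extra compute,
    return W @ x                                 # far fewer values read from memory


if __name__ == "__main__":
    print("dense values stored   :", stored_values_dense())
    print("factored values stored:", stored_values_factored())
    print("output shape          :", forward(rng.standard_normal(in_dim)).shape)
```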
- Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data-parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g., Megatron-LM and Turing-NLG.
arXiv Detail & Related papers (2020-08-26T07:24:34Z)
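The combination of out-of-core offloading and redundant recomputation summarized above can be illustrated with the toy Python chain below; the checkpoint interval and the dict standing in for host memory are assumptions, not KARMA's scheduler or cost model.

```python
# Activations that do not fit on the device are either swapped out to host
# memory ("out-of-core") or dropped and later recomputed from the nearest kept
# checkpoint ("redundant recomputation"). Toy 12-layer chain, no autodiff.
import numpy as np

layers = [lambda x, s=s: np.tanh(x + s) for s in range(12)]  # toy layer chain
CHECKPOINT_EVERY = 4


def forward_with_offload(x):
    """Run the chain, swapping only every k-th activation out to 'host memory'."""
    host_memory = {0: x}                       # offloaded checkpoints, keyed by layer index
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % CHECKPOINT_EVERY == 0:
            host_memory[i + 1] = x             # offload; other activations are dropped
    return x, host_memory


def recompute_activation(idx, host_memory):
    """Return the activation entering layer `idx`, recomputing from the nearest checkpoint."""
    start = max(k for k in host_memory if k <= idx)
    x = host_memory[start]
    for i in range(start, idx):
        x = layers[i](x)                       # redundant recomputation instead of storage
    return x


if __name__ == "__main__":
    out, host = forward_with_offload(np.zeros(8))
    act = recompute_activation(7, host)        # e.g. needed during the backward pass
    print("kept checkpoints:", sorted(host), "| recomputed activation shape:", act.shape)
```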
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut, a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall and significantly reduces the resulting energy consumption and CO2 emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
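As a schematic reading of the summary above, the numpy sketch below drops a contiguous slice of hidden units during training so the surviving matmuls are smaller and dense, and disables the mechanism at test time; the layer sizes, keep fraction, and 1/keep rescaling are illustrative assumptions, not the paper's exact scheme.

```python
# Rather than zeroing random units (standard dropout), drop one contiguous
# block of hidden units so the surviving computation is a smaller dense
# matmul, and switch the mechanism off at test time.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((512, 256))   # input -> hidden
W2 = rng.standard_normal((256, 10))    # hidden -> output


def forward(x, train: bool, keep_frac: float = 0.5):
    if not train:                      # test time: slicing is switched off
        return np.maximum(x @ W1, 0.0) @ W2
    width = int(W1.shape[1] * keep_frac)
    start = rng.integers(0, W1.shape[1] - width + 1)
    sl = slice(start, start + width)   # one contiguous block of hidden units
    h = np.maximum(x @ W1[:, sl], 0.0) / keep_frac   # smaller dense matmuls
    return h @ W2[sl, :]


if __name__ == "__main__":
    x = rng.standard_normal((4, 512))
    print("train step output:", forward(x, train=True).shape)
    print("test output      :", forward(x, train=False).shape)
```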
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
- Model-Driven Beamforming Neural Networks [47.754731555563836]
This article introduces general data- and model-driven beamforming neural networks (BNNs).
It presents various possible learning strategies, and also discusses complexity reduction for the DL-based BNNs.
We also offer enhancement methods such as training-set augmentation and transfer learning in order to improve the generality of BNNs.
arXiv Detail & Related papers (2020-01-15T12:50:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.