Combined Scheduling, Memory Allocation and Tensor Replacement for
Minimizing Off-Chip Data Accesses of DNN Accelerators
- URL: http://arxiv.org/abs/2311.18246v1
- Date: Thu, 30 Nov 2023 04:36:25 GMT
- Title: Combined Scheduling, Memory Allocation and Tensor Replacement for
Minimizing Off-Chip Data Accesses of DNN Accelerators
- Authors: Yi Li, Aarti Gupta, Sharad Malik
- Abstract summary: We propose an optimization framework, named COSMA, for mapping Deep Neural Networks to specialized hardware accelerators.
COSMA finds the optimal operator schedule, memory allocation, and tensor replacement that minimize the additional data accesses.
We demonstrate that, using an off-the-shelf ILP solver, COSMA obtains the optimal solution in seconds for a wide range of state-of-the-art DNNs for different applications.
- Score: 6.393909466547065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Specialized hardware accelerators have been extensively used for Deep Neural
Networks (DNNs) to provide power/performance benefits. These accelerators
contain specialized hardware that supports DNN operators, and scratchpad memory
for storing the tensor operands. Often, the size of the scratchpad is
insufficient to store all the tensors needed for the computation, and
additional data accesses are needed to move tensors back and forth from host
memory during the computation with significant power/performance overhead. The
volume of these additional data accesses depends on the operator schedule and
the memory allocation (the specific locations selected for the tensors in the
scratchpad). We propose an optimization framework, named COSMA, for mapping
DNNs to an accelerator that finds the optimal operator schedule, memory
allocation and tensor replacement that minimizes the additional data accesses.
COSMA provides an Integer Linear Programming (ILP) formulation to generate the
optimal solution for mapping a DNN to the accelerator for a given scratchpad
size. We demonstrate that, using an off-the-shelf ILP solver, COSMA obtains
the optimal solution in seconds for a wide range of state-of-the-art DNNs
across different applications. Further, it outperforms existing methods,
reducing non-compulsory data accesses by 84% on average. To scale up to
certain complex DNNs generated by Neural Architecture Search, we further
propose a divide-and-conquer heuristic, which reduces data accesses by 85%
on average compared with other works.
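To make the ILP framing concrete, here is a minimal sketch in Python using the off-the-shelf PuLP modeling library. It is not COSMA's actual formulation: the schedule is held fixed, the tensor names, sizes, and live intervals are invented for illustration, and the only decision is which tensors stay resident in the scratchpad, minimizing spill traffic (one off-chip write plus one later read per evicted tensor) under a capacity constraint at every step.

```python
# Toy residency ILP (illustrative only; COSMA jointly optimizes schedule,
# memory allocation, and tensor replacement). Requires: pip install pulp
import pulp

CAPACITY = 96  # scratchpad capacity in KB (invented)

# tensor -> (size in KB, first-use step, last-use step); invented values
tensors = {
    "act0": (64, 0, 1),
    "act1": (48, 1, 2),
    "act2": (48, 2, 3),
    "w1":   (32, 1, 1),
    "w2":   (16, 2, 2),
}

prob = pulp.LpProblem("scratchpad_residency", pulp.LpMinimize)

# keep[t] = 1 if tensor t stays in the scratchpad for its whole lifetime
keep = {t: pulp.LpVariable(f"keep_{t}", cat="Binary") for t in tensors}

# Objective: an evicted tensor costs one off-chip write plus one read
prob += pulp.lpSum(2 * size * (1 - keep[t])
                   for t, (size, _, _) in tensors.items())

# Capacity must hold at every schedule step over the live tensors
for step in range(4):
    live = [t for t, (_, lo, hi) in tensors.items() if lo <= step <= hi]
    prob += pulp.lpSum(tensors[t][0] * keep[t] for t in live) <= CAPACITY

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in tensors:
    print(t, "resident" if keep[t].value() == 1 else "spilled")
```

On this toy instance, steps 1 and 2 are both oversubscribed, and the solver finds that evicting act1 alone restores feasibility at the lowest spill cost.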
Related papers
- NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator [3.926150707772004]
We introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm.
NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication.
We also present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis. A sketch of Gustavson's row-wise formulation follows this entry.
arXiv Detail & Related papers (2024-04-23T20:51:09Z)
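As background for the entry above, here is a tiny Python sketch of Gustavson's row-wise sparse matrix multiplication; the dict-of-rows format and example matrices are invented for illustration. It shows the two phases (multiplies A[i,k]*B[k,j], and accumulations into C[i,j]) that NeuraChip's pipeline decouples in hardware.

```python
# Gustavson's algorithm: C[i,:] = sum_k A[i,k] * B[k,:], row by row.
# Matrices are dicts of sparse rows: {row: {col: value}}.

def spgemm_gustavson(A, B):
    C = {}
    for i, row_a in A.items():
        acc = {}                                  # sparse accumulator for C[i,:]
        for k, a_ik in row_a.items():             # nonzeros of row i of A
            for j, b_kj in B.get(k, {}).items():  # scale row k of B...
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj  # ...and accumulate
        if acc:
            C[i] = acc
    return C

A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
print(spgemm_gustavson(A, B))  # {0: {1: 16.0}, 1: {0: 15.0}}
```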
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms state-of-the-art methods with up to a 10% decrease in latency constraint violation rate and nearly a 4x reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge [1.8293684411977293]
Deep Neural Network (DNN) based inference at the edge is challenging as these compute- and data-intensive algorithms need to be implemented at low cost and low power.
We present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency.
arXiv Detail & Related papers (2023-06-10T17:25:58Z)
- ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization [31.494669469303954]
We propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads.
Our design results in a 2.8x speedup and a 2.5x energy-efficiency improvement over state-of-the-art quantization accelerators.
arXiv Detail & Related papers (2022-08-30T14:12:49Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementing DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
Compared to its full-precision software counterpart, it reduces classification time by three orders of magnitude with a small 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a scheme adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks. A toy decomposition sketch follows this entry.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
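To illustrate the decomposition idea with an invented 2-bit symmetric example (not necessarily the paper's exact scheme): a weight matrix with values in {-3, -1, +1, +3} splits into two {-1, +1} binary matrices, so the quantized matmul becomes a weighted sum of binary matmuls.

```python
# W = 2*B1 + B0 with B1, B0 in {-1, +1}, so x @ W = 2*(x @ B1) + (x @ B0).
import numpy as np

W = np.array([[ 3, -1],
              [-3,  1]], dtype=np.int8)   # 2-bit symmetric quantized weights

B1 = np.where(W > 0, 1, -1).astype(np.int8)  # sign branch
B0 = (W - 2 * B1).astype(np.int8)            # residual branch, also {-1, +1}
assert set(np.unique(B0)) <= {-1, 1}
assert np.array_equal(2 * B1 + B0, W)

x = np.array([[1.0, 2.0]])
# Two binary matmuls reproduce the quantized matmul exactly
assert np.allclose(x @ W, 2 * (x @ B1) + (x @ B0))
```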
- CoSA: Scheduling by Constrained Optimization for Spatial Accelerators [1.9149970150912705]
We present CoSA, a constrained-optimization-based approach for scheduling Deep Neural Network (DNN) accelerators.
As opposed to existing approaches that rely either on designers' heuristics or on iterative methods to navigate the search space, CoSA expresses scheduling decisions as a constrained-optimization problem.
We demonstrate that CoSA-generated schedules significantly outperform state-of-the-art approaches by a geometric mean of up to 2.5x.
arXiv Detail & Related papers (2021-05-05T07:17:25Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the key insight is to use the compiler to regain and guarantee high hardware efficiency. A toy pattern-pruning sketch follows this entry.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
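A minimal Python sketch of the pattern-pruning idea follows; the four 3x3 masks are invented examples, not PatDNN's actual pattern library. Each convolution kernel keeps the fixed-size pattern that preserves the most weight magnitude, which is what makes the sparsity fine-grained yet regular enough for a compiler to exploit.

```python
# Pattern-based pruning: every 3x3 kernel keeps exactly 4 entries,
# chosen from a small fixed library of masks (invented examples).
import numpy as np

PATTERNS = np.array([
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
], dtype=np.float32)

def prune_kernel(kernel):
    """Apply the library pattern that retains the largest total |weight|."""
    scores = [(np.abs(kernel) * m).sum() for m in PATTERNS]
    return kernel * PATTERNS[int(np.argmax(scores))]

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3)).astype(np.float32)
print(prune_kernel(kernel))  # 4 surviving weights, 5 zeroed
```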