SoD$^2$: Statically Optimizing Dynamic Deep Neural Network
- URL: http://arxiv.org/abs/2403.00176v1
- Date: Thu, 29 Feb 2024 23:04:01 GMT
- Title: SoD$^2$: Statically Optimizing Dynamic Deep Neural Network
- Authors: Wei Niu, Gagan Agrawal, Bin Ren
- Abstract summary: SoD$^2$ is a comprehensive framework for optimizing Dynamic DNNs.
This framework statically determines the shapes of operators as known constants, symbolic constants, or operations on these.
We show that SoD$^2$ runs up to $3.9\times$ faster than these systems while saving up to $88\%$ peak memory consumption.
- Score: 13.958672527377722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though many compilation and runtime systems have been developed for DNNs in
recent years, the focus has largely been on static DNNs. Dynamic DNNs, where
tensor shapes and sizes and even the set of operators used are dependent upon
the input and/or execution, are becoming common. This paper presents SoD$^2$, a
comprehensive framework for optimizing Dynamic DNNs. The basis of our approach
is a classification of common operators that form DNNs, and the use of this
classification towards a Rank and Dimension Propagation (RDP) method. This
framework statically determines the shapes of operators as known constants,
symbolic constants, or operations on these. Next, using RDP we enable a series
of optimizations, like fused code generation, execution (order) planning, and
even runtime memory allocation plan generation. By evaluating the framework on
10 emerging Dynamic DNNs and comparing it against several existing systems, we
demonstrate both reductions in execution latency and memory requirements, with
RDP-enabled key optimizations responsible for much of the gains. Our evaluation
results show that SoD$^2$ runs up to $3.9\times$ faster than these systems
while saving up to $88\%$ peak memory consumption.
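To make the Rank and Dimension Propagation (RDP) idea concrete, here is a minimal, hypothetical sketch of symbolic shape propagation over a two-operator graph. It only illustrates shapes recorded as known constants, symbolic constants, or operations on these; the function names and the use of sympy are assumptions for illustration, not SoD$^2$'s implementation.

```python
# Minimal sketch (not SoD^2 code): propagate ranks and dimensions through a
# tiny operator graph, where each dimension is either a known constant (int)
# or a symbolic constant (sympy Symbol), and outputs are expressions on these.
from sympy import Symbol

def matmul_shape(a, b):
    # (m, k) x (k, n) -> (m, n): the output rank is fixed; dims may be symbolic.
    (m, _k), (_k2, n) = a, b
    return (m, n)

def concat_shape(a, b, axis):
    # The concatenated dimension is the (possibly symbolic) sum of the inputs'.
    out = list(a)
    out[axis] = a[axis] + b[axis]
    return tuple(out)

# A dynamic sequence length flows through the graph as a symbol.
s = Symbol("s", positive=True, integer=True)
x = (s, 512)                      # activation: [seq_len, hidden], seq_len unknown
w = (512, 1024)                   # weight: fully known constants
h = matmul_shape(x, w)            # -> (s, 1024)
y = concat_shape(h, h, axis=0)    # -> (2*s, 1024), an operation on a symbol
print(h, y)
```

With dimensions resolved to such expressions ahead of time, downstream passes like fused code generation, execution-order planning, and memory-allocation planning can reason about buffer sizes without waiting for concrete runtime shapes.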
Related papers
- Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights [4.513581513983453]
We present a first-order optimization method specialized for deep neural networks (DNNs), ECCO-DNN.
This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape.
arXiv Detail & Related papers (2023-10-21T03:45:13Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- SENSEi: Input-Sensitive Compilation for Accelerating GNNs [7.527596018706567]
We propose SENSEi, a system that exposes different sparse and dense matrix primitive compositions based on different matrix re-associations of GNN computations.
SENSEi executes in two stages: (1) an offline compilation stage that enumerates all valid re-associations leading to different sparse-dense matrix compositions and uses input-oblivious pruning to discard clearly unprofitable candidates, and (2) an online stage that selects the most profitable remaining composition for the given input.
On a wide range of configurations, SENSEi achieves speedups of up to $2.012\times$ and $1.85\times$ on graph convolutional networks and up to $6.294\times$ and $16.274\times$ on graph attention networks.
arXiv Detail & Related papers (2023-06-27T02:24:05Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks [0.0]
Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive.
We propose a new software layer to serve ensembles of DNNs flexibly and efficiently.
arXiv Detail & Related papers (2022-08-30T08:05:43Z)
- Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z)
- DIRA: Dynamic Domain Incremental Regularised Adaptation [2.227417514684251]
We introduce Dynamic Incremental Regularised Adaptation (DIRA) for dynamic operational domain adaptation of Deep Neural Networks (DNNs).
DIRA improves on the problem of forgetting and achieves strong gains in performance when retraining using a few samples from the target domain.
Our approach shows improvements on different image classification benchmarks aimed at evaluating robustness to distribution shifts.
arXiv Detail & Related papers (2022-04-30T03:46:03Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- $\Pi-$nets: Deep Polynomial Neural Networks [86.36557534288535]
$\Pi$-Nets are neural networks in which the output is a high-order polynomial of the input (a toy sketch follows this list).
We empirically demonstrate that $\Pi$-Nets have better representation power than standard DCNNs.
Our framework elucidates why recent generative models, such as StyleGAN, improve upon their predecessors.
arXiv Detail & Related papers (2020-03-08T18:48:43Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency; a minimal illustration of pattern-based kernel pruning follows this list.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
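For the $\Pi$-nets entry above, the following toy recursion illustrates how an output can be a high-order polynomial of the input; it is a sketch of the general principle only, not the parameterizations proposed in that paper.

```python
# Toy illustration: each step multiplies the running representation elementwise
# by a new linear map of the input, so after k steps the output is a degree-k
# polynomial of x, with no activation functions needed.
import numpy as np

def poly_net(x, weights):
    z = weights[0] @ x                 # degree 1 in x
    for W in weights[1:]:
        z = z * (W @ x) + z            # degree grows by one per step
    return z

rng = np.random.default_rng(0)
d = 8
weights = [rng.standard_normal((d, d)) for _ in range(3)]
x = rng.standard_normal(d)
y = poly_net(x, weights)               # a degree-3 polynomial of the input
print(y.shape)                         # (8,)
```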
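For the PatDNN entry, here is a minimal sketch of what fine-grained pruning patterns inside coarse-grained structures can look like for 3x3 convolution kernels; the pattern set below is illustrative only and is not the one derived in that paper.

```python
# Illustrative pattern-based kernel pruning: every 3x3 kernel keeps 4 weights
# arranged in one of a few predefined patterns, chosen per kernel to preserve
# the most weight magnitude; a compiler can then specialize code per pattern.
import numpy as np

PATTERNS = [np.array(p, dtype=bool) for p in (
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
)]

def prune_kernel(kernel):
    # Score each pattern by the weight magnitude it keeps; apply the best as a mask.
    scores = [np.abs(kernel[p]).sum() for p in PATTERNS]
    return kernel * PATTERNS[int(np.argmax(scores))]

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
print(prune_kernel(kernel))            # 4 surviving weights, 5 zeros
```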