SoD$^2$: Statically Optimizing Dynamic Deep Neural Network
- URL: http://arxiv.org/abs/2403.00176v1
- Date: Thu, 29 Feb 2024 23:04:01 GMT
- Title: SoD$^2$: Statically Optimizing Dynamic Deep Neural Network
- Authors: Wei Niu, Gagan Agrawal, Bin Ren
- Abstract summary: SoD$^2$ is a comprehensive framework for optimizing Dynamic DNNs.
This framework statically determines the shapes of operators as known constants, symbolic constants, or operations on these.
We show that SoD$^2$ runs up to $3.9\times$ faster than these systems while saving up to $88\%$ peak memory consumption.
- Score: 13.958672527377722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though many compilation and runtime systems have been developed for DNNs in
recent years, the focus has largely been on static DNNs. Dynamic DNNs, where
tensor shapes and sizes and even the set of operators used are dependent upon
the input and/or execution, are becoming common. This paper presents SoD$^2$, a
comprehensive framework for optimizing Dynamic DNNs. The basis of our approach
is a classification of common operators that form DNNs, and the use of this
classification towards a Rank and Dimension Propagation (RDP) method. This
framework statically determines the shapes of operators as known constants,
symbolic constants, or operations on these. Next, using RDP we enable a series
of optimizations, like fused code generation, execution (order) planning, and
even runtime memory allocation plan generation. By evaluating the framework on
10 emerging Dynamic DNNs and comparing it against several existing systems, we
demonstrate both reductions in execution latency and memory requirements, with
RDP-enabled key optimizations responsible for much of the gains. Our evaluation
results show that SoD$^2$ runs up to $3.9\times$ faster than these systems
while saving up to $88\%$ peak memory consumption.
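To make the Rank and Dimension Propagation (RDP) idea concrete, here is a minimal, hypothetical sketch of symbolic shape propagation over a two-operator graph. It only illustrates shapes recorded as known constants, symbolic constants, or operations on these; the function names and the use of sympy are assumptions for illustration, not SoD$^2$'s implementation.

```python
# Minimal sketch (not SoD^2 code): propagate ranks and dimensions through a
# tiny operator graph, where each dimension is either a known constant (int)
# or a symbolic constant (sympy Symbol), and outputs are expressions on these.
from sympy import Symbol

def matmul_shape(a, b):
    # (m, k) x (k, n) -> (m, n): the output rank is fixed; dims may be symbolic.
    (m, _k), (_k2, n) = a, b
    return (m, n)

def concat_shape(a, b, axis):
    # The concatenated dimension is the (possibly symbolic) sum of the inputs'.
    out = list(a)
    out[axis] = a[axis] + b[axis]
    return tuple(out)

# A dynamic sequence length flows through the graph as a symbol.
s = Symbol("s", positive=True, integer=True)
x = (s, 512)                      # activation: [seq_len, hidden], seq_len unknown
w = (512, 1024)                   # weight: fully known constants
h = matmul_shape(x, w)            # -> (s, 1024)
y = concat_shape(h, h, axis=0)    # -> (2*s, 1024), an operation on a symbol
print(h, y)
```

With dimensions resolved to such expressions ahead of time, downstream passes like fused code generation, execution-order planning, and memory-allocation planning can reason about buffer sizes without waiting for concrete runtime shapes.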
Related papers
- Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights [4.513581513983453]
We present a first-order optimization method specialized for deep neural networks (DNNs), ECCO-DNN.
This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape.
arXiv Detail & Related papers (2023-10-21T03:45:13Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- SENSEi: Input-Sensitive Compilation for Accelerating GNNs [7.527596018706567]
We propose SENSEi, a system that exposes different sparse and dense matrix primitive compositions based on different matrix re-associations of GNN computations.
SENSEi executes in two stages: (1) an offline compilation stage that enumerates all valid re-associations leading to different sparse-dense matrix compositions and uses input-oblivious pruning to discard clearly unprofitable candidates, and (2) an online stage that selects the most profitable remaining composition for the given input.
On a wide range of configurations, SENSEi achieves speedups of up to $2.012\times$ and $1.85\times$ on graph convolutional networks and up to $6.294\times$ and $16.274\times$ on graph attention networks.
arXiv Detail & Related papers (2023-06-27T02:24:05Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks [0.0]
Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive.
We propose a new software layer to serve ensembles of DNNs flexibly and efficiently.
arXiv Detail & Related papers (2022-08-30T08:05:43Z)
- Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z)
- DIRA: Dynamic Domain Incremental Regularised Adaptation [2.227417514684251]
We introduce Dynamic Incremental Regularised Adaptation (DIRA) for dynamic operational domain adaptation of Deep Neural Networks (DNNs).
DIRA improves on the problem of forgetting and achieves strong gains in performance when retraining using a few samples from the target domain.
Our approach shows improvements on different image classification benchmarks aimed at evaluating robustness to distribution shifts.
arXiv Detail & Related papers (2022-04-30T03:46:03Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- $\Pi-$nets: Deep Polynomial Neural Networks [86.36557534288535]
$\Pi$-Nets are neural networks in which the output is a high-order polynomial of the input (a toy sketch follows this list).
We empirically demonstrate that $\Pi$-Nets have better representation power than standard DCNNs.
Our framework elucidates why recent generative models, such as StyleGAN, improve upon their predecessors.
arXiv Detail & Related papers (2020-03-08T18:48:43Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency; a minimal illustration of pattern-based kernel pruning follows this list.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
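For the $\Pi$-nets entry above, the following toy recursion illustrates how an output can be a high-order polynomial of the input; it is a sketch of the general principle only, not the parameterizations proposed in that paper.

```python
# Toy illustration: each step multiplies the running representation elementwise
# by a new linear map of the input, so after k steps the output is a degree-k
# polynomial of x, with no activation functions needed.
import numpy as np

def poly_net(x, weights):
    z = weights[0] @ x                 # degree 1 in x
    for W in weights[1:]:
        z = z * (W @ x) + z            # degree grows by one per step
    return z

rng = np.random.default_rng(0)
d = 8
weights = [rng.standard_normal((d, d)) for _ in range(3)]
x = rng.standard_normal(d)
y = poly_net(x, weights)               # a degree-3 polynomial of the input
print(y.shape)                         # (8,)
```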
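For the PatDNN entry, here is a minimal sketch of what fine-grained pruning patterns inside coarse-grained structures can look like for 3x3 convolution kernels; the pattern set below is illustrative only and is not the one derived in that paper.

```python
# Illustrative pattern-based kernel pruning: every 3x3 kernel keeps 4 weights
# arranged in one of a few predefined patterns, chosen per kernel to preserve
# the most weight magnitude; a compiler can then specialize code per pattern.
import numpy as np

PATTERNS = [np.array(p, dtype=bool) for p in (
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
)]

def prune_kernel(kernel):
    # Score each pattern by the weight magnitude it keeps; apply the best as a mask.
    scores = [np.abs(kernel[p]).sum() for p in PATTERNS]
    return kernel * PATTERNS[int(np.argmax(scores))]

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
print(prune_kernel(kernel))            # 4 surviving weights, 5 zeros
```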