DiviML: A Module-based Heuristic for Mapping Neural Networks onto
Heterogeneous Platforms
- URL: http://arxiv.org/abs/2308.00127v2
- Date: Wed, 2 Aug 2023 00:21:42 GMT
- Title: DiviML: A Module-based Heuristic for Mapping Neural Networks onto
Heterogeneous Platforms
- Authors: Yassine Ghannane and Mohamed S. Abdelfattah
- Abstract summary: We develop an approach for compiler-level partitioning of deep neural networks (DNNs) onto multiple interconnected hardware devices.
Our scheduler integrates both an exact solver, through a mixed integer linear programming (MILP) formulation, and a modularity-based runtime.
We show how we can extend our framework to schedule large language models across multiple heterogeneous servers.
- Score: 5.970091958678456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Datacenters are increasingly becoming heterogeneous, and are starting to
include specialized hardware for networking, video processing, and especially
deep learning. To leverage the heterogeneous compute capability of modern
datacenters, we develop an approach for compiler-level partitioning of deep
neural networks (DNNs) onto multiple interconnected hardware devices. We
present a general framework for heterogeneous DNN compilation, offering
automatic partitioning and device mapping. Our scheduler integrates both an
exact solver, through a mixed integer linear programming (MILP) formulation,
and a modularity-based heuristic for scalability. Furthermore, we propose a
theoretical lower bound formula for the optimal solution, which enables the
assessment of the heuristic solutions' quality. We evaluate our scheduler in
optimizing both conventional DNNs and randomly-wired neural networks, subject
to latency and throughput constraints, on a heterogeneous system comprised of a
CPU and two distinct GPUs. Compared to na\"ively running DNNs on the fastest
GPU, he proposed framework can achieve more than 3$\times$ times lower latency
and up to 2.9$\times$ higher throughput by automatically leveraging both data
and model parallelism to deploy DNNs on our sample heterogeneous server node.
Moreover, our modularity-based "splitting" heuristic improves the solution
runtime up to 395$\times$ without noticeably sacrificing solution quality
compared to an exact MILP solution, and outperforms all other heuristics by
30-60% solution quality. Finally, our case study shows how we can extend our
framework to schedule large language models across multiple heterogeneous
servers by exploiting symmetry in the hardware setup. Our code can be easily
plugged in to existing frameworks, and is available at
https://github.com/abdelfattah-lab/diviml.
Related papers
- Accelerating Split Federated Learning over Wireless Communication
Networks [17.97006656280742]
We consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL)
We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency.
Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.
arXiv Detail & Related papers (2023-10-24T07:49:56Z) - Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse
Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z) - Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML [4.2019872499238256]
We propose a novel strategy for deploying Deep Neural Networks on microcontrollers (TinyML) based on Multi-Objective Bayesian optimization (MOBOpt)
Our methodology aims at efficiently finding tradeoffs between a DNN's predictive accuracy, memory consumption on a given target system, and computational complexity.
arXiv Detail & Related papers (2023-05-23T14:31:52Z) - An efficient and flexible inference system for serving heterogeneous
ensembles of deep neural networks [0.0]
Ensembles of Deep Neural Networks (DNNs) have achieved qualitative predictions but they are computing and memory intensive.
We propose a new software layer to serve with flexibility and efficiency ensembles of DNNs.
arXiv Detail & Related papers (2022-08-30T08:05:43Z) - Receptive Field-based Segmentation for Distributed CNN Inference
Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing network.
We propose a novel collaborative edge computing using fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z) - Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through a gradient descent.
We achieve a combined speed of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
arXiv Detail & Related papers (2022-01-16T07:22:47Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.