DiviML: A Module-based Heuristic for Mapping Neural Networks onto
Heterogeneous Platforms
- URL: http://arxiv.org/abs/2308.00127v2
- Date: Wed, 2 Aug 2023 00:21:42 GMT
- Title: DiviML: A Module-based Heuristic for Mapping Neural Networks onto
Heterogeneous Platforms
- Authors: Yassine Ghannane and Mohamed S. Abdelfattah
- Abstract summary: We develop an approach for compiler-level partitioning of deep neural networks (DNNs) onto multiple interconnected hardware devices.
Our scheduler integrates both an exact solver, through a mixed integer linear programming (MILP) formulation, and a modularity-based heuristic for scalability.
We show how we can extend our framework to schedule large language models across multiple heterogeneous servers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Datacenters are increasingly becoming heterogeneous, and are starting to
include specialized hardware for networking, video processing, and especially
deep learning. To leverage the heterogeneous compute capability of modern
datacenters, we develop an approach for compiler-level partitioning of deep
neural networks (DNNs) onto multiple interconnected hardware devices. We
present a general framework for heterogeneous DNN compilation, offering
automatic partitioning and device mapping. Our scheduler integrates both an
exact solver, through a mixed integer linear programming (MILP) formulation,
and a modularity-based heuristic for scalability. Furthermore, we propose a
theoretical lower bound formula for the optimal solution, which enables the
assessment of the heuristic solutions' quality. We evaluate our scheduler in
optimizing both conventional DNNs and randomly-wired neural networks, subject
to latency and throughput constraints, on a heterogeneous system comprised of a
CPU and two distinct GPUs. Compared to na\"ively running DNNs on the fastest
GPU, the proposed framework can achieve more than 3$\times$ lower latency
and up to 2.9$\times$ higher throughput by automatically leveraging both data
and model parallelism to deploy DNNs on our sample heterogeneous server node.
Moreover, our modularity-based "splitting" heuristic improves the solution
runtime by up to 395$\times$ without noticeably sacrificing solution quality
compared to an exact MILP solution, and outperforms all other heuristics by
30-60% in solution quality. Finally, our case study shows how we can extend our
framework to schedule large language models across multiple heterogeneous
servers by exploiting symmetry in the hardware setup. Our code can be easily
plugged into existing frameworks and is available at
https://github.com/abdelfattah-lab/diviml.
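
As a rough illustration of what a device-mapping MILP of this kind can look like (this is a minimal sketch, not the paper's actual formulation; the module names, per-device latencies, and communication cost below are invented for the example), the following PuLP snippet assigns a three-module DNN chain to a CPU and two GPUs while minimizing end-to-end latency:

```python
# Minimal sketch of a device-mapping MILP for a chain of DNN modules.
# All numbers and names are illustrative, not taken from the paper.
import pulp

modules = ["conv_block", "residual_stack", "classifier"]  # hypothetical modules
devices = ["cpu", "gpu0", "gpu1"]                          # hypothetical devices

# per-module latency on each device (ms) -- made-up values
lat = {
    ("conv_block", "cpu"): 40, ("conv_block", "gpu0"): 6,  ("conv_block", "gpu1"): 9,
    ("residual_stack", "cpu"): 120, ("residual_stack", "gpu0"): 18, ("residual_stack", "gpu1"): 25,
    ("classifier", "cpu"): 5,  ("classifier", "gpu0"): 2,  ("classifier", "gpu1"): 2,
}
comm = 3  # ms penalty when consecutive modules sit on different devices

prob = pulp.LpProblem("dnn_device_mapping", pulp.LpMinimize)

# x[m][d] = 1 if module m is placed on device d
x = pulp.LpVariable.dicts("x", (modules, devices), cat="Binary")
# cut[i] = 1 if modules i and i+1 are placed on different devices
cut = pulp.LpVariable.dicts("cut", range(len(modules) - 1), cat="Binary")

# each module is assigned to exactly one device
for m in modules:
    prob += pulp.lpSum(x[m][d] for d in devices) == 1

# linearize the "different device" indicator for consecutive modules
for i in range(len(modules) - 1):
    a, b = modules[i], modules[i + 1]
    for d in devices:
        prob += cut[i] >= x[a][d] - x[b][d]
        prob += cut[i] >= x[b][d] - x[a][d]

# objective: compute latency plus inter-device transfer penalties
prob += (
    pulp.lpSum(lat[(m, d)] * x[m][d] for m in modules for d in devices)
    + comm * pulp.lpSum(cut[i] for i in range(len(modules) - 1))
)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for m in modules:
    placed = next(d for d in devices if pulp.value(x[m][d]) > 0.5)
    print(f"{m} -> {placed}")
print("objective latency (ms):", pulp.value(prob.objective))
```

A full scheduler like the one described in the abstract must additionally model throughput constraints, data/model parallelism, and device memory, which is what makes the exact MILP expensive and motivates the modularity-based heuristic and the lower-bound check on its solutions.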
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM).
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting model from NAS optimized for speed achieved 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z)
- Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time [5.05866540830123]
We present ODiMO, a hardware-aware tool that efficiently explores fine-grain mapping of Deep Neural Networks (DNNs) among various on-chip CUs.
We show that ODiMO reduces the latency of a DNN executed on the Darkside by up to 8x at iso-accuracy, compared to manual mappings.
When targeting energy, ODiMO produced up to 50.8x more efficient mappings, with minimal accuracy drop.
arXiv Detail & Related papers (2024-09-27T09:10:44Z)
- Accelerating Split Federated Learning over Wireless Communication Networks [17.97006656280742]
We consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL).
We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency.
Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.
arXiv Detail & Related papers (2023-10-24T07:49:56Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) Soft Actor-Critic for discrete settings (SAC-d), which generates the exit point and compressing bits by soft policy iteration.
Based on a latency- and accuracy-aware reward design, the resulting co-inference scheme adapts well to complex environments such as dynamic wireless channels and arbitrary processing loads, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)