DistIR: An Intermediate Representation and Simulator for Efficient
Neural Network Distribution
- URL: http://arxiv.org/abs/2111.05426v1
- Date: Tue, 9 Nov 2021 21:32:51 GMT
- Title: DistIR: An Intermediate Representation and Simulator for Efficient
Neural Network Distribution
- Authors: Keshav Santhanam, Siddharth Krishna, Ryota Tomioka, Tim Harris, Matei
Zaharia
- Abstract summary: DistIR is a representation for distributed computation that is tailored for efficient analyses.
We show how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations.
- Score: 15.086401550425125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapidly growing size of deep neural network (DNN) models and datasets has
given rise to a variety of distribution strategies such as data, tensor-model,
pipeline parallelism, and hybrid combinations thereof. Each of these strategies
offers its own trade-offs and exhibits optimal performance across different
models and hardware topologies. Selecting the best set of strategies for a
given setup is challenging because the search space grows combinatorially, and
debugging and testing on clusters is expensive. In this work we propose DistIR,
an expressive intermediate representation for distributed DNN computation that
is tailored for efficient analyses, such as simulation. This enables
automatically identifying the top-performing strategies without having to
execute on physical hardware. Unlike prior work, DistIR can naturally express
many distribution strategies including pipeline parallelism with arbitrary
schedules. Our evaluation on MLP training and GPT-2 inference models
demonstrates how DistIR and its simulator enable fast grid searches over
complex distribution spaces spanning up to 1000+ configurations, reducing
optimization time by an order of magnitude for certain regimes.
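To make the grid search concrete, the sketch below scores hybrid (data, tensor-model, pipeline) configurations with a toy analytical cost model and keeps the fastest ones. It is an illustrative stand-in only: the `Config` class, `enumerate_configs`, and `simulated_time` are hypothetical and do not reflect the actual DistIR API or its simulator.

```python
# Hypothetical sketch of simulator-driven strategy search in the spirit of DistIR.
# Config, enumerate_configs, and simulated_time are illustrative stand-ins,
# not the real DistIR interfaces.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    data_parallel: int     # number of data-parallel replicas
    tensor_parallel: int   # tensor-model-parallel degree
    pipeline_stages: int   # number of pipeline stages
    microbatches: int      # microbatches per pipeline schedule

def enumerate_configs(world_size, max_microbatches=8):
    """Enumerate hybrid D/T/P degrees whose product matches the device count."""
    degrees = [1, 2, 4, 8, 16]
    for d, t, p in product(degrees, repeat=3):
        if d * t * p != world_size:
            continue
        for m in range(1, max_microbatches + 1):
            yield Config(d, t, p, m)

def simulated_time(cfg):
    """Placeholder cost model; in DistIR this role is played by the simulator's
    estimate of per-iteration latency for the transformed IR on a topology."""
    compute = 1.0 / (cfg.data_parallel * cfg.tensor_parallel)
    bubble = (cfg.pipeline_stages - 1) / (cfg.microbatches + cfg.pipeline_stages - 1)
    comm = 0.01 * (cfg.tensor_parallel - 1) + 0.005 * (cfg.data_parallel - 1)
    return compute * (1 + bubble) + comm

def grid_search(world_size, top_k=3):
    """Rank all configurations by simulated cost and keep the top_k."""
    return sorted(enumerate_configs(world_size), key=simulated_time)[:top_k]

if __name__ == "__main__":
    for cfg in grid_search(world_size=16):
        print(cfg, f"est. time/iter = {simulated_time(cfg):.4f}")
```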
Related papers
- Partitioned Neural Network Training via Synthetic Intermediate Labels [0.0]
GPU memory constraints have become a notable bottleneck in training such sizable models.
This study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments.
This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy.
arXiv Detail & Related papers (2024-03-17T13:06:29Z)
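A minimal sketch of the segment-wise idea in this entry, assuming a two-segment split, fixed random tensors as the synthetic intermediate labels, and standard MSE/cross-entropy losses; the actual procedure in the paper may differ.

```python
# Illustrative sketch: train two model segments independently by giving the
# first segment synthetic intermediate targets and feeding those same targets
# to the second segment, so no activations or gradients cross the boundary.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 32)                 # toy inputs
y = torch.randint(0, 10, (256,))         # toy class labels

segment1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # e.g. placed on GPU 0
segment2 = nn.Sequential(nn.Linear(64, 10))              # e.g. placed on GPU 1

# Synthetic intermediate labels: fixed random targets for segment1's output.
z_synthetic = torch.randn(256, 64)

opt1 = torch.optim.SGD(segment1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(segment2.parameters(), lr=0.1)
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

for _ in range(100):
    # Segment 1 learns to map inputs to its synthetic intermediate targets.
    opt1.zero_grad()
    mse(segment1(x), z_synthetic).backward()
    opt1.step()

    # Segment 2 learns to map synthetic intermediates to the true labels.
    opt2.zero_grad()
    ce(segment2(z_synthetic), y).backward()
    opt2.step()

# At inference the segments are composed end to end.
with torch.no_grad():
    acc = (segment2(segment1(x)).argmax(1) == y).float().mean()
    print(f"toy accuracy: {acc:.2f}")
```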
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement in large-scale networks online.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation [19.009600866053923]
We present TAP, a model parallelism framework that automatically searches for the best data and tensor parallel schedules.
Experiments show that TAP is $20\times$ to $160\times$ faster than the state-of-the-art automatic parallelism framework.
arXiv Detail & Related papers (2023-02-01T05:22:28Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Complexity-Driven CNN Compression for Resource-constrained Edge AI [1.6114012813668934]
We propose a novel and computationally efficient pruning pipeline by exploiting the inherent layer-level complexities of CNNs.
We define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to introduce versatile compression of CNNs.
arXiv Detail & Related papers (2022-08-26T16:01:23Z)
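A rough illustration of how layer-level complexities could drive the three pruning modes above; the toy layer specs, the proportional-allocation rule, and the budget are assumptions, not the paper's pipeline.

```python
# Toy complexity-driven pruning: allocate larger pruning ratios to layers
# that are "heavier" under the chosen measure (parameters, FLOPs, or memory),
# while keeping the average pruning ratio at a global budget.

# (out_channels, in_channels, kernel, H_out, W_out) for a few toy conv layers
layers = {
    "conv1": (64, 3, 3, 112, 112),
    "conv2": (128, 64, 3, 56, 56),
    "conv3": (256, 128, 3, 28, 28),
}

def complexity(spec, mode):
    out_c, in_c, k, h, w = spec
    params = out_c * in_c * k * k
    flops = params * h * w           # multiply-accumulates for this layer
    memory = out_c * h * w           # output activation size
    return {"PA": params, "FA": flops, "MA": memory}[mode]

def pruning_ratios(layers, mode, global_budget=0.5):
    """Prune more aggressively where the chosen complexity measure is larger."""
    scores = {name: complexity(spec, mode) for name, spec in layers.items()}
    total, n = sum(scores.values()), len(layers)
    return {name: min(1.0, global_budget * n * s / total)
            for name, s in scores.items()}

for mode in ("PA", "FA", "MA"):
    print(mode, {k: round(v, 3) for k, v in pruning_ratios(layers, mode).items()})
```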
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
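A generic fan-out neighborhood-sampling sketch of the kind used for mini-batch GNN training (GraphSAGE-style); the toy graph, fan-outs, and sampler here are illustrative and do not reproduce the paper's performance-engineered sampler or pipelining.

```python
# Fan-out neighborhood sampling for a mini-batch of seed nodes.
import random

random.seed(0)
# Toy adjacency list: node -> list of neighbors.
adj = {i: random.sample(range(100), k=8) for i in range(100)}

def sample_blocks(seed_nodes, fanouts):
    """For each GNN layer (outermost first), sample up to `fanout` neighbors
    of the current frontier; returns the per-layer sampled edge lists."""
    blocks, frontier = [], list(seed_nodes)
    for fanout in fanouts:
        edges, next_frontier = [], set(frontier)
        for v in frontier:
            for u in random.sample(adj[v], k=min(fanout, len(adj[v]))):
                edges.append((u, v))          # message flows u -> v
                next_frontier.add(u)
        blocks.append(edges)
        frontier = list(next_frontier)
    return blocks

# A mini-batch of 4 seed nodes with fan-outs (5, 3) for a 2-layer GNN.
batch = sample_blocks(seed_nodes=[0, 1, 2, 3], fanouts=[5, 3])
print([len(b) for b in batch])   # number of sampled edges per layer
```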
- DBS: Dynamic Batch Size For Distributed Deep Neural Network Training [19.766163856388694]
We propose the Dynamic Batch Size (DBS) strategy for the distributed training of Deep Neural Networks (DNNs).
Specifically, the performance of each worker is first evaluated based on its behavior in the previous epoch, and then the batch size and dataset partition are adjusted dynamically.
The experimental results indicate that the proposed strategy can fully utilize the performance of the cluster, reduce the training time, and remain robust to disturbance from irrelevant tasks.
arXiv Detail & Related papers (2020-07-23T07:31:55Z)
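The re-balancing step of DBS can be pictured as setting per-worker batch sizes in proportion to the throughput each worker achieved in the previous epoch while keeping the global batch size fixed; the proportional rule and rounding below are assumptions for illustration, not the paper's exact formulation.

```python
# Re-balance per-worker batch sizes from measured throughput of the last epoch.

def rebalance(global_batch, samples_done, epoch_time):
    """samples_done[i] / epoch_time[i] approximates worker i's throughput."""
    throughput = [s / t for s, t in zip(samples_done, epoch_time)]
    total = sum(throughput)
    shares = [global_batch * tp / total for tp in throughput]
    batch_sizes = [max(1, round(b)) for b in shares]
    # Fix rounding drift so the global batch size stays constant.
    batch_sizes[0] += global_batch - sum(batch_sizes)
    return batch_sizes

# Example: worker 2 was slowed down by an unrelated co-located job last epoch,
# so it receives a smaller batch (and data share) in the next epoch.
print(rebalance(global_batch=512,
                samples_done=[171, 171, 170],
                epoch_time=[10.0, 10.2, 19.5]))
```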
- Policy-GNN: Aggregation Optimization for Graph Neural Networks [60.50932472042379]
Graph neural networks (GNNs) aim to model the local graph structures and capture the hierarchical patterns by aggregating the information from neighbors.
It is a challenging task to develop an effective aggregation strategy for each node, given complex graphs and sparse features.
We propose Policy-GNN, a meta-policy framework that models the sampling procedure and message passing of GNNs into a combined learning process.
arXiv Detail & Related papers (2020-06-26T17:03:06Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism [21.980316675614787]
A good parallelization strategy can significantly improve the efficiency or reduce the cost of distributed training of deep neural networks (DNNs).
We propose FT, an efficient algorithm that searches for an optimal set of parallelization strategies to allow trade-offs among different objectives.
We also develop a user-friendly system, called TensorOpt, which allows users to run their distributed DNN training jobs without caring about the details of parallelization strategies.
arXiv Detail & Related papers (2020-04-16T02:57:35Z)
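One way to picture the trade-off exploration in this last entry is to keep the Pareto frontier of candidate parallelization strategies over competing objectives (e.g., iteration time vs. peak memory); the candidate names and numbers below are made up for illustration.

```python
# Keep only non-dominated parallelization strategies over (time, memory).

candidates = {
    "dp8":     {"time": 1.00, "memory": 30.0},
    "dp4_tp2": {"time": 1.15, "memory": 18.0},
    "dp2_tp4": {"time": 1.40, "memory": 12.0},
    "dp2_pp4": {"time": 1.50, "memory": 14.0},
    "tp8":     {"time": 1.90, "memory": 9.0},
}

def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on one."""
    return (a["time"] <= b["time"] and a["memory"] <= b["memory"]
            and (a["time"] < b["time"] or a["memory"] < b["memory"]))

def pareto_frontier(cands):
    return {name: obj for name, obj in cands.items()
            if not any(dominates(other, obj)
                       for oname, other in cands.items() if oname != name)}

print(sorted(pareto_frontier(candidates)))
# ['dp2_tp4', 'dp4_tp2', 'dp8', 'tp8']  (dp2_pp4 is dominated by dp2_tp4)
```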