An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2104.09075v1
- Date: Mon, 19 Apr 2021 06:45:51 GMT
- Title: An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of
Convolutional Neural Networks
- Authors: Albert Njoroge Kahira, Truong Thao Nguyen, Leonardo Bautista Gomez,
Ryousei Takano, Rosa M Badia, Mohamed Wahib
- Abstract summary: We analyze the compute, communication, and memory requirements of Convolutional Neural Networks (CNNs).
We use our model-driven analysis as the basis for an oracle utility that can help detect the limitations and bottlenecks of different parallelism approaches at scale.
- Score: 0.3653697742557465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Network (DNN) frameworks use distributed training to enable
faster time to convergence and alleviate memory capacity limitations when
training large models and/or using high dimension inputs. With the steady
increase in datasets and model sizes, model/hybrid parallelism is deemed to
have an important role in the future of distributed training of DNNs. We
analyze the compute, communication, and memory requirements of Convolutional
Neural Networks (CNNs) to understand the trade-offs between different
parallelism approaches in terms of performance and scalability. We use our
model-driven analysis as the basis for an oracle utility that can help detect
the limitations and bottlenecks of different parallelism approaches at scale.
We evaluate the oracle on six parallelization strategies, with four
CNN models and multiple datasets (2D and 3D), on up to 1024 GPUs. The results
demonstrate that the oracle has an average accuracy of about 86.74% when
compared to empirical results, and as high as 97.57% for data parallelism.
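As a concrete illustration of the kind of model-driven analysis the abstract describes, the sketch below estimates per-step communication volume and per-GPU memory for a single convolutional layer under data parallelism versus spatial model parallelism. It is a minimal, first-order sketch, not the paper's actual oracle: the `ConvLayer` class, the layer sizes, and the ring-allreduce and halo-exchange cost estimates are all illustrative assumptions.

```python
# Minimal, illustrative cost model (not the paper's oracle): first-order
# per-step estimates for one conv layer under two parallelization strategies.
from dataclasses import dataclass

@dataclass
class ConvLayer:
    batch: int        # global mini-batch size
    c_in: int         # input channels
    c_out: int        # output channels
    h: int            # output height
    w: int            # output width
    k: int            # kernel size (k x k)
    bytes_per_elem: int = 4  # fp32

    @property
    def weight_elems(self) -> int:
        return self.c_out * self.c_in * self.k * self.k

    @property
    def act_elems(self) -> int:
        return self.batch * self.c_out * self.h * self.w


def data_parallel_cost(layer: ConvLayer, n_gpus: int) -> dict:
    """Each GPU holds a full weight replica and 1/n of the batch;
    weight gradients are all-reduced every step (~2x weight size on a ring)."""
    comm = 2 * layer.weight_elems * layer.bytes_per_elem
    mem = (layer.weight_elems + layer.act_elems // n_gpus) * layer.bytes_per_elem
    return {"comm_bytes": comm, "mem_bytes": mem}


def spatial_parallel_cost(layer: ConvLayer, n_gpus: int) -> dict:
    """Spatial (model) parallelism: the feature map is split along height;
    each GPU exchanges halo rows of width k//2 with its two neighbors."""
    halo_elems = 2 * (layer.k // 2) * layer.w * layer.c_in * layer.batch
    comm = halo_elems * layer.bytes_per_elem
    mem = (layer.weight_elems + layer.act_elems // n_gpus) * layer.bytes_per_elem
    return {"comm_bytes": comm, "mem_bytes": mem}


if __name__ == "__main__":
    layer = ConvLayer(batch=256, c_in=256, c_out=512, h=28, w=28, k=3)
    for n in (8, 64, 1024):
        dp = data_parallel_cost(layer, n)
        sp = spatial_parallel_cost(layer, n)
        print(f"{n:5d} GPUs  DP comm {dp['comm_bytes']/1e6:8.1f} MB   "
              f"spatial comm {sp['comm_bytes']/1e6:8.1f} MB")
```

A full oracle along the paper's lines would also model compute time, per-layer weight, activation, and gradient memory, and the interconnect; even this sketch, however, shows the shape of the trade-off: data-parallel communication scales with the weight count, while spatial model-parallel communication scales with the activation halos.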
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference in IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Partitioned Neural Network Training via Synthetic Intermediate Labels [0.0]
GPU memory constraints have become a notable bottleneck in training such sizable models.
This study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments.
This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy.
arXiv Detail & Related papers (2024-03-17T13:06:29Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - SWARM Parallelism: Training Large Models Can Be Surprisingly
Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z) - SplitBrain: Hybrid Data and Model Parallel Deep Learning [11.63431725146897]
This paper presents SplitBrain, a high performance distributed deep learning framework supporting hybrid data and model parallelism.
Specifically, SplitBrain provides layer-specific partitioning that co-locates compute-intensive convolutional layers while sharding memory-demanding layers.
Results show that SplitBrain can achieve nearly linear speedup while saving up to 67% of memory consumption for data- and model-parallel VGG on CIFAR-10.
arXiv Detail & Related papers (2021-12-31T06:25:38Z) - Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework that parallelizes the training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation (a minimal sketch of this idea appears after this list).
We show results in both the vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
arXiv Detail & Related papers (2020-12-07T16:38:45Z) - AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with
Autotuned Data-Parallel Training for Tabular Data [11.552769149674544]
Development of high-performing predictive models for large data sets is a challenging task.
Automated machine learning (AutoML) is emerging as a promising approach to automating predictive model development.
We have developed AgEBO-Tabular, which combines aging evolution (AgE), a parallel NAS method that searches over the neural architecture space, with autotuned data-parallel training.
arXiv Detail & Related papers (2020-10-30T16:28:48Z) - A Linear Algebraic Approach to Model Parallelism in Deep Learning [0.0]
Training deep neural networks (DNNs) in large-cluster computing environments is increasingly necessary, as networks grow in size and complexity.
We propose a linear-algebraic approach to model parallelism in deep learning, which allows parallel distribution of any tensor in the DNN.
We build distributed DNN layers from these parallel primitives composed with sequential layer implementations, and demonstrate the approach by building and training a distributed DNN with DistDL, a PyTorch- and MPI-based distributed deep learning toolkit.
arXiv Detail & Related papers (2020-06-04T19:38:05Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
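As referenced in the "Parallel Training of Deep Networks with Local Updates" entry above, the following is a minimal sketch of truncated layer-wise (local) backpropagation, assuming PyTorch. The block boundaries, auxiliary heads, optimizer, and hyperparameters are illustrative choices, not the configuration used in that paper.

```python
# Minimal sketch of local (truncated layer-wise) backpropagation in PyTorch.
# Each block is trained with its own auxiliary classifier; activations are
# detached between blocks, so no gradient crosses block boundaries and the
# blocks could in principle be updated in parallel on different devices.
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Auxiliary head providing the local training signal.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes)
        )

    def forward(self, x):
        return self.body(x)

    def local_loss(self, feats, labels):
        return nn.functional.cross_entropy(self.head(feats), labels)


blocks = nn.ModuleList([
    LocalBlock(3, 32, 10),
    LocalBlock(32, 64, 10),
    LocalBlock(64, 128, 10),
])
optims = [torch.optim.SGD(b.parameters(), lr=0.05) for b in blocks]

# One training step on a random batch (stand-in for real data).
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

h = x
for block, opt in zip(blocks, optims):
    h = block(h)
    loss = block.local_loss(h, y)
    opt.zero_grad()
    loss.backward()          # gradients stay within this block
    opt.step()
    h = h.detach()           # truncate: no backprop into earlier blocks
print("local losses computed; last block output shape:", h.shape)
```

Because activations are detached at block boundaries, each block can in principle live on its own device or worker and be updated concurrently, with only forward activations communicated, which is what makes the scheme attractive in the high-compute regime.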
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.