SplitBrain: Hybrid Data and Model Parallel Deep Learning
- URL: http://arxiv.org/abs/2112.15317v1
- Date: Fri, 31 Dec 2021 06:25:38 GMT
- Title: SplitBrain: Hybrid Data and Model Parallel Deep Learning
- Authors: Farley Lai, Asim Kadav, Erik Kruus
- Abstract summary: This paper presents SplitBrain, a high performance distributed deep learning framework supporting hybrid data and model parallelism.
Specifically, SplitBrain provides layer-specific partitioning that co-locates compute intensive convolutional layers while sharding memory demanding layers.
Results show that SplitBrain can achieve nearly linear speedup while saving up to 67% of memory consumption for data and model parallel VGG over CIFAR-10.
- Score: 11.63431725146897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success of deep learning applications has coincided with
the wide availability of powerful computational resources for training
sophisticated machine learning models on huge datasets. Nonetheless, training large models
such as convolutional neural networks using model parallelism (as opposed to
data parallelism) is challenging because the complex nature of communication
between model shards makes it difficult to partition the computation
efficiently across multiple machines with an acceptable trade-off. This paper
presents SplitBrain, a high performance distributed deep learning framework
supporting hybrid data and model parallelism. Specifically, SplitBrain provides
layer-specific partitioning that co-locates compute intensive convolutional
layers while sharding memory demanding layers. A novel scalable group
communication is proposed to further improve the training throughput with
reduced communication overhead. The results show that SplitBrain can achieve
nearly linear speedup while saving up to 67% of memory consumption for data
and model parallel VGG over CIFAR-10.
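As a rough illustration of the layer-specific partitioning described above, the sketch below replicates the compute-intensive convolutional layers across workers (data parallelism) and shards only the memory-demanding fully connected layers (model parallelism), then compares per-worker parameter memory. The VGG-16-style layer shapes and the four-worker setup are assumptions for illustration, not SplitBrain's actual implementation.

```python
WORKERS = 4
BYTES_PER_PARAM = 4  # fp32

# (name, kind, parameter count) for a VGG-16-style network on 32x32 CIFAR images
LAYERS = [
    ("conv1_1", "conv", 3 * 64 * 9),    ("conv1_2", "conv", 64 * 64 * 9),
    ("conv2_1", "conv", 64 * 128 * 9),  ("conv2_2", "conv", 128 * 128 * 9),
    ("conv3_1", "conv", 128 * 256 * 9), ("conv3_2", "conv", 256 * 256 * 9),
    ("conv3_3", "conv", 256 * 256 * 9), ("conv4_1", "conv", 256 * 512 * 9),
    ("conv4_2", "conv", 512 * 512 * 9), ("conv4_3", "conv", 512 * 512 * 9),
    ("conv5_1", "conv", 512 * 512 * 9), ("conv5_2", "conv", 512 * 512 * 9),
    ("conv5_3", "conv", 512 * 512 * 9),
    ("fc6", "fc", 512 * 4096), ("fc7", "fc", 4096 * 4096), ("fc8", "fc", 4096 * 10),
]

def per_worker_bytes(shard_fc: bool) -> int:
    """Parameter memory held by one worker under the chosen partitioning."""
    total = 0
    for _name, kind, params in LAYERS:
        if kind == "fc" and shard_fc:
            total += params // WORKERS * BYTES_PER_PARAM  # sharded (model parallel)
        else:
            total += params * BYTES_PER_PARAM             # replicated (data parallel)
    return total

pure_dp = per_worker_bytes(shard_fc=False)
hybrid = per_worker_bytes(shard_fc=True)
print(f"pure data parallel : {pure_dp / 2**20:.1f} MiB of parameters per worker")
print(f"hybrid (shard FCs) : {hybrid / 2**20:.1f} MiB of parameters per worker")
print(f"per-worker saving  : {1 - hybrid / pure_dp:.0%}")
```

Sharding the fully connected layers alone already removes most of the redundant parameter memory on each worker, which is the same lever the abstract's reported savings rely on; the exact numbers here are only illustrative.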
Related papers
- Partitioned Neural Network Training via Synthetic Intermediate Labels [0.0]
GPU memory constraints have become a notable bottleneck in training such sizable models.
This study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments.
This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy.
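Since the entry above describes the mechanism only briefly, here is a minimal, hedged sketch of the idea: the model is cut into segments, and each segment trains against synthetic intermediate targets so that no activations or gradients cross segment boundaries. The fixed random per-class embedding used as the intermediate target is an assumption; the paper's actual label-generation scheme may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, hidden = 10, 64

segment1 = nn.Sequential(nn.Linear(32, hidden), nn.ReLU())  # would live on GPU 0
segment2 = nn.Sequential(nn.Linear(hidden, num_classes))    # would live on GPU 1

# Synthetic intermediate labels: one fixed random vector per class (assumption).
intermediate_targets = torch.randn(num_classes, hidden)

opt1 = torch.optim.SGD(segment1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(segment2.parameters(), lr=0.1)

x = torch.randn(128, 32)
y = torch.randint(0, num_classes, (128,))

# Segment 1 regresses onto the synthetic intermediate labels -- no signal from segment 2.
h = segment1(x)
loss1 = nn.functional.mse_loss(h, intermediate_targets[y])
opt1.zero_grad(); loss1.backward(); opt1.step()

# Segment 2 trains on the synthetic targets directly, so only class labels --
# not activations or gradients -- would ever cross the device boundary.
logits = segment2(intermediate_targets[y])
loss2 = nn.functional.cross_entropy(logits, y)
opt2.zero_grad(); loss2.backward(); opt2.step()
```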
arXiv Detail & Related papers (2024-03-17T13:06:29Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for reducing the communication and memory costs of distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Training Deep Surrogate Models with Large Scale Online Learning [48.7576911714538]
Deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs.
Such models are usually trained on synthetic data generated by solvers, stored on disk, and read back for training.
This work proposes an open-source online training framework for deep surrogate models.
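A minimal sketch of the online-training idea, assuming a toy solver and a bounded producer/consumer queue (none of which is the framework's actual API): solver outputs stream directly into the trainer instead of being written to disk and read back.

```python
import queue
import threading
import torch
import torch.nn as nn

def fake_solver(out_q: queue.Queue, steps: int = 100):
    """Stands in for a numerical solver producing (input, solution) pairs."""
    for _ in range(steps):
        x = torch.randn(16, 8)            # simulation parameters (assumed shape)
        y = x.sum(dim=1, keepdim=True)    # placeholder "solution"
        out_q.put((x, y))
    out_q.put(None)                       # end-of-stream sentinel

surrogate = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

q: queue.Queue = queue.Queue(maxsize=8)   # bounded buffer decouples solver and trainer
threading.Thread(target=fake_solver, args=(q,), daemon=True).start()

while (batch := q.get()) is not None:     # train on samples as they arrive
    x, y = batch
    loss = nn.functional.mse_loss(surrogate(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```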
arXiv Detail & Related papers (2023-06-28T12:02:27Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks [0.3653697742557465]
We analyze the compute, communication, and memory requirements of Convolutional Neural Networks (CNNs).
This model-driven analysis forms the basis of an oracle utility that can help detect the limitations and bottlenecks of different parallelism approaches at scale.
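To make that kind of accounting concrete, the snippet below computes standard per-layer compute, parameter-memory, and activation-memory figures for a single convolution and contrasts the communication volume implied by pure data versus pure model parallelism. The formulas are textbook estimates and the layer shape is made up; the paper's oracle uses a much richer cost model.

```python
def conv_costs(c_in, c_out, k, h, w, batch=128, bytes_per=4):
    """Compute/memory figures for one k x k convolution on an h x w feature map."""
    params = c_in * c_out * k * k + c_out             # weights + biases
    weight_bytes = params * bytes_per                 # parameter memory
    act_bytes = batch * c_out * h * w * bytes_per     # output activation memory
    flops = 2 * batch * c_out * h * w * c_in * k * k  # multiply-accumulates x 2
    return {"flops": flops, "weight_bytes": weight_bytes, "act_bytes": act_bytes}

layer = conv_costs(c_in=256, c_out=256, k=3, h=8, w=8)

# Rough per-step communication volume under the two pure strategies:
#   data parallel : all-reduce the weight gradients         -> ~weight_bytes
#   model parallel: exchange the sharded output activations -> ~act_bytes
print(f"compute            : {layer['flops'] / 1e9:.2f} GFLOPs")
print(f"data-parallel comm : {layer['weight_bytes'] / 2**20:.2f} MiB (gradients)")
print(f"model-parallel comm: {layer['act_bytes'] / 2**20:.2f} MiB (activations)")
```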
arXiv Detail & Related papers (2021-04-19T06:45:51Z)
- Automatic Graph Partitioning for Very Large-scale Deep Learning [4.472135966077758]
This work proposes RaNNC (Rapid Neural Network Connector) as middleware for automatic hybrid parallelism.
RaNNC automatically partitions the model into a set of subcomponents so that each subcomponent fits in the device memory.
RaNNC successfully trained models five times larger than those Megatron-LM could, and RaNNC's training throughputs were comparable to Megatron-LM's when pre-training the same models.
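As a toy stand-in for the partitioning problem RaNNC automates, the sketch below greedily groups consecutive layers so that each group's parameter memory fits a per-device budget. RaNNC's actual search over atomic subcomponents is considerably more sophisticated; the layer sizes and budget here are invented.

```python
def greedy_partition(layer_bytes, budget_bytes):
    """Group consecutive layers so no group exceeds the per-device memory budget."""
    groups, current, used = [], [], 0
    for i, size in enumerate(layer_bytes):
        if size > budget_bytes:
            raise ValueError(f"layer {i} alone exceeds the device budget")
        if used + size > budget_bytes:   # close the group and start a new one
            groups.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        groups.append(current)
    return groups

MiB = 2**20
layer_bytes = [40 * MiB, 120 * MiB, 300 * MiB, 800 * MiB, 900 * MiB, 60 * MiB]
print(greedy_partition(layer_bytes, budget_bytes=1024 * MiB))
# -> [[0, 1, 2], [3], [4, 5]], i.e. one subcomponent per device
```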
arXiv Detail & Related papers (2021-03-30T04:26:04Z)
- Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
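A minimal sketch of local parallelism as described above, with assumed details: each block gets its own auxiliary classifier and optimizer, and activations are detached at block boundaries so no gradient flows between blocks, which is what would allow the blocks to be updated concurrently on different devices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 10

blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
])
aux_heads = nn.ModuleList([nn.Linear(64, num_classes) for _ in blocks])
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, aux_heads)]

x = torch.randn(128, 32)
y = torch.randint(0, num_classes, (128,))

h = x
for block, head, opt in zip(blocks, aux_heads, opts):
    h = block(h)                                    # forward through this block only
    loss = nn.functional.cross_entropy(head(h), y)  # local objective for this block
    opt.zero_grad(); loss.backward(); opt.step()
    h = h.detach()                                  # truncate backprop at the boundary
```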
arXiv Detail & Related papers (2020-12-07T16:38:45Z)
- Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turing-NLG.
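The recomputation half of that combination can be illustrated with PyTorch's stock gradient checkpointing, shown below: intermediate activations inside the checkpointed block are discarded in the forward pass and recomputed during backward, trading extra compute for device memory. KARMA's scheduler additionally decides which activations to swap out to host memory, which is not shown, and the toy network here is an assumption.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block1 = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())
block2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

x = torch.randn(64, 256, requires_grad=True)

# Activations inside block1 are not stored; they are recomputed when backward
# reaches this block (use_reentrant=False assumes a recent PyTorch release).
h = checkpoint(block1, x, use_reentrant=False)
loss = block2(h).sum()
loss.backward()
print(x.grad.shape)  # gradients still flow through the checkpointed block
```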
arXiv Detail & Related papers (2020-08-26T07:24:34Z)