GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
- URL: http://arxiv.org/abs/2303.13775v2
- Date: Thu, 27 Jun 2024 16:51:27 GMT
- Title: GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
- Authors: Sandeep Polisetty, Juelin Liu, Kobi Falus, Yi Ren Fung, Seung-Hwan Lim, Hui Guan, Marco Serafini,
- Abstract summary: Mini-batch training is commonly used to train Graph neural networks (GNNs) on large graphs.
In this paper, we introduce a hybrid parallel mini-batch training paradigm called split parallelism.
We show that split parallelism outperforms state-of-the-art mini-batch training systems like DGL, Quiver, and $P3$.
- Score: 6.3568605707961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs), an emerging class of machine learning models for graphs, have gained popularity for their superior performance in various graph analytical tasks. Mini-batch training is commonly used to train GNNs on large graphs, and data parallelism is the standard approach to scale mini-batch training across multiple GPUs. One of the major performance costs in GNN training is the loading of input features, which prevents GPUs from being fully utilized. In this paper, we argue that this problem is exacerbated by redundancies that are inherent to the data parallel approach. To address this issue, we introduce a hybrid parallel mini-batch training paradigm called split parallelism. Split parallelism avoids redundant data loads and splits the sampling and training of each mini-batch across multiple GPUs online, at each iteration, using a lightweight splitting algorithm. We implement split parallelism in GSplit and show that it outperforms state-of-the-art mini-batch training systems like DGL, Quiver, and $P^3$.
Related papers
- GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism [10.723541176359452]
Communication is a key bottleneck for distributed graph neural network (GNN) training.
GNNPipe is a new approach that scales the distributed full-graph deep GNN training.
arXiv Detail & Related papers (2023-08-19T18:44:14Z) - Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z) - DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z) - Graph Ladling: Shockingly Simple Parallel GNN Training without
Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothening, information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
arXiv Detail & Related papers (2023-06-18T03:33:46Z) - Scalable Graph Convolutional Network Training on Distributed-Memory
Systems [5.169989177779801]
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges.
We propose a highly parallel training algorithm that scales to large processor counts.
arXiv Detail & Related papers (2022-12-09T17:51:13Z) - Accelerating Training and Inference of Graph Neural Networks with Fast
Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z) - Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous
Multi-GPU Servers [65.60007071024629]
We show that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z) - DistGNN: Scalable Distributed Training for Large-Scale Graph Neural
Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we presentGNN that optimize the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z) - GIST: Distributed Training for Large-Scale Graph Convolutional Networks [18.964079367668262]
GIST is a hybrid layer and graph sampling method, which disjointly partitions the global model into several, smaller sub-GCNs.
This distributed framework improves model performance and significantly decreases wall-clock training time.
GIST seeks to enable large-scale GCN experimentation with the goal of bridging the existing gap in scale between graph machine learning and deep learning.
arXiv Detail & Related papers (2021-02-20T19:25:38Z) - Distributed Training of Graph Convolutional Networks using Subgraph
Approximation [72.89940126490715]
We propose a training strategy that mitigates the lost information across multiple partitions of a graph through a subgraph approximation scheme.
The subgraph approximation approach helps the distributed training system converge at single-machine accuracy.
arXiv Detail & Related papers (2020-12-09T09:23:49Z) - Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs.
It is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.