GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism
- URL: http://arxiv.org/abs/2308.10087v2
- Date: Sun, 24 Sep 2023 17:04:05 GMT
- Title: GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism
- Authors: Jingji Chen, Zhuoming Chen, Xuehai Qian
- Abstract summary: Communication is a key bottleneck for distributed graph neural network (GNN) training.
GNNPipe is a new approach that scales distributed full-graph deep GNN training.
- Score: 10.723541176359452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication is a key bottleneck for distributed graph neural network (GNN)
training. This paper proposes GNNPipe, a new approach that scales distributed
full-graph deep GNN training. Being the first to use layer-level model
parallelism for GNN training, GNNPipe partitions GNN layers among GPUs, with
each device performing the computation for a disjoint subset of consecutive
GNN layers on the whole graph. Compared to graph parallelism with each GPU handling
a graph partition, GNNPipe reduces the communication volume by a factor of the
number of GNN layers. GNNPipe overcomes the unique challenges for pipelined
layer-level model parallelism on the whole graph by partitioning it into
dependent chunks, allowing the use of historical vertex embeddings, and
applying specific training techniques to ensure convergence. We also propose a
hybrid approach that combines GNNPipe with graph parallelism to handle large
graphs, achieve better compute resource utilization, and ensure model
convergence. We build a general GNN training system supporting all three
parallelism settings. Extensive experiments show that our method reduces the
per-epoch training time by up to 2.45x (on average 1.58x) and reduces the
communication volume and overhead by up to 22.89x and 27.21x (on average 8.69x
and 11.60x), respectively, while achieving a comparable level of model accuracy
and convergence speed compared to graph parallelism.
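To make the layer-level model parallelism concrete, the sketch below shows one way (a minimal illustration, not the authors' implementation) to pin disjoint subsets of consecutive GNN layers to different devices so that only per-stage activations cross device boundaries; `GCNLayer`, `LayerStage`, and `full_graph_forward` are illustrative names, and the chunked pipelining with historical embeddings described above is deliberately omitted.
```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """A plain GCN-style layer: H_out = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        # a_hat: sparse normalized adjacency (N x N), h: dense features (N x d)
        return torch.relu(self.lin(torch.sparse.mm(a_hat, h)))

class LayerStage(nn.Module):
    """A disjoint subset of consecutive GNN layers pinned to one device,
    mirroring the layer-level partitioning described in the abstract."""
    def __init__(self, layers, device):
        super().__init__()
        self.device = torch.device(device)
        self.layers = nn.ModuleList(layers).to(self.device)

    def forward(self, a_hat, h):
        a_hat = a_hat.to(self.device)
        h = h.to(self.device)
        for layer in self.layers:
            h = layer(a_hat, h)
        return h  # only these activations cross the stage boundary

def full_graph_forward(stages, a_hat, features):
    """Run the whole graph through the stages in order. The actual system
    additionally splits the graph into dependent chunks and pipelines them
    with historical embeddings; that scheduling is omitted here."""
    h = features
    for stage in stages:
        h = stage(a_hat, h)
    return h

# Usage sketch: an 8-layer GNN split across two devices (use "cpu" to try it out).
# stages = [
#     LayerStage([GCNLayer(64, 64) for _ in range(4)], "cuda:0"),
#     LayerStage([GCNLayer(64, 64) for _ in range(4)], "cuda:1"),
# ]
# logits = full_graph_forward(stages, a_hat, features)
```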
Related papers
- Learning to Reweight for Graph Neural Network [63.978102332612906]
Graph Neural Networks (GNNs) show promising results for graph tasks.
Existing GNNs' generalization ability degrades when there are distribution shifts between training and testing graph data.
We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability.
arXiv Detail & Related papers (2023-12-19T12:25:10Z)
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
Scaling GNNs by either deepening or widening suffers from the prevalent issues of unhealthy gradients, over-smoothing, and information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
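For context, a model soup is typically formed by averaging the weights of independently trained models with identical architectures; the helper below is a generic uniform-soup sketch under that assumption, not the paper's algorithm, and `average_state_dicts` is an illustrative name.
```python
import copy
import torch

def average_state_dicts(state_dicts):
    """Uniform model soup: average each parameter across checkpoints of
    independently trained models that share one architecture."""
    soup = copy.deepcopy(state_dicts[0])
    for key in soup:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        soup[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return soup

# Usage sketch: train several GNNs in parallel with no communication,
# then merge their checkpoints into a single model.
# soup = average_state_dicts([m.state_dict() for m in trained_models])
# final_model.load_state_dict(soup)
```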
arXiv Detail & Related papers (2023-06-18T03:33:46Z)
- Training Graph Neural Networks on Growing Stochastic Graphs [114.75710379125412]
Graph Neural Networks (GNNs) rely on graph convolutions to exploit meaningful patterns in networked data.
We propose to learn GNNs on very large graphs by leveraging the limit object of a sequence of growing graphs, the graphon.
arXiv Detail & Related papers (2022-10-27T16:00:45Z)
- Distributed Graph Neural Network Training with Periodic Historical Embedding Synchronization [9.503080586294406]
Graph Neural Networks (GNNs) are prevalent in various applications such as social networks, recommender systems, and knowledge graphs.
Traditional sampling-based methods accelerate GNN training by dropping edges and nodes, which impairs graph integrity and model performance.
This paper proposes DIstributed Graph Embedding SynchronizaTion (DIGEST), a novel distributed GNN training framework.
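As a rough illustration of the historical-embedding idea (a sketch under assumed names, not DIGEST's actual design), a per-vertex cache can serve possibly stale embeddings and be refreshed only periodically:
```python
import torch

class HistoricalEmbeddingCache:
    """Stale-embedding store in the spirit of periodic synchronization:
    remote vertices' embeddings are read from a cache that is refreshed
    only every `sync_every` steps, instead of being fetched over the
    network at every step. Names here are illustrative."""
    def __init__(self, num_nodes, dim, sync_every=10):
        self.cache = torch.zeros(num_nodes, dim)
        self.sync_every = sync_every

    def read(self, node_ids):
        # Possibly stale embeddings for the requested (remote) vertices.
        return self.cache[node_ids]

    def maybe_sync(self, step, node_ids, fresh_embeddings):
        # Periodically overwrite the cached entries with fresh values
        # (in a real system this would be a push / all-gather).
        if step % self.sync_every == 0:
            self.cache[node_ids] = fresh_embeddings.detach()
```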
arXiv Detail & Related papers (2022-05-31T18:44:53Z)
- Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis [28.464210819376593]
Graph neural networks (GNNs) are among the most powerful tools in deep learning.
They routinely solve complex problems on unstructured networks, such as node classification, graph classification, or link prediction, with high accuracy.
However, both inference and training of GNNs are complex, and they uniquely combine the features of irregular graph processing with dense and regular computations.
This complexity makes it very challenging to execute GNNs efficiently on modern massively parallel architectures.
arXiv Detail & Related papers (2022-05-19T17:11:45Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
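As a rough sketch of the joint-sparsification idea (using simple magnitude pruning as a stand-in for UGS's learned masks; all names and thresholds are illustrative):
```python
import torch

def magnitude_prune(values, sparsity):
    """Return a boolean mask that zeroes out roughly the smallest-magnitude
    `sparsity` fraction of entries."""
    k = int(sparsity * values.numel())
    if k == 0:
        return torch.ones_like(values, dtype=torch.bool)
    threshold = values.abs().flatten().kthvalue(k).values
    return values.abs() > threshold

# Illustrative joint sparsification of graph and weights:
# edge_mask   = magnitude_prune(edge_weights, sparsity=0.2)
# weight_mask = magnitude_prune(model.lin.weight.data, sparsity=0.5)
# model.lin.weight.data *= weight_mask
# pruned_edge_weights = edge_weights * edge_mask
```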
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
- Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models for generating node embeddings on graphs.
However, it is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.