PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs
- URL: http://arxiv.org/abs/2301.00391v2
- Date: Thu, 5 Jan 2023 11:30:22 GMT
- Title: PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs
- Authors: Chunyang Wang, Desen Sun, Yuebin Bai
- Abstract summary: Dynamic Graph Neural Networks (DGNNs) have been broadly applied in various real-life applications, such as link prediction and pandemic forecast.
DGNNs manifest substantial parallel computation and data reuse potentials, but suffer from severe memory access inefficiency and data transfer overhead.
We propose PiPAD, a $\underline{\textbf{Pi}}pelined$ and $\underline{\textbf{PA}}rallel$ $\underline{\textbf{D}}GNN$ training framework for the end-to-end performance optimization on GPUs.
- Score: 3.3019914257038168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic Graph Neural Networks (DGNNs) have been broadly applied in various
real-life applications, such as link prediction and pandemic forecast, to
capture both static structural information and temporal characteristics from
dynamic graphs. Combining both time-dependent and -independent components,
DGNNs manifest substantial parallel computation and data reuse potentials, but
suffer from severe memory access inefficiency and data transfer overhead under
the canonical one-graph-at-a-time training pattern. To tackle the challenges,
we propose PiPAD, a $\underline{\textbf{Pi}}pelined$ and
$\underline{\textbf{PA}}rallel$ $\underline{\textbf{D}}GNN$ training framework
for the end-to-end performance optimization on GPUs. From both the algorithm
and runtime level, PiPAD holistically reconstructs the overall training
paradigm from the data organization to computation manner. Capable of
processing multiple graph snapshots in parallel, PiPAD eliminates the
unnecessary data transmission and alleviates memory access inefficiency to
improve the overall performance. Our evaluation across various datasets shows
PiPAD achieves $1.22\times$-$9.57\times$ speedup over the state-of-the-art DGNN
frameworks on three representative models.
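As a rough illustration of the snapshot-parallel training pattern described in the abstract (not PiPAD's actual implementation), the sketch below transfers a window of graph snapshots to the GPU in one batched copy and applies a shared GCN-style layer to all of them at once; the window size, tensor shapes, and the `SnapshotGCNLayer` name are assumptions for illustration.

```python
# Minimal sketch: process several dynamic-graph snapshots per GPU transfer
# instead of the one-graph-at-a-time pattern. Shapes and names are illustrative.
import torch
import torch.nn as nn

class SnapshotGCNLayer(nn.Module):
    """Shared GCN-style layer applied to a batch of snapshots at once."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj_batch, feat_batch):
        # adj_batch:  (T, N, N) dense adjacency of T snapshots
        # feat_batch: (T, N, F) node features of T snapshots
        agg = torch.bmm(adj_batch, feat_batch)   # neighbor aggregation per snapshot
        return torch.relu(self.lin(agg))

T, N, F = 4, 1000, 64                            # window of 4 snapshots
device = "cuda" if torch.cuda.is_available() else "cpu"
layer = SnapshotGCNLayer(F, 32).to(device)

# One batched host-to-device copy for the whole window, rather than T copies.
adjs = torch.rand(T, N, N)
feats = torch.rand(T, N, F)
adjs, feats = adjs.to(device, non_blocking=True), feats.to(device, non_blocking=True)

out = layer(adjs, feats)                         # (T, N, 32), all snapshots in parallel
```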
Related papers
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z)
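A minimal sketch of a parameterized prefetch-and-eviction cache for remote node features, in the spirit of the MassiveGNN summary above; the class name, capacity knob, and LRU eviction rule are illustrative assumptions, not the paper's or DistDGL's API.

```python
# Illustrative prefetch/eviction cache for remote node features (not the actual
# MassiveGNN/DistDGL code). Hot remote nodes are kept locally; cold entries are
# evicted once the capacity budget is exceeded.
from collections import OrderedDict

class FeaturePrefetchCache:
    def __init__(self, capacity, fetch_fn):
        self.capacity = capacity          # max number of cached node features
        self.fetch_fn = fetch_fn          # callable: node_id -> feature (remote fetch)
        self.cache = OrderedDict()        # node_id -> feature, kept in LRU order

    def prefetch(self, node_ids):
        """Eagerly pull features expected to be needed by upcoming minibatches."""
        for nid in node_ids:
            self.get(nid)

    def get(self, nid):
        if nid in self.cache:
            self.cache.move_to_end(nid)   # mark as recently used
            return self.cache[nid]
        feat = self.fetch_fn(nid)         # expensive remote access
        self.cache[nid] = feat
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least-recently-used entry
        return feat

# Usage (hypothetical remote store): cache = FeaturePrefetchCache(10_000, remote_store.get)
```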
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
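A toy sketch of the gradient-guided idea in the DCP summary above: a small learned cost predictor maps a continuous "dataflow code" to an estimated cost, and the code is updated along the predictor's gradient; the code length, predictor architecture, and cost model are made-up placeholders rather than DCP's actual design.

```python
# Toy illustration of updating a dataflow code via a learned cost predictor
# (dimensions, names, and the untrained cost model are placeholders, not DCP).
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # code -> cost
code = torch.randn(1, 8, requires_grad=True)      # continuous dataflow code
opt = torch.optim.Adam([code], lr=0.05)

for step in range(100):
    est_cost = predictor(code).mean()             # predicted latency/energy (scalar)
    opt.zero_grad()
    est_cost.backward()                           # gradient of cost w.r.t. the code
    opt.step()                                    # move the code toward lower predicted cost
# A discrete dataflow (tiling/ordering choices) would then be decoded from `code`.
```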
- GNNFlow: A Distributed Framework for Continuous Temporal GNN Learning on Dynamic Graphs [11.302970701867844]
We introduce GNNFlow, a distributed framework for efficient continuous temporal graph representation learning.
GNNFlow supports distributed training across multiple machines with static scheduling to ensure load balance.
Our experimental results show that GNNFlow provides up to 21.1x faster continuous learning than existing systems.
arXiv Detail & Related papers (2023-11-29T07:30:32Z)
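A minimal sketch of static scheduling for load balance, as mentioned in the GNNFlow summary above: graph partitions with estimated costs are assigned to workers up front with a greedy longest-processing-time heuristic; the cost estimates and function name are assumptions, not GNNFlow's scheduler.

```python
# Illustrative static scheduler: assign graph partitions to workers before
# training so per-worker load is balanced (greedy LPT heuristic, not GNNFlow code).
import heapq

def static_schedule(partition_costs, num_workers):
    """partition_costs: {partition_id: estimated cost}. Returns worker -> [partitions]."""
    heap = [(0.0, w) for w in range(num_workers)]       # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    # Place the most expensive partitions first, always onto the least-loaded worker.
    for pid, cost in sorted(partition_costs.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(pid)
        heapq.heappush(heap, (load + cost, w))
    return assignment

print(static_schedule({"p0": 9.0, "p1": 5.0, "p2": 4.0, "p3": 3.0}, num_workers=2))
# -> {0: ['p0', 'p3'], 1: ['p1', 'p2']}
```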
- SPEED: Streaming Partition and Parallel Acceleration for Temporal Interaction Graph Embedding [22.68416593780539]
We introduce a novel training approach, namely Streaming Edge Partitioning and Parallel Acceleration for Temporal Interaction Graph Embedding (SPEED).
Our method can achieve a good balance in computing resources, computing time, and downstream task performance.
Empirical validation across 7 real-world datasets demonstrates the potential to expedite training speeds by a factor of up to 19.29x.
arXiv Detail & Related papers (2023-08-27T15:11:44Z)
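A minimal sketch of greedy streaming edge partitioning in the spirit of the SPEED summary above: each incoming interaction edge is placed on a partition that already holds its endpoints where possible, with load used as a penalty; the scoring heuristic and names are illustrative, not the paper's algorithm.

```python
# Illustrative greedy streaming edge partitioner (not SPEED's exact algorithm):
# prefer partitions that already contain an edge's endpoints, penalize heavy load.
def stream_partition(edges, num_parts):
    vertex_parts = {}                      # vertex -> set of partitions holding it
    loads = [0] * num_parts                # number of edges per partition
    assignment = []
    for u, v in edges:
        pu = vertex_parts.get(u, set())
        pv = vertex_parts.get(v, set())
        # Score each partition: +1 per endpoint already present, minus a small load penalty.
        best = max(range(num_parts),
                   key=lambda p: (p in pu) + (p in pv) - loads[p] / (sum(loads) + 1))
        assignment.append(best)
        loads[best] += 1
        vertex_parts.setdefault(u, set()).add(best)
        vertex_parts.setdefault(v, set()).add(best)
    return assignment

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (3, 4), (4, 5)]
print(stream_partition(edges, num_parts=2))
```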
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z)
- Temporal Aggregation and Propagation Graph Neural Networks for Dynamic Representation [67.26422477327179]
Temporal graphs exhibit dynamic interactions between nodes over continuous time.
We propose a novel method of temporal graph convolution over the whole neighborhood.
Our proposed TAP-GNN outperforms existing temporal graph methods by a large margin in terms of both predictive performance and online inference latency.
arXiv Detail & Related papers (2023-04-15T08:17:18Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
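A rough sketch of the complexity-reduction idea behind the dynamic graph message passing summary above: rather than exchanging messages over a fully connected graph of all positions, each node aggregates messages from a small sampled subset, cutting the quadratic cost; the random sampling scheme and layer design are simplified assumptions, not the paper's network.

```python
# Simplified sketch: message passing over K sampled sources per node instead of
# a fully connected graph (quadratic cost), loosely following the idea above.
import torch
import torch.nn as nn

class SampledMessagePassing(nn.Module):
    def __init__(self, dim, k):
        super().__init__()
        self.k = k                                   # sampled message sources per node
        self.msg = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (N, dim) node/position features
        n = x.size(0)
        idx = torch.randint(0, n, (n, self.k))       # K random sources per node
        neigh = x[idx]                               # (N, K, dim) gathered features
        agg = self.msg(neigh).mean(dim=1)            # O(N*K) instead of O(N^2)
        return x + torch.relu(agg)                   # residual update

x = torch.randn(4096, 64)
out = SampledMessagePassing(64, k=16)(x)             # (4096, 64)
```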
- Efficient Dynamic Graph Representation Learning at Scale [66.62859857734104]
We propose Efficient Dynamic Graph lEarning (EDGE), which selectively expresses certain temporal dependencies via the training loss to improve parallelism in computation.
We show that EDGE can scale to dynamic graphs with millions of nodes and hundreds of millions of temporal events and achieve new state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2021-12-14T22:24:53Z)
- Scheduling Optimization Techniques for Neural Network Training [3.1617796705744547]
This paper proposes out-of-order (ooo) backprop, an effective scheduling technique for neural network training.
We show that GPU utilization in single-GPU, data-parallel, and pipeline-parallel training can be commonly improved by applying ooo backprop.
arXiv Detail & Related papers (2021-10-03T05:45:06Z)
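A minimal hand-rolled sketch of the out-of-order idea in the summary above: for a stack of linear layers, all activation (output) gradients are propagated first, and the weight-gradient computations, which are off the critical path, are executed afterwards; autograd is bypassed purely for illustration, and this is not the paper's scheduler.

```python
# Hand-rolled sketch of out-of-order backprop for a stack of linear layers:
# propagate output/activation gradients first, then compute the deferred
# weight gradients (illustration only, bypassing autograd).
import torch

torch.manual_seed(0)
weights = [torch.randn(32, 32) for _ in range(3)]
x = torch.randn(8, 32)

# Forward pass, saving each layer's input for the deferred weight gradients.
acts = [x]
for W in weights:
    acts.append(acts[-1] @ W)
grad_out = torch.ones_like(acts[-1])       # dL/d(output), e.g. for L = sum(output)

# Phase 1: output-gradient propagation only (the critical path).
out_grads = [None] * len(weights)
g = grad_out
for l in reversed(range(len(weights))):
    out_grads[l] = g                       # gradient w.r.t. layer l's output
    g = g @ weights[l].T                   # pass the gradient to the previous layer

# Phase 2: weight gradients, computed out of order / schedulable later.
weight_grads = [acts[l].T @ out_grads[l] for l in range(len(weights))]
print([wg.shape for wg in weight_grads])   # three (32, 32) gradients
```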
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
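A simplified sketch of dynamic weight slicing as summarized above: a tiny gate estimates how "hard" an input is, and the convolution then uses only the first k filters of its full weight tensor; the gate design, width ratios, and module names are illustrative assumptions, not the DS-Net++ implementation.

```python
# Simplified dynamic weight slicing (illustrative, not the DS-Net++ code):
# a lightweight gate picks a width ratio per input, and the conv uses only
# the first k output filters of its full weight tensor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlicedConv(nn.Module):
    def __init__(self, cin, cout, ratios=(0.25, 0.5, 1.0)):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
        self.ratios = ratios
        self.gate = nn.Linear(cin, len(ratios))      # predicts input "difficulty"

    def forward(self, x):
        # Choose a width ratio from globally pooled features (one ratio per batch here).
        choice = self.gate(x.mean(dim=(2, 3))).argmax(dim=1)   # (B,)
        r = self.ratios[int(choice[0])]
        k = max(1, int(self.conv.out_channels * r))
        w = self.conv.weight[:k]                               # slice the first k filters
        b = self.conv.bias[:k]
        return F.conv2d(x, w, b, padding=1)                    # (B, k, H, W)

x = torch.randn(2, 16, 8, 8)
y = SlicedConv(16, 32)(x)
print(y.shape)                                                  # channel count depends on the gate
```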
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.