MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel
Communication-Computation Pipelining on Multi-GPU Platforms
- URL: http://arxiv.org/abs/2209.06800v3
- Date: Tue, 27 Jun 2023 01:07:09 GMT
- Title: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel
Communication-Computation Pipelining on Multi-GPU Platforms
- Authors: Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li,
and Yufei Ding
- Abstract summary: We propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms.
The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel.
MGG outperforms state-of-the-art full-graph GNN systems across various settings.
- Score: 28.25823488936712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing size of input graphs for graph neural networks (GNNs)
highlights the demand for using multi-GPU platforms. However, existing
multi-GPU GNN systems optimize the computation and communication individually
based on the conventional practice of scaling dense DNNs. For irregularly
sparse and fine-grained GNN workloads, such solutions miss the opportunity to
jointly schedule/optimize the computation and communication operations for
high-performance delivery. To this end, we propose MGG, a novel system design
to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its
novel dynamic software pipeline to facilitate fine-grained
computation-communication overlapping within a GPU kernel. Specifically, MGG
introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to
facilitate workload balancing and operation overlapping. MGG also incorporates
an intelligent runtime design with analytical modeling and optimization
heuristics to dynamically improve the execution performance. Extensive
evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems
across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL,
MGG-UVM, and ROC, respectively.
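As a rough illustration only, the CUDA sketch below shows the general shape of the idea MGG targets: remote neighbor-feature fetches are issued inside the same kernel that performs local aggregation, so communication overlaps with computation at warp granularity. This is not MGG's implementation; the CSR layout, the warp-per-node mapping, and the remote_get_nbi/remote_quiet helpers (stand-ins for GPU-initiated one-sided gets such as those offered by libraries like NVSHMEM) are assumptions made for the sketch.
```cuda
// Minimal sketch of intra-kernel communication-computation overlap for
// neighbor aggregation (NOT MGG's actual pipeline). One warp aggregates one
// node; lane 0 issues "remote gets" for neighbors owned by other GPUs while
// the whole warp keeps accumulating locally owned neighbors.
// remote_get_nbi/remote_quiet are hypothetical helpers: here they just read
// through a directly addressable pointer (UVM / peer-mapped memory) so the
// sketch compiles standalone; a real pipeline would use GPU-initiated
// one-sided gets to make the copy truly asynchronous.
#include <cuda_runtime.h>

#define WARP_SIZE 32
#define FEAT_DIM 64        // assumed feature width
#define MAX_REMOTE 16      // assumed staging slots per warp (overflow handling omitted)

__device__ inline void remote_get_nbi(float* dst, const float* src, int n, int /*peer*/) {
    for (int i = 0; i < n; ++i) dst[i] = src[i];   // placeholder for a one-sided get
}
__device__ inline void remote_quiet() {}           // placeholder: wait for outstanding gets

// Launch with blockDim.x = 128 (4 warps) and
// 4 * MAX_REMOTE * FEAT_DIM * sizeof(float) bytes of dynamic shared memory.
__global__ void aggregate_overlapped(const int* row_ptr, const int* col_idx,
                                     const int* owner,       // GPU owning each node
                                     const float* feat,      // peer-visible feature table
                                     float* out, int num_nodes, int my_gpu)
{
    extern __shared__ float stage[];
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / WARP_SIZE;
    int lane = threadIdx.x % WARP_SIZE;
    if (warp >= num_nodes) return;

    float acc[FEAT_DIM / WARP_SIZE] = {0.f};       // each lane keeps a slice of the feature
    float* my_stage = stage + (threadIdx.x / WARP_SIZE) * MAX_REMOTE * FEAT_DIM;
    int n_remote = 0;

    // Pass 1: overlap -- remote fetches are issued while local work proceeds.
    for (int e = row_ptr[warp]; e < row_ptr[warp + 1]; ++e) {
        int nbr = col_idx[e];
        if (owner[nbr] != my_gpu) {
            if (lane == 0 && n_remote < MAX_REMOTE)
                // Global-to-remote-local index translation omitted for brevity.
                remote_get_nbi(my_stage + n_remote * FEAT_DIM,
                               feat + (size_t)nbr * FEAT_DIM, FEAT_DIM, owner[nbr]);
            ++n_remote;
        } else {
            for (int f = lane; f < FEAT_DIM; f += WARP_SIZE)
                acc[f / WARP_SIZE] += feat[(size_t)nbr * FEAT_DIM + f];
        }
    }

    // Pass 2: drain the staged remote features once the gets have completed.
    if (lane == 0) remote_quiet();
    __syncwarp();
    for (int r = 0; r < min(n_remote, MAX_REMOTE); ++r)
        for (int f = lane; f < FEAT_DIM; f += WARP_SIZE)
            acc[f / WARP_SIZE] += my_stage[r * FEAT_DIM + f];

    for (int f = lane; f < FEAT_DIM; f += WARP_SIZE)
        out[(size_t)warp * FEAT_DIM + f] = acc[f / WARP_SIZE];
}
```
MGG's actual design additionally balances pipeline stages across warps and tunes the mapping with its analytical runtime model, which this sketch does not attempt.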
Related papers
- Spatio-Spectral Graph Neural Networks [50.277959544420455]
We propose Spatio-Spectral Graph Neural Networks (S$2$GNNs).
S$2$GNNs combine spatially and spectrally parametrized graph filters.
We show that S$2$GNNs vanquish over-squashing and yield strictly tighter approximation-theoretic error bounds than MPGNNs.
arXiv Detail & Related papers (2024-05-29T14:28:08Z)
- MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training [7.193336207798203]
We present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovation.
Experiments show that the MaxK-GNN system can approach the theoretical speedup limit given by Amdahl's law.
We achieve accuracy comparable to SOTA GNNs at significantly higher speed: 3.22x/4.24x speedup on Reddit, against theoretical limits of 5.52x/7.27x.
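For reference, Amdahl's law caps the end-to-end speedup at 1 / ((1 - f) + f / s) when only a fraction f of the runtime is accelerated by a factor s; the 5.52x/7.27x figures above are theoretical ceilings of this form.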
arXiv Detail & Related papers (2023-12-14T05:00:49Z) - T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z)
- Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses [9.773813896475264]
Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data.
Training GNNs on large-scale graphs remains a significant challenge due to lack of efficient data access and data movement methods.
We propose the GPU Initiated Direct Storage Access (GIDS) dataloader to enable GPU-oriented GNN training for large-scale graphs.
arXiv Detail & Related papers (2023-06-28T17:22:15Z)
- Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching [59.8522166385372]
Training and inference with graph neural networks (GNNs) on massive graphs has been actively studied since the inception of GNNs.
This paper is concerned with minibatch training and inference with GNNs that employ node-wise sampling in distributed settings.
We present SALIENT++, which extends the prior state-of-the-art SALIENT system to work with partitioned feature data.
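As context for the caching theme, here is a minimal, hypothetical CUDA sketch of a cache-aware feature gather for a sampled minibatch: rows present in a local feature cache are served directly, and misses are recorded so they can later be fetched from the partition that owns them. This is not SALIENT++'s code; the cache_slot table and all array layouts are assumptions for illustration.
```cuda
// Hedged sketch (not SALIENT++): gather minibatch features, serving locally
// cached nodes directly and recording cache misses for a later remote fetch.
#include <cuda_runtime.h>

#define FEAT_DIM 128  // assumed feature width

// Launch with grid = batch_size (one block per sampled node).
__global__ void gather_with_cache(const int*   batch_nodes,  // sampled global node ids
                                  const int*   cache_slot,   // global id -> cache slot, or -1
                                  const float* cache_feat,   // [num_cached x FEAT_DIM]
                                  float*       out_feat,     // [batch_size x FEAT_DIM]
                                  int*         miss_list,    // global ids still to be fetched
                                  int*         miss_count,   // atomic miss counter
                                  int          batch_size)
{
    int i = blockIdx.x;
    if (i >= batch_size) return;
    int node = batch_nodes[i];
    int slot = cache_slot[node];

    if (slot >= 0) {
        // Cache hit: copy the cached feature row into the minibatch buffer.
        for (int f = threadIdx.x; f < FEAT_DIM; f += blockDim.x)
            out_feat[i * FEAT_DIM + f] = cache_feat[slot * FEAT_DIM + f];
    } else if (threadIdx.x == 0) {
        // Cache miss: remember the node id (a real system would also record
        // the destination row i and the owning partition).
        int k = atomicAdd(miss_count, 1);
        miss_list[k] = node;
    }
}
```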
arXiv Detail & Related papers (2023-05-04T21:04:01Z)
- MG-GCN: Scalable Multi-GPU GCN Training Framework [1.7188280334580197]
Full batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs.
MG-GCN employs multiple High-Performance Computing optimizations, including efficient re-use of memory buffers.
MG-GCN achieves super-linear speedup with respect to DGL, on the Reddit graph on both DGX-1 (V100) and DGX-A100.
arXiv Detail & Related papers (2021-10-17T00:41:43Z)
- Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices [9.406007544032848]
Graph Neural Networks (GNNs) are state-of-the-art algorithms for analyzing non-euclidean graph data.
Performing GNN inference in real time has become a challenging problem for resource-limited edge-computing platforms.
We propose BlockGNN, a software-hardware co-design approach to realize efficient GNN acceleration.
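Since the compression here hinges on block-circulant weight matrices, the hedged CUDA sketch below shows how such a layer can be applied: each k x k block is stored as a single length-k defining vector, and the matrix-vector product is evaluated directly per block. All layouts are assumptions, and this is a software illustration only, not BlockGNN's accelerator design (hardware pipelines of this kind typically use an FFT-based O(k log k) evaluation per block instead).
```cuda
// Hedged sketch: y = W x with a block-circulant W. Storage drops from
// (p*k)*(q*k) weights to p*q*k, one defining vector per k x k block.
// Convention: a block with defining vector c has B[r][t] = c[(t - r) mod k].
#include <cuda_runtime.h>

__global__ void block_circulant_matvec(const float* w,   // [p * q * k] defining vectors
                                       const float* x,   // [q * k] input
                                       float*       y,   // [p * k] output
                                       int p, int q, int k)
{
    int out = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per output element
    if (out >= p * k) return;
    int bi = out / k;                                  // row-block index
    int r  = out % k;                                  // row within the block

    float acc = 0.f;
    for (int bj = 0; bj < q; ++bj) {                   // sweep the column blocks
        const float* c  = w + (bi * q + bj) * k;       // defining vector of block (bi, bj)
        const float* xb = x + bj * k;                  // matching input slice
        for (int t = 0; t < k; ++t)
            acc += c[(t - r + k) % k] * xb[t];         // row r of the circulant block
    }
    y[out] = acc;
}
```
A launch with one thread per output element, e.g. 256-thread blocks over p*k outputs, is enough to exercise the kernel.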
arXiv Detail & Related papers (2021-04-13T14:09:22Z)