MG-GCN: Scalable Multi-GPU GCN Training Framework
- URL: http://arxiv.org/abs/2110.08688v1
- Date: Sun, 17 Oct 2021 00:41:43 GMT
- Title: MG-GCN: Scalable Multi-GPU GCN Training Framework
- Authors: Muhammed Fatih Balın, Kaan Sancak, and Ümit V. Çatalyürek
- Abstract summary: Full batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs.
MG-GCN employs multiple High-Performance Computing optimizations, including efficient re-use of memory buffers.
MG-GCN achieves super-linear speedup with respect to DGL on the Reddit graph on both DGX-1 (V100) and DGX-A100.
- Score: 1.7188280334580197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Full batch training of Graph Convolutional Network (GCN) models is not
feasible on a single GPU for large graphs containing tens of millions of
vertices or more. Recent work has shown that, for the graphs used in the
machine learning community, communication becomes a bottleneck and scaling is
blocked outside of the single machine regime. Thus, we propose MG-GCN, a
multi-GPU GCN training framework taking advantage of the high-speed
communication links between the GPUs present in multi-GPU systems. MG-GCN
employs multiple High-Performance Computing optimizations, including efficient
re-use of memory buffers to reduce the memory footprint of training GNN models,
as well as communication and computation overlap. These optimizations enable
execution on larger datasets that generally do not fit into the memory of a single
GPU in state-of-the-art implementations. Furthermore, they contribute to achieving
superior speedup compared to the state of the art. For example, MG-GCN achieves
super-linear speedup with respect to DGL on the Reddit graph on both DGX-1 (V100)
and DGX-A100.
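The two optimizations highlighted in the abstract, reuse of communication buffers and overlap of communication with computation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 1-D row-wise partition of the adjacency matrix, equal-sized feature slices per GPU, and the names `adj_diag`, `adj_offdiag`, and `gather_bufs`; it is not MG-GCN's actual implementation.

```python
# Minimal sketch (assumed, not MG-GCN's code) of one full-batch GCN layer on a
# multi-GPU system using NCCL via torch.distributed. It illustrates:
#   * gather_bufs allocated once and reused every layer/epoch (buffer reuse);
#   * the all_gather of remote feature rows overlapping with the SpMM on the
#     locally owned block of the adjacency matrix (comm/computation overlap).
# Assumes a 1-D row-wise partition with equal-sized parts; adj_diag holds the
# columns owned by this rank, adj_offdiag keeps the full (global) column
# dimension but contains only off-rank entries.
import torch
import torch.distributed as dist

def distributed_gcn_layer(adj_diag, adj_offdiag, h_local, weight, gather_bufs):
    # Kick off the feature gather asynchronously; NCCL runs on its own CUDA
    # stream, so it proceeds concurrently with the computation below.
    work = dist.all_gather(gather_bufs, h_local, async_op=True)

    # Overlap: aggregate contributions of locally owned neighbours first.
    z = torch.sparse.mm(adj_diag, h_local)

    # Remote rows are needed now; wait for the communication to complete.
    work.wait()
    h_full = torch.cat(gather_bufs, dim=0)   # reuses the preallocated buffers
    z = z + torch.sparse.mm(adj_offdiag, h_full)

    return torch.relu(z @ weight)
```

Allocating `gather_bufs` once and reusing it across layers and epochs avoids repeatedly allocating the largest intermediate tensors, which is in the spirit of the memory-footprint reduction the abstract mentions.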
Related papers
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z)
- Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses [9.773813896475264]
Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data.
Training GNNs on large-scale graphs remains a significant challenge due to lack of efficient data access and data movement methods.
We propose the GPU Initiated Direct Storage Access (GIDS) dataloader to enable GPU-oriented GNN training for large-scale graphs.
arXiv Detail & Related papers (2023-06-28T17:22:15Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms [28.25823488936712]
We propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms.
The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel.
MGG outperforms state-of-the-art full-graph GNN systems across various settings.
arXiv Detail & Related papers (2022-09-14T17:32:28Z)
- Scaling R-GCN Training with Graph Summarization [71.06855946732296]
Training of Relation Graph Convolutional Networks (R-GCN) does not scale well with the size of the graph.
In this work, we experiment with the use of graph summarization techniques to compress the graph.
We obtain reasonable results on the AIFB, MUTAG and AM datasets.
arXiv Detail & Related papers (2022-03-05T00:28:43Z)
- Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
- Efficient Scaling of Dynamic Graph Neural Networks [7.313571385612325]
This is the first scaling study on dynamic Graph Neural Networks.
We devise mechanisms for reducing the GPU memory usage.
We design a graph difference-based strategy to significantly reduce the transfer time.
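A hedged sketch of what a graph difference-based transfer could look like, assuming consecutive snapshots share most of their node features; the function and tensor names below are illustrative, not the paper's API.

```python
# Illustrative sketch (not the paper's implementation): instead of copying a
# dynamic graph's entire feature matrix to the GPU for every snapshot, copy
# only the rows that changed since the previous snapshot.
import torch

def transfer_snapshot_delta(prev_cpu, curr_cpu, gpu_feats):
    """prev_cpu, curr_cpu: CPU feature matrices of consecutive snapshots;
    gpu_feats: GPU tensor currently holding the previous snapshot."""
    changed = (prev_cpu != curr_cpu).any(dim=1).nonzero(as_tuple=True)[0]
    if changed.numel() > 0:
        # Ship only the changed rows and their indices across the PCIe/NVLink bus.
        rows = curr_cpu.index_select(0, changed).pin_memory()
        gpu_feats[changed.to(gpu_feats.device)] = rows.to(
            gpu_feats.device, non_blocking=True)
    return gpu_feats
```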
arXiv Detail & Related papers (2021-09-16T11:51:20Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture [19.2129567657739]
Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems.
Current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features.
This approach, however, puts tremendous pressure on host memory bandwidth and the CPU.
We propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory.
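A hypothetical sketch of the zero-copy idea described above, using Numba CUDA's mapped (pinned, device-visible) host memory; the kernel, sizes, and names are illustrative assumptions, not the paper's system.

```python
# Illustrative sketch (not the paper's implementation): GPU threads gather
# feature rows directly from host memory that is pinned and mapped into the
# GPU address space, instead of having the CPU collect and copy them.
import numpy as np
from numba import cuda

@cuda.jit
def gather_rows(mapped_feats, row_ids, out):
    # One thread per output element; each read of mapped_feats crosses the
    # PCIe/NVLink bus on demand ("zero copy").
    i = cuda.grid(1)
    num_feats = mapped_feats.shape[1]
    if i < row_ids.size * num_feats:
        r, f = i // num_feats, i % num_feats
        out[r, f] = mapped_feats[row_ids[r], f]

num_nodes, num_feats = 100_000, 64
# The feature table stays in host memory (left uninitialized here), pinned and
# mapped so that kernels can dereference it directly.
features = cuda.mapped_array((num_nodes, num_feats), dtype=np.float32)

row_ids = cuda.to_device(np.random.randint(0, num_nodes, size=4096))
out = cuda.device_array((4096, num_feats), dtype=np.float32)

threads = 256
blocks = (4096 * num_feats + threads - 1) // threads
gather_rows[blocks, threads](features, row_ids, out)
```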
arXiv Detail & Related papers (2021-03-04T21:00:17Z)
- L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks [118.37805042816784]
Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets.
We propose a novel, efficient layer-wise training framework for GCNs (L-GCN) that disentangles feature aggregation and feature transformation during training (sketched after this entry).
Experiments show that L-GCN is faster than the state of the art by at least an order of magnitude, with consistent memory usage that does not depend on dataset size.
arXiv Detail & Related papers (2020-03-30T16:37:56Z)
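As referenced in the L$^2$-GCN entry above, here is a minimal sketch of greedy layer-wise GCN training, assuming each layer's transformation is supervised directly by the node labels; the learned controller that schedules per-layer training (the L$^2$ part) is omitted, and all names are illustrative.

```python
# Minimal sketch (assumed, not the paper's code) of layer-wise GCN training:
# feature aggregation (a sparse A @ X with no autograd graph) is disentangled
# from feature transformation (a small linear layer trained on its own), so
# only one layer's parameters and activations are trained at a time.
import torch
import torch.nn.functional as F

def train_layerwise_gcn(adj, feats, labels, train_mask,
                        num_layers=2, hidden=128, epochs=100, lr=1e-2):
    x, num_classes = feats, int(labels.max()) + 1
    for layer in range(num_layers):
        with torch.no_grad():                      # (1) aggregation, done once
            x = torch.sparse.mm(adj, x)

        out_dim = num_classes if layer == num_layers - 1 else hidden
        lin = torch.nn.Linear(x.size(1), out_dim)  # (2) transformation
        opt = torch.optim.Adam(lin.parameters(), lr=lr)
        for _ in range(epochs):                    # train this layer greedily
            opt.zero_grad()
            loss = F.cross_entropy(lin(x)[train_mask], labels[train_mask])
            loss.backward()
            opt.step()

        with torch.no_grad():                      # frozen output feeds next layer
            x = lin(x) if layer == num_layers - 1 else torch.relu(lin(x))
    return x                                       # final-layer logits
```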