DistGNN: Scalable Distributed Training for Large-Scale Graph Neural
Networks
- URL: http://arxiv.org/abs/2104.06700v3
- Date: Fri, 16 Apr 2021 15:04:55 GMT
- Title: DistGNN: Scalable Distributed Training for Large-Scale Graph Neural
Networks
- Authors: Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty,
Evangelos Georganas, Alexander Heinecke, Dhiraj Kalamkar, Nesreen K. Ahmed,
Sasikanth Avancha
- Abstract summary: Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
- Score: 58.48833325238537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Full-batch training on Graph Neural Networks (GNN) to learn the structure of
large graphs is a critical problem that needs to scale to hundreds of compute
nodes to be feasible. It is challenging due to large memory capacity and
bandwidth requirements on a single compute node and high communication volumes
across multiple nodes. In this paper, we present DistGNN that optimizes the
well-known Deep Graph Library (DGL) for full-batch training on CPU clusters via
an efficient shared memory implementation, communication reduction using a
minimum vertex-cut graph partitioning algorithm and communication avoidance
using a family of delayed-update algorithms. Our results on four common GNN
benchmark datasets: Reddit, OGB-Products, OGB-Papers and Proteins, show up to
3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU
sockets, respectively, over baseline DGL implementations running on a single
CPU socket.
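
The abstract names two of DistGNN's key ideas: partitioning vertices across sockets with a minimum vertex-cut, and avoiding communication by letting each partition work with stale copies of remote-vertex data that are refreshed only periodically (the delayed-update family). The single-process sketch below illustrates that combination on a toy graph; the partition assignment, the `refresh_every` knob, and the simulated cache-refresh step are illustrative stand-ins, not DistGNN's actual DGL/CPU-cluster implementation.

```python
# Minimal single-process sketch of the delayed-update idea behind
# communication-avoiding aggregation over a vertex-partitioned graph.
# Names such as `refresh_every`, `owner`, and `halo_cache` are illustrative,
# not DistGNN's API; the real system runs DGL across CPU sockets.
import numpy as np

rng = np.random.default_rng(0)

num_nodes, feat_dim, num_parts = 12, 4, 3
edges = [(u, v) for u in range(num_nodes) for v in range(num_nodes)
         if u != v and rng.random() < 0.25]                  # small random graph
feats = rng.normal(size=(num_nodes, feat_dim))               # node features
owner = np.array([n % num_parts for n in range(num_nodes)])  # vertex partition

# Per-partition cache of remote ("halo") node features, refreshed lazily.
halo_cache = {p: feats.copy() for p in range(num_parts)}

def aggregate(part, epoch, refresh_every=4):
    """Mean-aggregate in-neighbor features for the nodes owned by `part`.

    Remote neighbors are read from a stale cache that is synchronized with
    the true features only every `refresh_every` epochs, trading freshness
    for less communication (here the 'communication' is just a local copy).
    """
    if epoch % refresh_every == 0:
        halo_cache[part] = feats.copy()                      # simulated sync
    out = np.zeros((num_nodes, feat_dim))
    deg = np.zeros(num_nodes)
    for u, v in edges:
        if owner[v] != part:
            continue                                         # v not owned here
        src = feats[u] if owner[u] == part else halo_cache[part][u]
        out[v] += src
        deg[v] += 1
    deg[deg == 0] = 1
    return out / deg[:, None]

for epoch in range(8):
    parts_out = [aggregate(p, epoch) for p in range(num_parts)]
    agg = sum(parts_out)                                     # stitch partitions
    feats = 0.5 * feats + 0.5 * agg                          # toy "layer" update
print("aggregated feature norm:", np.linalg.norm(feats).round(3))
```

Raising `refresh_every` makes the simulated synchronization rarer, mirroring the communication-versus-staleness trade-off that the paper's delayed-update variants navigate.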
Related papers
- LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme [12.64360444043247]
Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks.
To address limited memory capacities, traditional GNN training approaches use graph partitioning and sharding techniques.
We propose LSM-GNN, a Large-scale Storage-based Multi-GPU GNN framework.
LSM-GNN incorporates a hybrid eviction policy that intelligently manages cache space by using both static and dynamic node information.
arXiv Detail & Related papers (2024-07-21T20:41:39Z) - Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z) - Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z) - DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z) - BatchGNN: Efficient CPU-Based Distributed GNN Training on Very Large
Graphs [2.984386665258243]
BatchGNN is a distributed CPU system that showcases techniques to efficiently train GNNs on terabyte-sized graphs.
BatchGNN achieves an average 3x speedup over DistDGL on three GNN models trained on OGBN graphs.
arXiv Detail & Related papers (2023-06-23T23:25:34Z) - Communication-Efficient Graph Neural Networks with Probabilistic
Neighborhood Expansion Analysis and Caching [59.8522166385372]
Training and inference with graph neural networks (GNNs) on massive graphs has been actively studied since the inception of GNNs.
This paper is concerned with minibatch training and inference with GNNs that employ node-wise sampling in distributed settings.
We present SALIENT++, which extends the prior state-of-the-art SALIENT system to work with partitioned feature data.
arXiv Detail & Related papers (2023-05-04T21:04:01Z) - DistGNN-MB: Distributed Large-Scale Graph Neural Network Training on x86
via Minibatch Sampling [3.518762870118332]
DistGNN-MB trains GraphSAGE 5.2x faster than the widely-used DistDGL.
As compute nodes scale from 2 to 32, DistGNN-MB trains GraphSAGE and GAT 10x and 17.2x faster, respectively.
arXiv Detail & Related papers (2022-11-11T18:07:33Z) - GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific
Caching [2.654276707313136]
GNNIE is an accelerator designed to run a broad range of Graph Neural Networks (GNNs)
It tackles workload imbalance by (i) splitting node feature operands into blocks, (ii) reordering and redistributing computations, and (iii) using a flexible MAC architecture with low communication overheads among the processing elements.
GNNIE achieves average speedups of over 8890x over a CPU and 295x over a GPU across multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool.
arXiv Detail & Related papers (2021-05-21T20:07:14Z) - VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712x speedup with 1301.25x energy reduction on CPU, and a 35.4x speedup with 17.66x energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)