Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training
- URL: http://arxiv.org/abs/2505.04083v1
- Date: Wed, 07 May 2025 02:49:52 GMT
- Title: Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training
- Authors: Aditya K. Ranjan, Siddharth Singh, Cunyang Wei, Abhinav Bhatele
- Abstract summary: Graph neural networks (GNNs) can leverage the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and using GNNs on them requires techniques such as mini-batch sampling to scale. We propose a three-dimensional (3D) parallel approach for full-graph training that tackles these issues and scales to billion-edge graphs.
- Score: 1.6954729278440728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks have emerged as a potent class of neural networks capable of leveraging the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and using GNNs on them requires techniques such as mini-batch sampling to scale. However, this can lead to reduced accuracy in some cases, and sampling and data transfer from the CPU to the GPU can also slow down training. On the other hand, distributed full-graph training suffers from high communication overhead and load imbalance due to the irregular structure of graphs. We propose Plexus, a three-dimensional (3D) parallel approach for full-graph training that tackles these issues and scales to billion-edge graphs. Additionally, we introduce optimizations such as a permutation scheme for load balancing, and a performance model to predict the optimal 3D configuration. We evaluate Plexus on several graph datasets and show scaling results for up to 2048 GPUs on Perlmutter, which is 33% of the machine, and 2048 GCDs on Frontier. Plexus achieves unprecedented speedups of 2.3x-12.5x over existing methods and a reduction in the time to solution by 5.2-8.7x on Perlmutter and 7-54.2x on Frontier.
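The abstract credits part of Plexus's efficiency to a performance model that predicts the optimal 3D parallel configuration. The paper's actual model is not reproduced here; the sketch below is only a minimal illustration, assuming that the GPU count is factored into a 3D process grid (x, y, z) and that each candidate grid is scored by a hypothetical communication-volume estimate (the `comm_cost` function and its coefficients are stand-ins, not Plexus's formula).

```python
from itertools import product

def factorizations_3d(num_gpus):
    """Enumerate all (x, y, z) grids with x * y * z == num_gpus."""
    divisors = [d for d in range(1, num_gpus + 1) if num_gpus % d == 0]
    for x, y in product(divisors, repeat=2):
        if num_gpus % (x * y) == 0:
            yield x, y, num_gpus // (x * y)

def comm_cost(grid, num_nodes, num_edges, hidden_dim):
    """Hypothetical per-layer communication-volume estimate.

    NOT Plexus's actual performance model -- just a stand-in that charges
    each grid dimension for the data exchanged along it: feature rows
    replicated along one axis, partial aggregations reduced along another,
    and edge data redistributed across the third.
    """
    x, y, z = grid
    feature_bytes = 4 * num_nodes * hidden_dim
    edge_bytes = 4 * num_edges
    return feature_bytes / z + feature_bytes / x + edge_bytes / y

def best_configuration(num_gpus, num_nodes, num_edges, hidden_dim):
    """Pick the 3D grid that minimizes the modeled communication volume."""
    return min(
        factorizations_3d(num_gpus),
        key=lambda g: comm_cost(g, num_nodes, num_edges, hidden_dim),
    )

if __name__ == "__main__":
    # e.g. 2048 GPUs on a graph with 100M nodes and 1B edges
    print(best_configuration(2048, 100_000_000, 1_000_000_000, 256))
```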
Related papers
- Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction [76.62155593340763]
Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales. However, the graph representations required for this task tend to be densely connected. We present a distributed eGNN implementation that leverages direct GPU communication and introduces a partitioning strategy for the input graph.
arXiv Detail & Related papers (2025-07-04T23:53:47Z)
- FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale [29.272368697268433]
Graph Neural Networks (GNNs) have shown strong advantages on non-Euclidean graph data.
We propose FastGL, a GPU-efficient framework for accelerating sampling-based training of GNNs at large scale.
FastGL can achieve an average speedup of 11.8x, 2.2x and 1.5x over the state-of-the-art frameworks PyG, DGL, and GNNLab, respectively.
arXiv Detail & Related papers (2024-09-23T11:45:47Z)
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism [10.723541176359452]
Communication is a key bottleneck for distributed graph neural network (GNN) training.
GNNPipe is a new approach that scales distributed full-graph training of deep GNNs.
arXiv Detail & Related papers (2023-08-19T18:44:14Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z)
- DistGNN-MB: Distributed Large-Scale Graph Neural Network Training on x86 via Minibatch Sampling [3.518762870118332]
DistGNN-MB trains GraphSAGE 5.2x faster than the widely-used DistDGL.
At this scale, DistGNN-MB trains GraphSAGE and GAT 10x and 17.2x faster, respectively, as compute nodes scale from 2 to 32.
arXiv Detail & Related papers (2022-11-11T18:07:33Z)
- Nimble GNN Embedding with Tensor-Train Decomposition [10.726368002799765]
This paper describes a new method for representing embedding tables of graph neural networks (GNNs) more compactly via tensor-train (TT) decomposition (a generic TT lookup sketch appears after this list).
In some cases, our model without explicit node features on input can even match the accuracy of models that use node features.
arXiv Detail & Related papers (2022-06-21T17:57:35Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler (a generic neighbor-sampling sketch appears after this list).
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in a large graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
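PPRGo (the last entry in the list above) is built around an efficient approximation of personalized PageRank. Below is a minimal sketch of the classic push-style approximation that PPRGo-like methods build on, assuming a plain adjacency-list graph in which every node appears as a key; it is illustrative only and not PPRGo's actual implementation.

```python
from collections import defaultdict, deque

def approximate_ppr(adj, source, alpha=0.15, eps=1e-4):
    """Push-style approximate personalized PageRank from a single source.

    adj: dict mapping every node -> list of neighbors.
    alpha: teleport probability; eps: per-degree residual tolerance.
    Returns a sparse dict of PPR scores. Illustrative only -- not the
    exact variant used inside PPRGo.
    """
    p = defaultdict(float)   # approximate PPR vector
    r = defaultdict(float)   # residual mass still to be pushed
    r[source] = 1.0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        deg_u = max(len(adj[u]), 1)
        if r[u] < eps * deg_u:
            continue
        push = r[u]
        p[u] += alpha * push
        r[u] = 0.0
        share = (1.0 - alpha) * push / deg_u
        for v in adj[u]:
            threshold = eps * max(len(adj[v]), 1)
            was_below = r[v] < threshold
            r[v] += share
            if was_below and r[v] >= threshold:
                queue.append(v)
    return dict(p)

if __name__ == "__main__":
    # tiny toy graph: a triangle on nodes 0, 1, 2
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(approximate_ppr(graph, source=0))
```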
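Several entries above (FastGL, and "Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining") revolve around mini-batch training with neighborhood sampling. The following is a generic sketch of fixed-fan-out, layer-wise neighbor sampling in the GraphSAGE style, again assuming an adjacency-list graph; it is not the performance-engineered samplers those papers describe.

```python
import random

def sample_neighborhood(adj, seed_nodes, fanouts, rng=random):
    """Layer-wise fixed-fan-out neighbor sampling (GraphSAGE-style).

    adj: dict node -> list of neighbors.
    seed_nodes: mini-batch of target nodes.
    fanouts: neighbors to sample per node at each hop, e.g. [15, 10].
    Returns one sampled edge list per hop. Generic illustration only.
    """
    layers = []
    frontier = set(seed_nodes)
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for u in frontier:
            neighbors = adj.get(u, [])
            if len(neighbors) > fanout:
                neighbors = rng.sample(neighbors, fanout)
            for v in neighbors:
                edges.append((v, u))   # message flows v -> u
                next_frontier.add(v)
        layers.append(edges)
        frontier = next_frontier
    return layers

if __name__ == "__main__":
    toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
    for hop, edges in enumerate(sample_neighborhood(toy, [0], [2, 2])):
        print(f"hop {hop}: {edges}")
```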
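The "Nimble GNN Embedding with Tensor-Train Decomposition" entry compresses GNN embedding tables with a tensor-train (TT) factorization. The sketch below shows the core idea under simple assumptions: random TT cores stand in for trained ones, and a single embedding row is materialized without ever forming the full table. Shapes, ranks, and initialization are illustrative, not the paper's.

```python
import numpy as np

def tt_embedding_cores(row_shape, col_shape, rank, rng):
    """Random TT cores for an embedding table of shape
    (prod(row_shape), prod(col_shape)).

    Core k has shape (r_{k-1}, row_shape[k], col_shape[k], r_k) with
    r_0 = r_L = 1. Illustrative initialization only.
    """
    ranks = [1] + [rank] * (len(row_shape) - 1) + [1]
    return [
        rng.standard_normal((ranks[k], row_shape[k], col_shape[k], ranks[k + 1]))
        / np.sqrt(rank)
        for k in range(len(row_shape))
    ]

def tt_embedding_lookup(cores, row_shape, index):
    """Materialize one embedding row without building the full table."""
    # decompose the flat row index into per-core indices (mixed radix)
    digits = []
    for n in reversed(row_shape):
        digits.append(index % n)
        index //= n
    digits.reverse()
    # contract cores left to right, keeping only the selected row slices
    vec = np.ones((1, 1))                      # (cols_so_far, rank)
    for core, i in zip(cores, digits):
        slc = core[:, i, :, :]                 # (r_prev, d_k, r_next)
        vec = np.einsum('ar,rbs->abs', vec, slc)
        vec = vec.reshape(-1, slc.shape[-1])
    return vec.reshape(-1)                     # length prod(col_shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    row_shape, col_shape = (50, 40, 50), (4, 4, 8)   # 100k rows, 128-dim
    cores = tt_embedding_cores(row_shape, col_shape, rank=16, rng=rng)
    print(tt_embedding_lookup(cores, row_shape, index=12345).shape)  # (128,)
```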