Demystifying Distributed Training of Graph Neural Networks for Link Prediction
- URL: http://arxiv.org/abs/2506.20818v1
- Date: Wed, 25 Jun 2025 20:32:23 GMT
- Title: Demystifying Distributed Training of Graph Neural Networks for Link Prediction
- Authors: Xin Huang, Chul-Ho Lee
- Abstract summary: Graph neural networks (GNNs) are powerful tools for solving graph-related problems. This paper demystifies distributed training of GNNs for link prediction by investigating the issue of performance degradation. We propose SpLPG, which effectively leverages graph sparsification to mitigate the issue of performance degradation at a reduced communication cost.
- Score: 7.125495588888206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs) are powerful tools for solving graph-related problems. Distributed GNN frameworks and systems enhance the scalability of GNNs and accelerate model training, yet most are optimized for node classification. Their performance on link prediction remains underexplored. This paper demystifies distributed training of GNNs for link prediction by investigating the issue of performance degradation when each worker trains a GNN on its assigned partitioned subgraph without having access to the entire graph. We discover that the main sources of the issue come from not only the information loss caused by graph partitioning but also the ways of drawing negative samples during model training. While sharing the complete graph information with each worker resolves the issue and preserves link prediction accuracy, it incurs a high communication cost. We propose SpLPG, which effectively leverages graph sparsification to mitigate the issue of performance degradation at a reduced communication cost. Experiment results on several public real-world datasets demonstrate the effectiveness of SpLPG, which reduces the communication overhead by up to about 80% while mostly preserving link prediction accuracy.
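For intuition, below is a minimal, self-contained sketch (not the paper's code; the toy graph, partition layout, and function name are illustrative) of how per-worker negative sampling differs from global negative sampling once a graph is partitioned: each worker only sees its own nodes and intra-partition edges, so cut edges are lost and cross-partition node pairs can never be drawn as negatives.

```python
import random

def negative_samples(nodes, pos_edges, k, seed=0):
    """Draw k node pairs that are not in pos_edges (illustrative toy sampler).
    Pairs are stored as sorted tuples so (u, v) and (v, u) count once."""
    rng = random.Random(seed)
    nodes = list(nodes)
    negatives = set()
    while len(negatives) < k:
        u, v = sorted(rng.sample(nodes, 2))
        if (u, v) not in pos_edges:
            negatives.add((u, v))
    return negatives

# Toy graph: an 8-node cycle, naively split across two workers.
full_nodes = set(range(8))
full_edges = {tuple(sorted((i, (i + 1) % 8))) for i in range(8)}
worker_parts = [
    ({0, 1, 2, 3}, {(0, 1), (1, 2), (2, 3)}),  # cut edges (3, 4) and (0, 7) are lost
    ({4, 5, 6, 7}, {(4, 5), (5, 6), (6, 7)}),
]

# Per-worker sampling: negatives are confined to one partition, so
# cross-partition pairs are never seen as negatives during training.
print(negative_samples(*worker_parts[0], k=3))

# Global sampling: negatives are drawn over the whole node set, as a
# single-machine trainer would do, but this requires sharing graph
# information across workers (the communication cost SpLPG aims to reduce
# via sparsification).
print(negative_samples(full_nodes, full_edges, k=3))
```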
Related papers
- OMEGA: A Low-Latency GNN Serving System for Large Graphs [8.51634655687174]
Graph Neural Networks (GNNs) have been widely adopted for their ability to compute expressive node representations in graph datasets. Existing approximation techniques in training can mitigate the overheads but, in serving, still lead to high latency and/or accuracy loss. We propose OMEGA, a system that enables low-latency GNN serving for large graphs with minimal accuracy loss.
arXiv Detail & Related papers (2025-01-15T03:14:18Z)
- Stealing Training Graphs from Graph Neural Networks [54.52392250297907]
Graph Neural Networks (GNNs) have shown promising results in modeling graphs in various tasks. As neural networks can memorize the training samples, the model parameters of GNNs have a high risk of leaking private training data. We investigate a novel problem of stealing graphs from trained GNNs.
arXiv Detail & Related papers (2024-11-17T23:15:36Z)
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z)
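As a rough illustration of the prefetch-and-eviction idea behind MassiveGNN (this is not the paper's actual scheme; the class, capacity policy, and fetch_remote callback are hypothetical), a worker can keep a bounded local cache of remote node features and evict the least recently used entries once it fills up:

```python
from collections import OrderedDict

class RemoteFeatureCache:
    """Toy bounded cache for features of nodes owned by other workers.

    get() returns a cached feature when possible and otherwise calls a
    user-supplied fetch function (standing in for cross-worker communication),
    evicting the least recently used entry once capacity is exceeded.
    """

    def __init__(self, fetch_remote, capacity=1024):
        self.fetch_remote = fetch_remote      # e.g. an RPC to the owning worker
        self.capacity = capacity
        self.cache = OrderedDict()            # node_id -> feature vector

    def get(self, node_id):
        if node_id in self.cache:
            self.cache.move_to_end(node_id)   # mark as recently used
            return self.cache[node_id]
        feat = self.fetch_remote(node_id)     # expensive remote fetch
        self.cache[node_id] = feat
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the coldest entry
        return feat

    def prefetch(self, node_ids):
        """Warm the cache for nodes expected in upcoming mini-batches."""
        for n in node_ids:
            self.get(n)

# Example: fake remote features; prefetch halo nodes before a training step.
cache = RemoteFeatureCache(fetch_remote=lambda n: [float(n)] * 4, capacity=2)
cache.prefetch([10, 11])
print(cache.get(10))   # served locally, no cross-worker fetch needed
```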
- Learning to Reweight for Graph Neural Network [63.978102332612906]
Graph Neural Networks (GNNs) show promising results for graph tasks.
Existing GNNs' generalization ability will degrade when there exist distribution shifts between testing and training graph data.
We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability.
arXiv Detail & Related papers (2023-12-19T12:25:10Z)
- Addressing the Impact of Localized Training Data in Graph Neural Networks [0.0]
Graph Neural Networks (GNNs) have achieved notable success in learning from graph-structured data.
This article aims to assess the impact of training GNNs on localized subsets of the graph.
We propose a regularization method to minimize distributional discrepancies between localized training data and graph inference.
arXiv Detail & Related papers (2023-07-24T11:04:22Z)
- MentorGNN: Deriving Curriculum for Pre-Training GNNs [61.97574489259085]
We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs.
We shed new light on the problem of domain adaptation on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs.
arXiv Detail & Related papers (2022-08-21T15:12:08Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown powerful capacity at modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- OOD-GNN: Out-of-Distribution Generalized Graph Neural Network [73.67049248445277]
Graph neural networks (GNNs) have achieved impressive performance when testing and training graph data come from identical distribution.
Existing GNNs lack out-of-distribution generalization abilities so that their performance substantially degrades when there exist distribution shifts between testing and training graph data.
We propose an out-of-distribution generalized graph neural network (OOD-GNN) for achieving satisfactory performance on unseen testing graphs whose distributions differ from those of the training graphs.
arXiv Detail & Related papers (2021-12-07T16:29:10Z)
- Learning to Drop: Robust Graph Neural Network via Topological Denoising [50.81722989898142]
We propose PTDNet, a parameterized topological denoising network, to improve the robustness and generalization performance of Graph Neural Networks (GNNs).
PTDNet prunes task-irrelevant edges by penalizing the number of edges in the sparsified graph with parameterized networks.
We show that PTDNet can significantly improve the performance of GNNs, and the performance gain becomes larger for noisier datasets.
arXiv Detail & Related papers (2020-11-13T18:53:21Z)
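To make the pruning idea in the last entry concrete, below is a minimal sketch of a parameterized edge mask with a sparsity penalty on the (expected) number of kept edges. It illustrates the general "penalize edges kept by a learnable masker" pattern, not PTDNet's actual architecture or objective; the module name, dimensions, and toy inputs are made up.

```python
import torch
import torch.nn as nn

class EdgeMasker(nn.Module):
    """Toy parameterized network that scores each edge and emits a soft keep-mask."""

    def __init__(self, node_dim, hidden_dim=16):
        super().__init__()
        # Scores an edge from the concatenation of its endpoint features.
        self.scorer = nn.Sequential(
            nn.Linear(2 * node_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x, edge_index):
        # x: [num_nodes, node_dim]; edge_index: [2, num_edges]
        src, dst = edge_index
        logits = self.scorer(torch.cat([x[src], x[dst]], dim=-1)).squeeze(-1)
        mask = torch.sigmoid(logits)       # soft keep-probability per edge
        sparsity_penalty = mask.sum()      # penalizes the expected edge count
        return mask, sparsity_penalty

# Usage sketch: add `lambda_sparse * sparsity_penalty` to the task loss so that
# edges whose removal does not hurt the task are pushed toward mask ~ 0.
x = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
mask, penalty = EdgeMasker(node_dim=8)(x, edge_index)
print(mask, penalty)
```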