Staleness-Alleviated Distributed GNN Training via Online
Dynamic-Embedding Prediction
- URL: http://arxiv.org/abs/2308.13466v2
- Date: Sun, 10 Dec 2023 14:56:21 GMT
- Title: Staleness-Alleviated Distributed GNN Training via Online
Dynamic-Embedding Prediction
- Authors: Guangji Bai, Ziyang Yu, Zheng Chai, Yue Cheng, Liang Zhao
- Abstract summary: This paper proposes SAT (Staleness-Alleviated Training), a novel and scalable distributed GNN training framework.
The key idea of SAT is to model the GNN's embedding evolution as a temporal graph and build a model upon it to predict future embeddings.
Empirically, we demonstrate that SAT can effectively reduce embedding staleness and thus achieve better performance and convergence speed.
- Score: 13.575053193557697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent success of Graph Neural Networks (GNNs), it remains
challenging to train GNNs on large-scale graphs due to the neighbor explosion
problem. As a remedy, distributed computing is a promising solution that
leverages abundant computing resources (e.g., GPUs). However, the node
dependencies in graph data make it difficult to achieve high concurrency in
distributed GNN training, which consequently incurs massive communication
overhead. To address this, historical value approximation has emerged as a
promising class of distributed training techniques. It uses an offline memory
to cache historical information (e.g., node embeddings) as an affordable
approximation of the exact values, thereby achieving high concurrency. However,
these benefits come at the cost of relying on dated training information,
leading to staleness, imprecision, and convergence issues. To overcome these
challenges, this paper proposes SAT (Staleness-Alleviated Training), a novel
and scalable distributed GNN training framework that adaptively reduces
embedding staleness. The key idea of SAT is to model the GNN's embedding
evolution as a temporal graph and build a model upon it to predict future
embeddings, which effectively alleviates the staleness of the cached historical
embeddings. We propose an online algorithm to train the embedding predictor and
the distributed GNN alternately and further provide a convergence analysis.
Empirically, we demonstrate that SAT effectively reduces embedding staleness
and thus achieves better performance and convergence speed on multiple
large-scale graph datasets.
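To make the mechanism described above concrete, the sketch below illustrates the general idea of historical-embedding caching combined with an online embedding predictor trained alternately with the GNN. It is a minimal, single-machine illustration under stated assumptions, not the authors' implementation; the cache layout, the MLP predictor, and the `gnn.loss`, `gnn.embed`, `batch.node_ids`, and `batch.neighbor_ids` interfaces are all hypothetical names introduced for the example.

```python
# Minimal sketch (assumed interfaces, not the paper's code) of staleness alleviation:
# cache each node's recent embedding history and let a small predictor estimate the
# up-to-date embedding instead of reading the stale cached value directly.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HistoricalEmbeddingCache:
    """Stores the last `window` embedding snapshots per node (its temporal trajectory)."""

    def __init__(self, num_nodes: int, dim: int, window: int = 3):
        # history[t] is the embedding written t refreshes ago (0 = most recent).
        self.history = torch.zeros(window, num_nodes, dim)

    def push(self, node_ids: torch.Tensor, embeddings: torch.Tensor) -> None:
        # Shift the trajectory of the refreshed nodes and write the newest snapshot.
        self.history[1:, node_ids] = self.history[:-1, node_ids].clone()
        self.history[0, node_ids] = embeddings.detach()

    def trajectory(self, node_ids: torch.Tensor) -> torch.Tensor:
        # Returns (batch, window, dim), newest snapshot first.
        return self.history[:, node_ids].permute(1, 0, 2)


class EmbeddingPredictor(nn.Module):
    """Maps a node's recent embedding history to an estimate of its current embedding."""

    def __init__(self, dim: int, window: int = 3, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(window * dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, trajectory: torch.Tensor) -> torch.Tensor:
        return self.mlp(trajectory.flatten(start_dim=1))


def alternating_step(cache, predictor, gnn, batch, gnn_opt, pred_opt):
    """One round of the alternating scheme: train the GNN with predicted (de-staled)
    neighbor embeddings, then fit the predictor to the freshly computed embeddings.
    `gnn.loss`, `gnn.embed`, `batch.node_ids`, and `batch.neighbor_ids` are assumed
    interfaces used only for illustration."""
    # 1) GNN step: replace stale cached neighbor embeddings with the predictor's estimate.
    neigh_est = predictor(cache.trajectory(batch.neighbor_ids)).detach()
    gnn_loss = gnn.loss(batch, neighbor_embeddings=neigh_est)
    gnn_opt.zero_grad()
    gnn_loss.backward()
    gnn_opt.step()

    # 2) Predictor step: learn to map the old trajectory to the fresh embedding,
    #    then refresh the cache with the newly computed embeddings.
    fresh = gnn.embed(batch).detach()
    pred_loss = F.mse_loss(predictor(cache.trajectory(batch.node_ids)), fresh)
    pred_opt.zero_grad()
    pred_loss.backward()
    pred_opt.step()
    cache.push(batch.node_ids, fresh)
```

In the distributed setting described by the paper, the cache would live in the shared offline memory so that workers can read approximate neighbor embeddings without synchronous communication; the sketch elides graph partitioning and inter-worker communication entirely.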
Related papers
- CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks [7.321893519281194]
Existing distributed systems load the entire graph in memory for graph partitioning.
We propose CATGNN, a cost-efficient and scalable distributed GNN training system.
We also propose a novel streaming partitioning algorithm named SPRING for distributed GNN training.
arXiv Detail & Related papers (2024-04-02T20:55:39Z) - Learning to Reweight for Graph Neural Network [63.978102332612906]
Graph Neural Networks (GNNs) show promising results for graph tasks.
The generalization ability of existing GNNs degrades when there are distribution shifts between training and testing graph data.
We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability.
arXiv Detail & Related papers (2023-12-19T12:25:10Z) - GNN-Ensemble: Towards Random Decision Graph Neural Networks [3.7620848582312405]
Graph Neural Networks (GNNs) have enjoyed widespread application to graph-structured data.
GNNs are required to learn latent patterns from a limited amount of training data to perform inferences on a vast amount of test data.
In this paper, we push ensemble learning of GNNs one step forward, improving accuracy, robustness, and resilience to adversarial attacks.
arXiv Detail & Related papers (2023-03-20T18:24:01Z) - Distributed Graph Neural Network Training: A Survey [51.77035975191926]
Graph neural networks (GNNs) are a type of deep learning models that are trained on graphs and have been successfully applied in various domains.
Despite the effectiveness of GNNs, it is still challenging for GNNs to efficiently scale to large graphs.
As a remedy, distributed computing becomes a promising solution for training large-scale GNNs.
arXiv Detail & Related papers (2022-11-01T01:57:00Z) - Training Graph Neural Networks on Growing Stochastic Graphs [114.75710379125412]
Graph Neural Networks (GNNs) rely on graph convolutions to exploit meaningful patterns in networked data.
We propose to learn GNNs on very large graphs by leveraging the limit object of a sequence of growing graphs, the graphon.
arXiv Detail & Related papers (2022-10-27T16:00:45Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - Distributed Graph Neural Network Training with Periodic Historical
Embedding Synchronization [9.503080586294406]
Graph Neural Networks (GNNs) are prevalent in various applications such as social network, recommender systems, and knowledge graphs.
Traditional sampling-based methods accelerate GNN training by dropping edges and nodes, which impairs graph integrity and model performance.
This paper proposes DIstributed Graph Embedding SynchronizaTion (DIGEST), a novel distributed GNN training framework.
arXiv Detail & Related papers (2022-05-31T18:44:53Z) - Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications.
To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently.
Despite their success in practice, most of existing methods are unable to handle graphs with uncertain nodal attributes.
Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements.
A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
arXiv Detail & Related papers (2021-10-20T14:23:54Z) - Increase and Conquer: Training Graph Neural Networks on Growing Graphs [116.03137405192356]
We consider the problem of learning a graphon neural network (WNN) by training GNNs on graphs Bernoulli-sampled from the graphon.
Inspired by these results, we propose an algorithm to learn GNNs on large-scale graphs that, starting from a moderate number of nodes, successively increases the size of the graph during training.
arXiv Detail & Related papers (2021-06-07T15:05:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.