Learn Locally, Correct Globally: A Distributed Algorithm for Training
Graph Neural Networks
- URL: http://arxiv.org/abs/2111.08202v2
- Date: Wed, 17 Nov 2021 17:06:22 GMT
- Title: Learn Locally, Correct Globally: A Distributed Algorithm for Training
Graph Neural Networks
- Authors: Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Mahmut T. Kandemir,
Anand Sivasubramaniam
- Abstract summary: We propose a communication-efficient distributed GNN training technique named $textLearn Locally, Correct Globally$ (LLCG)
LLCG trains a GNN on its local data by ignoring the dependency between nodes among different machines, then sends the locally trained model to the server for periodic model averaging.
We rigorously analyze the convergence of distributed methods with periodic model averaging for training GNNs and show that naively applying periodic model averaging but ignoring the dependency between nodes will suffer from an irreducible residual error.
- Score: 22.728439336309858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent success of Graph Neural Networks (GNNs), training GNNs on
large graphs remains challenging. The limited resource capacities of the
existing servers, the dependency between nodes in a graph, and the privacy
concern due to the centralized storage and model learning have spurred the need
to design an effective distributed algorithm for GNN training. However,
existing distributed GNN training methods impose either excessive communication
costs or large memory overheads that hinders their scalability. To overcome
these issues, we propose a communication-efficient distributed GNN training
technique named $\text{{Learn Locally, Correct Globally}}$ (LLCG). To reduce
the communication and memory overhead, each local machine in LLCG first trains
a GNN on its local data by ignoring the dependency between nodes among
different machines, then sends the locally trained model to the server for
periodic model averaging. However, ignoring node dependency could result in
significant performance degradation. To solve the performance degradation, we
propose to apply $\text{{Global Server Corrections}}$ on the server to refine
the locally learned models. We rigorously analyze the convergence of
distributed methods with periodic model averaging for training GNNs and show
that naively applying periodic model averaging but ignoring the dependency
between nodes will suffer from an irreducible residual error. However, this
residual error can be eliminated by utilizing the proposed global corrections
to entail fast convergence rate. Extensive experiments on real-world datasets
show that LLCG can significantly improve the efficiency without hurting the
performance.
Related papers
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z) - Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z) - Learning to Reweight for Graph Neural Network [63.978102332612906]
Graph Neural Networks (GNNs) show promising results for graph tasks.
Existing GNNs' generalization ability will degrade when there exist distribution shifts between testing and training graph data.
We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability.
arXiv Detail & Related papers (2023-12-19T12:25:10Z) - T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z) - GNN-Ensemble: Towards Random Decision Graph Neural Networks [3.7620848582312405]
Graph Neural Networks (GNNs) have enjoyed wide spread applications in graph-structured data.
GNNs are required to learn latent patterns from a limited amount of training data to perform inferences on a vast amount of test data.
In this paper, we push one step forward on the ensemble learning of GNNs with improved accuracy, robustness, and adversarial attacks.
arXiv Detail & Related papers (2023-03-20T18:24:01Z) - Expediting Distributed DNN Training with Device Topology-Aware Graph
Deployment [18.021259939659874]
TAG is an automatic system to derive optimized DNN training graph and its deployment onto any device topology.
We show that it can achieve up to 4.56x training speed-up as compared to existing schemes.
arXiv Detail & Related papers (2023-02-13T06:30:24Z) - Distributed Graph Neural Network Training: A Survey [51.77035975191926]
Graph neural networks (GNNs) are a type of deep learning models that are trained on graphs and have been successfully applied in various domains.
Despite the effectiveness of GNNs, it is still challenging for GNNs to efficiently scale to large graphs.
As a remedy, distributed computing becomes a promising solution of training large-scale GNNs.
arXiv Detail & Related papers (2022-11-01T01:57:00Z) - Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural
Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z) - Distributed Graph Neural Network Training with Periodic Historical
Embedding Synchronization [9.503080586294406]
Graph Neural Networks (GNNs) are prevalent in various applications such as social network, recommender systems, and knowledge graphs.
Traditional sampling-based methods accelerate GNN by dropping edges and nodes, which impairs the graph integrity and model performance.
This paper proposes DIstributed Graph Embedding SynchronizaTion (DIGEST), a novel distributed GNN training framework.
arXiv Detail & Related papers (2022-05-31T18:44:53Z) - Shift-Robust GNNs: Overcoming the Limitations of Localized Graph
Training data [52.771780951404565]
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines by accuracy, eliminating at least (40%) of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z) - SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural
Networks [13.965982814292971]
Graph Neural Networks (GNNs) are the first choice methods for graph machine learning problems.
Centralizing a massive amount of real-world graph data for GNN training is prohibitive due to user-side privacy concerns.
This work proposes SpreadGNN, a novel multi-task federated training framework.
arXiv Detail & Related papers (2021-06-04T22:20:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.