Scalable Neural Network Training over Distributed Graphs
- URL: http://arxiv.org/abs/2302.13053v3
- Date: Sun, 11 Feb 2024 08:52:32 GMT
- Title: Scalable Neural Network Training over Distributed Graphs
- Authors: Aashish Kolluri, Sarthak Choudhary, Bryan Hooi, Prateek Saxena
- Abstract summary: Real-world graph data must often be stored across many machines, not just because of capacity constraints.
Network communication is costly and becomes the main bottleneck to train GNNs.
RETEXO is the first framework that can be used to train GNNs at all network decentralization levels.
- Score: 45.151244961817454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph neural networks (GNNs) fuel diverse machine learning tasks involving
graph-structured data, ranging from predicting protein structures to serving
personalized recommendations. Real-world graph data must often be stored
distributed across many machines not just because of capacity constraints, but
because of compliance with data residency or privacy laws. In such setups,
network communication is costly and becomes the main bottleneck to train GNNs.
Optimizations for distributed GNN training have targeted data-level
improvements so far -- via caching, network-aware partitioning, and
sub-sampling -- that work for data center-like setups where graph data is
accessible to a single entity and data transfer costs are ignored.
We present RETEXO, the first framework which eliminates the severe
communication bottleneck in distributed GNN training while respecting any given
data partitioning configuration. The key is a new training procedure, lazy
message passing, that reorders the sequence of training GNN elements. RETEXO
achieves 1-2 orders of magnitude reduction in network data costs compared to
standard GNN training, while retaining accuracy. RETEXO scales gracefully with
increasing decentralization and decreasing bandwidth. It is the first framework
that can be used to train GNNs at all network decentralization levels --
including centralized data-center networks, wide area networks, proximity
networks, and edge networks.
Related papers
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z)
- Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- ABC: Aggregation before Communication, a Communication Reduction Framework for Distributed Graph Neural Network Training and Effective Partition [0.0]
Graph Neural Networks (GNNs) are neural models tailored to graph-structured data and have shown superior performance in learning representations for such data.
In this paper, we study the communication complexity of distributed GNN training.
We show that the new partition paradigm is particularly well suited to dynamic graphs, where controlling edge placement is infeasible because the graph-changing process is unknown.
arXiv Detail & Related papers (2022-12-11T04:54:01Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown a powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Learn Locally, Correct Globally: A Distributed Algorithm for Training Graph Neural Networks [22.728439336309858]
We propose a communication-efficient distributed GNN training technique named Learn Locally, Correct Globally (LLCG).
LLCG first has each machine train a GNN on its local data, ignoring the dependency between nodes on different machines, and then sends the locally trained models to a server for periodic model averaging.
We rigorously analyze the convergence of distributed methods with periodic model averaging for training GNNs and show that naively applying periodic model averaging while ignoring the dependency between nodes incurs an irreducible residual error (a minimal sketch of this local-training-plus-averaging pattern appears after this list).
arXiv Detail & Related papers (2021-11-16T03:07:01Z)
- SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks [13.965982814292971]
Graph Neural Networks (GNNs) are the first-choice methods for graph machine learning problems.
Centralizing a massive amount of real-world graph data for GNN training is prohibitive due to user-side privacy concerns.
This work proposes SpreadGNN, a novel multi-task federated training framework.
arXiv Detail & Related papers (2021-06-04T22:20:47Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
- An Uncoupled Training Architecture for Large Graph Learning [20.784230322205232]
We present Node2Grids, a flexible uncoupled training framework for embedding graph data into grid-like data.
By ranking each node's influence by degree, Node2Grids selects the most influential first-order and second-order neighbors and fuses their information with the central node.
To further improve the efficiency of downstream tasks, a simple CNN-based neural network is employed to capture the significant information from the mapped grid-like data.
arXiv Detail & Related papers (2020-03-21T11:49:16Z)
- Graphs, Convolutions, and Neural Networks: From Graph Filters to Graph Neural Networks [183.97265247061847]
We leverage graph signal processing to characterize the representation space of graph neural networks (GNNs).
We discuss the role of graph convolutional filters in GNNs and show that any architecture built with such filters has the fundamental properties of permutation equivariance and stability to changes in the topology (the standard polynomial filter form is written out after this list).
We also study the use of GNNs in recommender systems and learning decentralized controllers for robot swarms.
arXiv Detail & Related papers (2020-03-08T13:02:15Z)
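The "Learn Locally, Correct Globally" entry above describes its training loop concretely enough to sketch the general pattern. Below is a minimal, assumption-laden illustration of local training with periodic model averaging; it is not the paper's algorithm (which additionally performs a server-side global correction), and all names (local_step, average_models, the partitions dictionaries) are hypothetical.
```python
# Sketch of local training on partitioned graph data plus periodic model
# averaging, the pattern described in the LLCG summary above (global
# correction step omitted). Illustration only, not the paper's code.
import copy
import torch
import torch.nn as nn

def local_step(model, feats, adj_local, labels, lr=0.01):
    # One gradient step on a machine's local partition, using only edges
    # whose endpoints both live on this machine (cross-machine edges are
    # ignored, which is the source of the residual error).
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    deg = adj_local.sum(dim=1, keepdim=True).clamp(min=1)
    hidden = torch.relu(adj_local @ model[0](feats) / deg)
    loss = nn.functional.cross_entropy(model[1](hidden), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

def average_models(models):
    # Server step: replace every worker's parameters with their mean.
    avg = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, p in avg.named_parameters():
            p.copy_(torch.stack([dict(m.named_parameters())[name]
                                 for m in models]).mean(0))
        for m in models:
            m.load_state_dict(avg.state_dict())

def train(partitions, rounds=10, local_steps=5, hidden=32, num_classes=7):
    d = partitions[0]["feats"].shape[1]
    models = [nn.Sequential(nn.Linear(d, hidden), nn.Linear(hidden, num_classes))
              for _ in partitions]
    for _ in range(rounds):
        for m, part in zip(models, partitions):
            for _ in range(local_steps):
                local_step(m, part["feats"], part["adj"], part["labels"])
        average_models(models)   # periodic model averaging
    return models[0]
```
A caller would supply one dictionary per machine containing that machine's node features, intra-partition adjacency, and labels; the averaging round is the only point at which anything crosses the network.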
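The graph-filter entry above appeals to the standard polynomial graph filter from graph signal processing; writing it out makes the permutation-equivariance claim concrete. The notation below is the common textbook form, not anything specific to that paper's results.
```latex
% A polynomial graph filter: S is a graph shift operator (e.g. adjacency
% matrix or Laplacian), x a graph signal, and h_k the filter taps.
\[
  \mathbf{H}(\mathbf{S})\,\mathbf{x} \;=\; \sum_{k=0}^{K} h_k\,\mathbf{S}^{k}\,\mathbf{x}
\]
% Permutation equivariance follows because relabeling the nodes with a
% permutation matrix P merely conjugates the shift operator:
\[
  \mathbf{H}\!\left(\mathbf{P}^{\top}\mathbf{S}\mathbf{P}\right)\left(\mathbf{P}^{\top}\mathbf{x}\right)
  \;=\; \mathbf{P}^{\top}\,\mathbf{H}(\mathbf{S})\,\mathbf{x}.
\]
```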