Adaptive Message Quantization and Parallelization for Distributed
Full-graph GNN Training
- URL: http://arxiv.org/abs/2306.01381v1
- Date: Fri, 2 Jun 2023 09:02:09 GMT
- Title: Adaptive Message Quantization and Parallelization for Distributed
Full-graph GNN Training
- Authors: Borui Wan, Juntao Zhao, Chuan Wu
- Abstract summary: Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming.
This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph training.
- Score: 6.557328947642343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed full-graph training of Graph Neural Networks (GNNs) over large
graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node
features, embeddings and embedding gradients (all referred to as messages)
across devices bring significant communication overhead for nodes with remote
neighbors on other devices (marginal nodes) and unnecessary waiting time for
nodes without remote neighbors (central nodes) in the training graph. This
paper proposes an efficient GNN training system, AdaQP, to expedite distributed
full-graph GNN training. We stochastically quantize messages transferred across
devices to lower-precision integers for communication traffic reduction and
advocate communication-computation parallelization between marginal nodes and
central nodes. We provide theoretical analysis to prove fast training
convergence (at the rate of O(T^{-1}) with T being the total number of training
epochs) and design an adaptive quantization bit-width assignment scheme for
each message based on the analysis, targeting a good trade-off between training
convergence and efficiency. Extensive experiments on mainstream graph datasets
show that AdaQP substantially improves distributed full-graph training's
throughput (up to 3.01x) with negligible accuracy drop (at most 0.30%) or even
accuracy improvement (up to 0.19%) in most cases, showing significant
advantages over the state-of-the-art works.
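The stochastic quantization the abstract describes, i.e., mapping floating-point messages to low-precision integers with unbiased rounding so that the dequantized value equals the original in expectation, can be sketched as follows (a minimal NumPy illustration of the general technique, not AdaQP's actual implementation; function names are hypothetical):

```python
import numpy as np

def quantize_stochastic(x: np.ndarray, bits: int, rng):
    """Map floats to unsigned b-bit integers with unbiased stochastic rounding."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    scaled = (x - lo) / scale                      # values in [0, levels]
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q] = scaled.
    q = floor + (rng.random(x.shape) < (scaled - floor))
    return q.astype(np.uint8 if bits <= 8 else np.uint16), lo, scale

def dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return q.astype(np.float64) * scale + lo

rng = np.random.default_rng(0)
msg = rng.standard_normal(1000)                    # a stand-in for a message tensor
q, lo, scale = quantize_stochastic(msg, bits=4, rng=rng)
recovered = dequantize(q, lo, scale)
# Per-element error is bounded by one quantization step and is zero-mean.
```

With 4-bit integers each element travels in half a byte instead of 4 bytes, an 8x traffic reduction; the unbiasedness is what makes the O(T^{-1}) convergence analysis go through.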
Related papers
- Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z)
- Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias [75.44877675117749]
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2023-09-26T13:09:43Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
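The Gumbel-Softmax operator this entry builds on can be illustrated in a few lines (a plain NumPy sketch of the standard operator, not NodeFormer's kernelized variant): adding Gumbel noise to logits and applying a temperature-scaled softmax yields a differentiable approximation to categorical sampling.

```python
import numpy as np

def gumbel_softmax(logits: np.ndarray, tau: float, rng) -> np.ndarray:
    """Differentiable sample from a categorical: softmax((logits + Gumbel noise) / tau)."""
    u = rng.uniform(1e-10, 1.0, logits.shape)
    g = -np.log(-np.log(u))                        # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))  # numerically stable softmax
    return y / y.sum(axis=-1, keepdims=True)

probs = gumbel_softmax(np.array([2.0, 1.0, 0.1]), tau=0.5,
                       rng=np.random.default_rng(0))
# probs is a valid distribution; lower tau pushes it toward one-hot.
```

Lowering `tau` during training anneals the relaxed samples toward discrete edge choices while keeping gradients well-defined.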
arXiv Detail & Related papers (2023-06-14T09:21:15Z)
- Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication [23.883543151975136]
Training Graph Neural Networks (GNNs) on large graphs is challenging due to the conflict between the high memory demand and limited GPU memory.
We propose an efficient distributed GNN training framework Sylvie, which employs one-bit quantization computation technique in GNNs.
In detail, Sylvie provides a lightweight Low-bit Module to quantize the sent data and dequantize the received data back to full precision values in each layer.
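A generic one-bit scheme of this kind (an illustrative sketch, not Sylvie's actual Low-bit Module) sends only sign bits plus a single full-precision scale per tensor, and the receiver dequantizes back to full-precision values:

```python
import numpy as np

def one_bit_quantize(x: np.ndarray):
    """Compress a tensor to 1 bit per element plus one float scale."""
    scale = float(np.abs(x).mean())     # single full-precision scalar per tensor
    signs = np.signbit(x)               # True = negative, 1 bit per element
    return np.packbits(signs), scale, x.shape

def one_bit_dequantize(packed: np.ndarray, scale: float, shape) -> np.ndarray:
    signs = np.unpackbits(packed, count=int(np.prod(shape))).reshape(shape)
    return np.where(signs, -scale, scale)

x = np.array([0.5, -1.5, 2.0, -0.25])
packed, scale, shape = one_bit_quantize(x)
y = one_bit_dequantize(packed, scale, shape)
# Signs are preserved exactly; every magnitude collapses to mean(|x|).
```

This reduces per-element traffic from 32 bits to 1 bit; the scale factor keeps the reconstructed tensor at the right magnitude on the receiving device.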
arXiv Detail & Related papers (2023-03-02T14:02:39Z)
- Scalable Neural Network Training over Distributed Graphs [45.151244961817454]
Real-world graph data must often be stored across many machines simply because of capacity constraints.
Network communication is costly and becomes the main bottleneck when training GNNs.
We propose the first framework that can be used to train GNNs at all network decentralization levels.
arXiv Detail & Related papers (2023-02-25T10:42:34Z)
- ABC: Aggregation before Communication, a Communication Reduction Framework for Distributed Graph Neural Network Training and Effective Partition [0.0]
Graph Neural Networks (GNNs) are neural models tailored for graph-structure data and have shown superior performance in learning representations for graph-structured data.
In this paper, we study the communication complexity during distributed GNNs training.
We show that the new partition paradigm is particularly well suited to dynamic graphs, where it is infeasible to control edge placement because the graph-changing process is unknown.
arXiv Detail & Related papers (2022-12-11T04:54:01Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown a powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.