PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks
with Pipelined Feature Communication
- URL: http://arxiv.org/abs/2203.10428v1
- Date: Sun, 20 Mar 2022 02:08:03 GMT
- Title: PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks
with Pipelined Feature Communication
- Authors: Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam
Sung Kim, Yingyan Lin
- Abstract summary: Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning graph-structured data.
Distributed GCN training incurs prohibitive overhead from communicating node features and feature gradients among partitions.
We propose PipeGCN, a scheme that hides the communication overhead by pipelining inter-partition communication with intra-partition computation.
- Score: 24.05916878277873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Convolutional Networks (GCNs) are the state-of-the-art method for
learning graph-structured data, and training large-scale GCNs requires
distributed training across multiple accelerators such that each accelerator is
able to hold a partitioned subgraph. However, distributed GCN training incurs
prohibitive overhead of communicating node features and feature gradients among
partitions for every GCN layer during each training iteration, limiting the
achievable training efficiency and model scalability. To this end, we propose
PipeGCN, a simple yet effective scheme that hides the communication overhead by
pipelining inter-partition communication with intra-partition computation. It
is non-trivial to pipeline for efficient GCN training, as communicated node
features/gradients will become stale and thus can harm the convergence,
negating the pipeline benefit. Notably, little is known regarding the
convergence rate of GCN training with both stale features and stale feature
gradients. This work not only provides a theoretical convergence analysis but
also finds the convergence rate of PipeGCN to be close to that of the vanilla
distributed GCN training without any staleness. Furthermore, we develop a
smoothing method to further improve PipeGCN's convergence. Extensive
experiments show that PipeGCN can largely boost the training throughput
(1.7x~28.5x) while achieving the same accuracy as its vanilla counterpart and
existing full-graph training methods. The code is available at
https://github.com/RICE-EIC/PipeGCN.
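The pipelining idea is concrete enough to sketch. The following is a minimal, self-contained simulation (not the authors' implementation, and not tied to the repository's API): at iteration t, a partition computes with the boundary features it received at iteration t-1, so the exchange of fresh boundary features overlaps with local computation and the communicated features are one step stale. The thread-and-queue "communication", tensor shapes, and function names are illustrative stand-ins, and PipeGCN's smoothing of stale features/gradients is omitted.

```python
# Conceptual sketch of PipeGCN-style pipelining (illustrative only).
import threading
import queue

import numpy as np


def exchange_boundary(features: np.ndarray, out_queue: "queue.Queue") -> None:
    """Stand-in for asynchronous inter-partition communication."""
    # A real system would issue a non-blocking send/receive here; the queue
    # simply delivers the features to the consumer.
    out_queue.put(features.copy())


def local_gcn_layer(inner_feats, stale_boundary_feats, weight):
    """Toy aggregation over local nodes plus (stale) remote boundary nodes."""
    agg = inner_feats.mean(axis=0) + stale_boundary_feats.mean(axis=0)
    return np.tanh(agg @ weight)


rng = np.random.default_rng(0)
inner = rng.normal(size=(32, 16))         # features of nodes owned by this partition
boundary = rng.normal(size=(8, 16))       # boundary features exchanged each iteration
weight = rng.normal(size=(16, 16))
stale_boundary = np.zeros_like(boundary)  # nothing received before iteration 0
chan: "queue.Queue" = queue.Queue()

for step in range(3):
    # Kick off this iteration's boundary-feature exchange in the background.
    comm = threading.Thread(target=exchange_boundary, args=(boundary, chan))
    comm.start()
    # Compute immediately, using boundary features from the previous iteration.
    out = local_gcn_layer(inner, stale_boundary, weight)
    comm.join()
    stale_boundary = chan.get()  # becomes the (stale) input for the next iteration
    print(f"step {step}: output norm = {np.linalg.norm(out):.3f}")
```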
Related papers
- CDFGNN: a Systematic Design of Cache-based Distributed Full-Batch Graph Neural Network Training with Communication Reduction [7.048300785744331]
Graph neural network training is mainly categorized into mini-batch and full-batch training methods.
In the distributed cluster, frequent remote accesses of features and gradients lead to huge communication overhead.
We introduce CDFGNN, a cache-based distributed full-batch graph neural network training framework.
Our results indicate that CDFGNN has great potential in accelerating distributed full-batch GNN training tasks.
arXiv Detail & Related papers (2024-08-01T01:57:09Z)
- Scalable Graph Compressed Convolutions [68.85227170390864]
We propose a differentiable method that applies permutations to calibrate input graphs for Euclidean convolution.
Based on the graph calibration, we propose the Compressed Convolution Network (CoCN) for hierarchical graph representation learning.
arXiv Detail & Related papers (2024-07-26T03:14:13Z)
- Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z)
- GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism [10.723541176359452]
Communication is a key bottleneck for distributed graph neural network (GNN) training.
GNNPipe is a new approach that scales distributed full-graph deep GNN training.
arXiv Detail & Related papers (2023-08-19T18:44:14Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training [6.557328947642343]
Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming.
This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph training.
arXiv Detail & Related papers (2023-06-02T09:02:09Z)
- Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication [23.883543151975136]
Training Graph Neural Networks (GNNs) on large graphs is challenging due to the conflict between the high memory demand and limited GPU memory.
We propose Sylvie, an efficient distributed GNN training framework that applies one-bit quantization in GNNs.
In detail, Sylvie provides a lightweight Low-bit Module that quantizes the data sent in each layer and dequantizes the received data back to full-precision values (a generic sign-quantization sketch follows this list).
arXiv Detail & Related papers (2023-03-02T14:02:39Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Boundary Node Sampling [25.32242812045678]
We propose a simple yet effective method dubbed BNS-GCN that adopts random Boundary-Node-Sampling to enable efficient and scalable distributed GCN training.
Experiments and ablation studies consistently validate the effectiveness of BNS-GCN, boosting the throughput by up to 16.2x and reducing the memory usage by up to 58%, while maintaining full-graph accuracy.
arXiv Detail & Related papers (2022-03-21T13:44:37Z)
- L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks [118.37805042816784]
Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets.
We propose L-GCN, a novel efficient layer-wise training framework for GCNs that disentangles feature aggregation and feature transformation during training.
Experiments show that L-GCN is faster than the state of the art by at least an order of magnitude, with consistent memory usage that does not depend on dataset size.
arXiv Detail & Related papers (2020-03-30T16:37:56Z)
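Several of the entries above (e.g., Sylvie's one-bit communication and AdaQP's message quantization) compress the tensors exchanged between partitions. As a generic illustration only, and not code from any of these papers, the sketch below shows a common 1-bit (sign) compression scheme with a single per-tensor scale; the scaling choice and all names are assumptions.

```python
# Generic 1-bit (sign) compression for communicated tensors (illustrative only).
import numpy as np

def quantize_1bit(x: np.ndarray):
    """Compress x to its sign pattern plus a single per-tensor scale."""
    scale = float(np.mean(np.abs(x)))   # one float to transmit
    signs = np.signbit(x)               # one bit per element (True = negative)
    return signs, scale

def dequantize_1bit(signs: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximation of the original tensor."""
    return np.where(signs, -scale, scale)

x = np.random.default_rng(1).normal(size=(4, 8)).astype(np.float32)
signs, scale = quantize_1bit(x)
x_hat = dequantize_1bit(signs, scale)
print("relative compression error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```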