Sequential Aggregation and Rematerialization: Distributed Full-batch
Training of Graph Neural Networks on Large Graphs
- URL: http://arxiv.org/abs/2111.06483v1
- Date: Thu, 11 Nov 2021 22:27:59 GMT
- Title: Sequential Aggregation and Rematerialization: Distributed Full-batch
Training of Graph Neural Networks on Large Graphs
- Authors: Hesham Mostafa
- Abstract summary: We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs.
SAR is a distributed technique that can train any GNN type directly on an entire large graph.
We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models.
- Score: 7.549360351036771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the Sequential Aggregation and Rematerialization (SAR) scheme for
distributed full-batch training of Graph Neural Networks (GNNs) on large
graphs. Large-scale training of GNNs has recently been dominated by
sampling-based methods and methods based on non-learnable message passing. SAR
on the other hand is a distributed technique that can train any GNN type
directly on an entire large graph. The key innovation in SAR is the distributed
sequential rematerialization scheme which sequentially re-constructs then frees
pieces of the prohibitively large GNN computational graph during the backward
pass. This results in excellent memory scaling behavior where the memory
consumption per worker goes down linearly with the number of workers, even for
densely connected graphs. Using SAR, we report the largest applications of
full-batch GNN training to-date, and demonstrate large memory savings as the
number of workers increases. We also present a general technique based on
kernel fusion and attention-matrix rematerialization to optimize both the
runtime and memory efficiency of attention-based models. We show that, coupled
with SAR, our optimized attention kernels lead to significant speedups and
memory savings in attention-based GNNs.
Related papers
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z) - Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks [5.514795777097036]
We introduce the concept of the Graph Winning Ticket (GWT), derived from the Lottery Ticket Hypothesis (LTH)
By adopting a pre-determined star topology as a GWT prior to training, we balance edge reduction with efficient information propagation.
Our approach enables training ASTGNNs on the largest scale spatial-temporal dataset using a single A6000 equipped with 48 GB of memory.
arXiv Detail & Related papers (2024-06-12T14:53:23Z) - CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks [7.321893519281194]
Existing distributed systems load the entire graph in memory for graph partitioning.
We propose CATGNN, a cost-efficient and scalable distributed GNN training system.
We also propose a novel streaming partitioning algorithm named SPRING for distributed GNN training.
arXiv Detail & Related papers (2024-04-02T20:55:39Z) - Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z) - Scalable Graph Convolutional Network Training on Distributed-Memory
Systems [5.169989177779801]
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges.
We propose a highly parallel training algorithm that scales to large processor counts.
arXiv Detail & Related papers (2022-12-09T17:51:13Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs)
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
arXiv Detail & Related papers (2020-10-19T21:51:47Z) - Fast Graph Attention Networks Using Effective Resistance Based Graph
Sparsification [70.50751397870972]
FastGAT is a method to make attention based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.