Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining
- URL: http://arxiv.org/abs/2110.08450v1
- Date: Sat, 16 Oct 2021 02:41:35 GMT
- Title: Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining
- Authors: Tim Kaler, Nickolas Stathas, Anne Ouyang, Alexandros-Stavros Iliopoulos, Tao B. Schardl, Charles E. Leiserson, Jie Chen
- Abstract summary: Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
- Score: 58.10436813430554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving the training and inference performance of graph neural networks
(GNNs) is faced with a challenge uncommon in general neural networks: creating
mini-batches requires a lot of computation and data movement due to the
exponential growth of multi-hop graph neighborhoods along network layers. Such
a unique challenge gives rise to a diverse set of system design choices. We
argue in favor of performing mini-batch training with neighborhood sampling in
a distributed multi-GPU environment, under which we identify major performance
bottlenecks hitherto under-explored by developers: mini-batch preparation and
transfer. We present a sequence of improvements to mitigate these bottlenecks,
including a performance-engineered neighborhood sampler, a shared-memory
parallelization strategy, and the pipelining of batch transfer with GPU
computation. We also conduct an empirical analysis that supports the use of
sampling for inference, showing that test accuracies are not materially
compromised. Such an observation unifies training and inference, simplifying
model implementation. We report comprehensive experimental results with several
benchmark data sets and GNN architectures, including a demonstration that, for
the ogbn-papers100M data set, our system SALIENT achieves a speedup of 3x over
a standard PyTorch-Geometric implementation with a single GPU and a further 8x
parallel speedup with 16 GPUs. Therein, training a 3-layer GraphSAGE model with
sampling fanout (15, 10, 5) takes 2.0 seconds per epoch and inference with
fanout (20, 20, 20) takes 2.4 seconds, attaining test accuracy 64.58%.
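
As a concrete illustration of the fanout-based neighborhood sampling described above, here is a minimal sketch that trains and evaluates a 3-layer GraphSAGE model with the quoted fanouts using PyTorch Geometric's NeighborLoader. This is not the SALIENT implementation: the batch sizes, hidden width, and the `data` object (with `train_mask`, `test_mask`, and integer node labels `y`) are illustrative assumptions.

```python
# Minimal sketch (not the SALIENT code): mini-batch training and sampled
# inference of a 3-layer GraphSAGE model with the fanouts quoted in the
# abstract. Batch sizes, hidden width, and the `data` object are assumed.
import torch
import torch.nn.functional as F
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import SAGEConv


class SAGE(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.convs = torch.nn.ModuleList([
            SAGEConv(in_dim, hid_dim),
            SAGEConv(hid_dim, hid_dim),
            SAGEConv(hid_dim, out_dim),
        ])

    def forward(self, x, edge_index):
        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index)
            if i < len(self.convs) - 1:
                x = F.relu(x)
        return x


def make_loaders(data):
    train_loader = NeighborLoader(
        data, num_neighbors=[15, 10, 5],        # training fanout
        input_nodes=data.train_mask, batch_size=1024,
        shuffle=True, num_workers=4,
    )
    test_loader = NeighborLoader(
        data, num_neighbors=[20, 20, 20],       # inference fanout
        input_nodes=data.test_mask, batch_size=4096,
        shuffle=False, num_workers=4,
    )
    return train_loader, test_loader


def train_epoch(model, loader, optimizer, device):
    model.train()
    for batch in loader:
        batch = batch.to(device)
        optimizer.zero_grad()
        out = model(batch.x, batch.edge_index)
        # Only the first `batch_size` nodes are seed nodes; the remaining
        # nodes are sampled neighbors that provide multi-hop context.
        loss = F.cross_entropy(out[:batch.batch_size],
                               batch.y[:batch.batch_size])
        loss.backward()
        optimizer.step()


@torch.no_grad()
def sampled_accuracy(model, loader, device):
    model.eval()
    correct = total = 0
    for batch in loader:
        batch = batch.to(device)
        pred = model(batch.x, batch.edge_index)[:batch.batch_size].argmax(-1)
        correct += int((pred == batch.y[:batch.batch_size]).sum())
        total += batch.batch_size
    return correct / total
```

Because the same sampled loader serves both training and inference (only the fanout changes), the two code paths coincide, which is the unification of training and inference that the abstract refers to.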
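
The pipelining of batch transfer with GPU computation can similarly be sketched with a generic prefetching pattern: a dedicated CUDA stream copies the next mini-batch to the device while the current one is being processed. This is in the spirit of the paper rather than SALIENT's actual pipeline; it assumes a CUDA device, the model and loaders from the previous sketch, and a loader that yields pinned-memory batches (e.g. `pin_memory=True`) so that the `non_blocking` copies are truly asynchronous.

```python
# Sketch of overlapping host-to-device batch transfer with GPU compute,
# in the spirit of (but not identical to) SALIENT's pipelining. Assumes a
# CUDA device and the model/loader/optimizer from the previous sketch.
import torch
import torch.nn.functional as F


def train_epoch_pipelined(model, loader, optimizer, device):
    model.train()
    copy_stream = torch.cuda.Stream(device=device)  # dedicated transfer stream
    batches = iter(loader)

    def prefetch():
        try:
            batch = next(batches)                   # CPU-side sampling/dequeue
        except StopIteration:
            return None
        with torch.cuda.stream(copy_stream):        # enqueue copy off the compute stream
            return batch.to(device, non_blocking=True)

    next_batch = prefetch()
    while next_batch is not None:
        # Make the compute (default) stream wait for the finished copy.
        torch.cuda.current_stream(device).wait_stream(copy_stream)
        # Start copying the next batch while this one is being processed.
        batch, next_batch = next_batch, prefetch()
        # (A production version would also call Tensor.record_stream on the
        # copied tensors to stay safe with the caching allocator.)
        optimizer.zero_grad()
        out = model(batch.x, batch.edge_index)
        loss = F.cross_entropy(out[:batch.batch_size],
                               batch.y[:batch.batch_size])
        loss.backward()
        optimizer.step()
```

SALIENT combines this kind of overlap with a performance-engineered C++ neighborhood sampler and shared-memory parallelism; the sketch isolates only the transfer/compute overlap.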
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45x to 9.39x speedup over baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Sampling-based Distributed Training with Message Passing Neural Network [1.1088875073103417]
We introduce a domain-decomposition-based distributed training and inference approach for message-passing neural networks (MPNNs).
We present a scalable graph neural network, referred to as DS-MPNN (D and S standing for distributed and sampled), capable of scaling up to $O(10^5)$ nodes.
arXiv Detail & Related papers (2024-02-23T05:33:43Z)
- Distributed Matrix-Based Sampling for Graph Neural Network Training [0.0]
We propose a matrix-based bulk sampling approach that expresses sampling as a sparse matrix multiplication (SpGEMM) and samples multiple minibatches at once (see the sketch after this list).
When the input graph topology does not fit on a single device, our method distributes the graph and uses communication-avoiding SpGEMM algorithms to scale GNN minibatch sampling.
In addition to new methods for sampling, we introduce a pipeline that uses our matrix-based bulk sampling approach to provide end-to-end training results.
arXiv Detail & Related papers (2023-11-06T06:40:43Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing [0.0]
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data.
Existing systems are inefficient at training large graphs with billions of nodes and edges on GPUs.
This paper proposes BGL, a distributed GNN training system designed to address the bottlenecks with a few key ideas.
arXiv Detail & Related papers (2021-12-16T00:37:37Z)
- Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs [26.074384252289384]
Graph neural networks (GNNs) are powerful tools for learning from graph data and are widely used in various applications.
Although a number of sampling-based methods have been proposed to enable mini-batch training on large graphs, these methods have not been proven to work on truly industry-scale graphs.
We propose Global Neighborhood Sampling, which targets training GNNs on giant graphs, specifically for mixed CPU-GPU training.
arXiv Detail & Related papers (2021-06-11T03:30:25Z)
- Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)
- Scalable Graph Neural Networks via Bidirectional Propagation [89.70835710988395]
Graph Neural Networks (GNNs) are an emerging field for learning on non-Euclidean data.
This paper presents GBP, a scalable GNN that utilizes a localized bidirectional propagation process from both the feature vectors and the training/testing nodes.
An empirical study demonstrates that GBP achieves state-of-the-art performance with significantly less training/testing time.
arXiv Detail & Related papers (2020-10-29T08:55:33Z)
- Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs.
It is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z)
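
For the matrix-based bulk sampling entry above (Distributed Matrix-Based Sampling for Graph Neural Network Training), the core idea of expressing sampling as SpGEMM can be illustrated as follows: multiplying a sparse selection matrix by the adjacency matrix extracts the neighborhoods of a whole mini-batch in a single sparse-sparse multiplication. The SciPy sketch below only illustrates that idea; the paper's method additionally restricts each row to a random subset of neighbors, batches several mini-batches at once, and runs distributed, communication-avoiding SpGEMM. The toy graph and seed set are made up for the example.

```python
# Illustration of "sampling as SpGEMM": a sparse selection matrix times the
# adjacency matrix yields the next-hop frontier of a mini-batch in one
# multiplication. Toy example only; not the paper's distributed implementation.
import numpy as np
import scipy.sparse as sp


def one_hop_frontier(adj: sp.csr_matrix, seeds: np.ndarray) -> np.ndarray:
    """Return the union of out-neighbors of `seeds` via a single SpGEMM."""
    n = adj.shape[0]
    b = len(seeds)
    # Selection matrix S (b x n) with S[i, seeds[i]] = 1.
    sel = sp.csr_matrix((np.ones(b), (np.arange(b), seeds)), shape=(b, n))
    frontier = sel @ adj            # SpGEMM: row i is the adjacency row of seed i
    return np.unique(frontier.nonzero()[1])


# Toy directed ring graph 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 0.
rows = np.arange(6)
cols = (rows + 1) % 6
adj = sp.csr_matrix((np.ones(6), (rows, cols)), shape=(6, 6))
print(one_hop_frontier(adj, np.array([0, 3])))   # -> [1 4]
```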
This list is automatically generated from the titles and abstracts of the papers on this site.