Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs
- URL: http://arxiv.org/abs/2106.06150v1
- Date: Fri, 11 Jun 2021 03:30:25 GMT
- Title: Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs
- Authors: Jialin Dong, Da Zheng, Lin F. Yang, George Karypis
- Abstract summary: Graph neural networks (GNNs) are powerful tools for learning from graph data and are widely used in various applications.
Although a number of sampling-based methods have been proposed to enable mini-batch training on large graphs, these methods have not been proven to work on truly industry-scale graphs.
We propose Global Neighborhood Sampling, which aims at training GNNs on giant graphs specifically for mixed CPU-GPU training.
- Score: 26.074384252289384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs) are powerful tools for learning from graph data
and are widely used in various applications such as social network
recommendation, fraud detection, and graph search. The graphs in these
applications are typically large, usually containing hundreds of millions of
nodes. Training GNN models on such large graphs efficiently remains a big
challenge. Although a number of sampling-based methods have been proposed to
enable mini-batch training on large graphs, these methods have not been proven
to work on truly industry-scale graphs, which require GPUs or mixed CPU-GPU
training. The state-of-the-art sampling-based methods are usually not optimized
for these real-world hardware setups, in which data movement between CPUs and
GPUs is a bottleneck. To address this issue, we propose Global Neighborhood
Sampling, which aims at training GNNs on giant graphs specifically for
mixed CPU-GPU training. The algorithm samples a global cache of nodes
periodically for all mini-batches and stores them in GPUs. This global cache
allows in-GPU importance sampling of mini-batches, which drastically reduces
the number of nodes in a mini-batch, especially in the input layer, thereby
reducing both the data copied between CPU and GPU and the mini-batch computation,
without compromising the training convergence rate or model accuracy. We provide a highly efficient
implementation of this method and show that our implementation outperforms an
efficient node-wise neighbor sampling baseline by a factor of 2X-4X on giant
graphs. It outperforms an efficient implementation of LADIES with small layers
by a factor of 2X-14X while achieving much higher accuracy than LADIES. We also
theoretically analyze the proposed algorithm and show that with cached node
data of a proper size, it enjoys a comparable convergence rate as the
underlying node-wise sampling method.
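As a rough illustration of the cache-based scheme described above, the sketch below periodically draws a global node cache and restricts per-mini-batch neighbor sampling to cached nodes where possible. It is a minimal sketch, not the paper's implementation: the degree-proportional cache distribution, the fallback rule, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_global_cache(degrees, cache_size):
    # Periodically draw a global cache of nodes to keep resident on the GPU.
    # Assumption: nodes are drawn with probability proportional to degree, so
    # frequently visited (high-degree) nodes are likely to be cached.
    probs = degrees / degrees.sum()
    return rng.choice(len(degrees), size=cache_size, replace=False, p=probs)

def sample_minibatch(adj_lists, seeds, cache, fanout):
    # For each seed, prefer neighbors that are already in the cache so most of
    # the mini-batch's input nodes live on the GPU; fall back to the full
    # neighbor list when too few cached neighbors exist (illustrative rule).
    cache_set = set(int(c) for c in cache)
    block = {}
    for v in seeds:
        neigh = adj_lists[int(v)]
        cached = [u for u in neigh if u in cache_set]
        pool = cached if len(cached) >= fanout else neigh
        k = min(fanout, len(pool))
        block[int(v)] = rng.choice(pool, size=k, replace=False).tolist()
    return block

# Toy graph: 1,000 nodes with random neighbor lists.
num_nodes = 1000
adj_lists = [rng.choice(num_nodes, size=int(rng.integers(2, 20)), replace=False).tolist()
             for _ in range(num_nodes)]
degrees = np.array([len(neigh) for neigh in adj_lists], dtype=float)

cache = sample_global_cache(degrees, cache_size=100)   # refreshed periodically, not per batch
seeds = rng.choice(num_nodes, size=32, replace=False)  # seed nodes of one mini-batch
block = sample_minibatch(adj_lists, seeds, cache, fanout=5)
```

In actual mixed CPU-GPU training, the cached node features would be copied to GPU memory once per cache refresh, so each mini-batch only needs to gather the small set of uncached input nodes from CPU memory.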
Related papers
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- Distributed Matrix-Based Sampling for Graph Neural Network Training [0.0]
We propose a matrix-based bulk sampling approach that expresses sampling as a sparse matrix multiplication (SpGEMM) and samples multiple minibatches at once.
When the input graph topology does not fit on a single device, our method distributes the graph and uses communication-avoiding SpGEMM algorithms to scale GNN minibatch sampling.
In addition to new methods for sampling, we introduce a pipeline that uses our matrix-based bulk sampling approach to provide end-to-end training results.
arXiv Detail & Related papers (2023-11-06T06:40:43Z)
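To make the sampling-as-SpGEMM idea above concrete, here is a small single-machine sketch: one-hot row-selection matrices for several mini-batches are stacked and multiplied against the adjacency matrix in a single sparse-sparse product, after which each row is trimmed to the fanout. The distributed, communication-avoiding part of the paper is not reproduced, and all names are illustrative.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

def bulk_sample(adj_csr, seed_batches, fanout):
    # Stack one-hot selection matrices for all mini-batches and gather every
    # candidate neighbor list with a single SpGEMM (sparse @ sparse product).
    seeds = np.concatenate(seed_batches)
    num_rows, num_nodes = len(seeds), adj_csr.shape[0]
    q = sp.csr_matrix((np.ones(num_rows), (np.arange(num_rows), seeds)),
                      shape=(num_rows, num_nodes))
    rows = q @ adj_csr                      # SpGEMM: rows of A for all seeds at once
    sampled = []
    for i in range(num_rows):               # trim each row to at most `fanout` neighbors
        neigh = rows.indices[rows.indptr[i]:rows.indptr[i + 1]]
        k = min(fanout, len(neigh))
        sampled.append(rng.choice(neigh, size=k, replace=False))
    return sampled                          # sampled neighbors, grouped in batch order

# Toy usage: a random 100-node graph, two mini-batches of 4 seeds, fanout 3.
adj = sp.random(100, 100, density=0.05, format="csr", random_state=0)
batches = [rng.choice(100, size=4, replace=False) for _ in range(2)]
neighbors = bulk_sample(adj, batches, fanout=3)
```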
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
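The pre-computation-plus-random-projection idea behind RpHGNN can be sketched roughly as follows: features are propagated offline for a few hops, and each hop is compressed with a random Gaussian projection so the pre-computed tensor keeps a fixed, regular shape. This is a generic illustration under those assumptions; RpHGNN's per-relation handling of heterogeneous graphs and its exact hybrid scheme are omitted, and the dense toy adjacency is only for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def precompute_features(adj_norm, feats, num_hops, proj_dim):
    # One-time (offline) message passing: propagate features for a few hops and
    # compress each hop with a random Gaussian projection so the pre-computed
    # table keeps a fixed width, independent of how many neighbors were mixed in.
    outputs, h = [], feats
    for _ in range(num_hops):
        h = adj_norm @ h                                   # neighbor aggregation
        proj = rng.normal(size=(h.shape[1], proj_dim)) / np.sqrt(proj_dim)
        outputs.append(h @ proj)                           # bounded-width sketch of this hop
    return np.concatenate(outputs, axis=1)                 # regular-shaped tensor for an MLP

# Toy usage: 1,000 nodes, 64-dimensional features, dense random adjacency for brevity.
n, d = 1000, 64
adj = (rng.random((n, n)) < 0.01).astype(float)
adj_norm = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # row-normalize
feats = rng.normal(size=(n, d))
table = precompute_features(adj_norm, feats, num_hops=3, proj_dim=32)  # shape (n, 3 * 32)
```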
- DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z)
- Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness [4.8412870364335925]
Quiver is a distributed GPU-based GNN serving system with low latency and high throughput.
We show that Quiver achieves up to 35 times lower latency with an 8 times higher throughput compared to state-of-the-art GNN approaches.
arXiv Detail & Related papers (2023-05-18T10:34:23Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing [0.0]
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data.
Existing systems are inefficient at training large graphs with billions of nodes and edges on GPUs.
This paper proposes BGL, a distributed GNN training system designed to address the bottlenecks with a few key ideas.
arXiv Detail & Related papers (2021-12-16T00:37:37Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes of a large graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
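The "efficient approximation of information diffusion" used by PPRGo is based on approximate personalized PageRank (PPR). Below is a textbook forward-push approximation, included only to illustrate the idea rather than PPRGo's exact implementation; a PPRGo-style model would keep the top-k PPR neighbors per node and aggregate their raw features with these weights instead of running multi-hop message passing.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_ppr_push(adj_lists, source, alpha=0.15, eps=1e-4):
    # Forward-push approximation of personalized PageRank rooted at `source`.
    # Returns a sparse dict node -> approximate PPR score.
    p, r, queue = {}, {source: 1.0}, [source]
    while queue:
        u = queue.pop()
        deg = len(adj_lists[u]) or 1
        res = r.get(u, 0.0)
        if res < eps * deg:                 # residual too small to be worth pushing
            continue
        p[u] = p.get(u, 0.0) + alpha * res  # keep the teleport share at u
        r[u] = 0.0
        share = (1.0 - alpha) * res / deg   # spread the rest to u's neighbors
        for v in adj_lists[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * (len(adj_lists[v]) or 1):
                queue.append(v)
    return p

# Toy usage: random neighbor lists for 200 nodes, PPR scores from node 0, then
# the 32 highest-scoring nodes a PPRGo-style model would aggregate features from.
num_nodes = 200
adj_lists = [rng.choice(num_nodes, size=5, replace=False).tolist() for _ in range(num_nodes)]
scores = approx_ppr_push(adj_lists, source=0)
top_neighbors = sorted(scores, key=scores.get, reverse=True)[:32]
```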