PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph
Neural Network Training with Irregular Accesses
- URL: http://arxiv.org/abs/2101.07956v1
- Date: Wed, 20 Jan 2021 04:24:39 GMT
- Title: PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph
Neural Network Training with Irregular Accesses
- Authors: Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun
Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu
- Abstract summary: We introduce PyTorch-Direct, which enables a GPU-centric data accessing paradigm for graph neural networks (GNNs) training.
Our microbenchmark and end-to-end GNN training results show that PyTorch-Direct reduces data transfer time by 47.1% on average and speeds up GNN training by up to 1.6x.
- Score: 19.2129567657739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing adoption of graph neural networks (GNNs) in the machine
learning community, GPUs have become an essential tool to accelerate GNN
training. However, training GNNs on very large graphs that do not fit in GPU
memory is still a challenging task. Unlike conventional neural networks,
mini-batching input samples in GNNs requires complicated tasks such as
traversing neighboring nodes and gathering their feature values. While this
process accounts for a significant portion of the training time, we find
existing GNN implementations using popular deep neural network (DNN) libraries
such as PyTorch are limited to a CPU-centric approach for the entire data
preparation step. This "all-in-CPU" approach has negative impact on the overall
GNN training performance as it over-utilizes CPU resources and hinders GPU
acceleration of GNN training. To overcome such limitations, we introduce
PyTorch-Direct, which enables a GPU-centric data accessing paradigm for GNN
training. In PyTorch-Direct, GPUs are capable of efficiently accessing
complicated data structures in host memory directly without CPU intervention.
Our microbenchmark and end-to-end GNN training results show that PyTorch-Direct
reduces data transfer time by 47.1% on average and speeds up GNN training by up
to 1.6x. Furthermore, by reducing CPU utilization, PyTorch-Direct also saves
system power by 12.4% to 17.5% during training. To minimize programmer effort,
we introduce a new "unified tensor" type along with necessary changes to the
PyTorch memory allocator, dispatch logic, and placement rules. As a result,
users need to change at most two lines of their PyTorch GNN training code for
each tensor object to take advantage of PyTorch-Direct.
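To make the abstract's "at most two lines per tensor" claim concrete, below is a minimal sketch of what the change might look like. The "unified" device string and the behavior of indexing a unified tensor with GPU-resident indices are assumptions based on the abstract's description of the unified tensor type; the exact PyTorch-Direct API may differ, and the last two lines only work with the modified PyTorch build the paper describes, not stock PyTorch.

```python
# Illustrative sketch only; the "unified" device below is a hypothetical
# rendering of PyTorch-Direct's unified tensor API, not a stock PyTorch feature.
import torch

# Stand-in sizes for a node-feature table too large to reside in GPU memory.
num_nodes, feat_dim = 1_000_000, 128
features = torch.randn(num_nodes, feat_dim)          # node features stay in host memory
batch_ids = torch.randint(0, num_nodes, (1024,))     # nodes sampled for one mini-batch

# Baseline ("all-in-CPU") mini-batch preparation:
#   1. the CPU gathers the sampled nodes' feature rows,
#   2. the gathered block is copied to the GPU.
gpu_batch_baseline = features[batch_ids].to("cuda")

# PyTorch-Direct style (hypothetical API): convert the tensor to a unified
# tensor once, then index it with GPU-resident indices; the GPU fetches only
# the needed rows directly from host memory, without a CPU-side gather.
features_unified = features.to(device="unified")     # assumed unified-tensor conversion
gpu_batch_direct = features_unified[batch_ids.to("cuda")]
```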
Related papers
- SpanGNN: Towards Memory-Efficient Graph Neural Networks via Spanning Subgraph Training [14.63975787929143]
Graph Neural Networks (GNNs) have superior capability in learning graph data.
Full-graph GNN training generally has high accuracy; however, it suffers from large peak memory usage.
We propose a new memory-efficient GNN training method using spanning subgraphs, called SpanGNN.
arXiv Detail & Related papers (2024-06-07T13:46:23Z) - iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations [1.3030767447016454]
iSpLib is a PyTorch-based C++ library equipped with auto-tuned sparse operations.
We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU.
arXiv Detail & Related papers (2024-03-21T21:56:44Z) - Accelerating Sampling and Aggregation Operations in GNN Frameworks with
GPU Initiated Direct Storage Accesses [9.773813896475264]
Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data.
Training GNNs on large-scale graphs remains a significant challenge due to the lack of efficient data access and data movement methods.
We propose the GPU Initiated Direct Storage Access (GIDS) dataloader to enable GPU-oriented GNN training for large-scale graphs.
arXiv Detail & Related papers (2023-06-28T17:22:15Z) - You Can Have Better Graph Neural Networks by Not Training Weights at
All: Finding Untrained GNNs Tickets [105.24703398193843]
Whether untrained subnetworks exist in graph neural networks (GNNs) still remains mysterious.
We show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem.
We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness to input perturbations.
arXiv Detail & Related papers (2022-11-28T14:17:36Z) - Distributed Graph Neural Network Training: A Survey [51.77035975191926]
Graph neural networks (GNNs) are a type of deep learning models that are trained on graphs and have been successfully applied in various domains.
Despite the effectiveness of GNNs, it is still challenging for GNNs to efficiently scale to large graphs.
As a remedy, distributed computing has become a promising solution for training large-scale GNNs.
arXiv Detail & Related papers (2022-11-01T01:57:00Z) - TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs [21.63854538768414]
We propose TC-GNN, the first GNN framework based on GPU Tensor Core Units (TCUs).
The core idea is to reconcile the "Sparse" GNN with the high-performance "Dense" TCUs.
Rigorous experiments show an average of 1.70x speedup over the state-of-the-art DGL framework.
arXiv Detail & Related papers (2021-12-03T18:06:23Z) - Training Graph Neural Networks with 1000 Layers [133.84813995275988]
We study reversible connections, group convolutions, weight tying, and equilibrium models to advance the memory and parameter efficiency of GNNs.
To the best of our knowledge, RevGNN-Deep is the deepest GNN in the literature by one order of magnitude.
arXiv Detail & Related papers (2021-06-14T15:03:00Z) - DistGNN: Scalable Distributed Training for Large-Scale Graph Neural
Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z) - BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant
Weight Matrices [9.406007544032848]
Graph Neural Networks (GNNs) are state-of-the-art algorithms for analyzing non-Euclidean graph data.
How to perform GNN inference in real time has become a challenging problem for resource-limited edge-computing platforms.
We propose BlockGNN, a software-hardware co-design approach to realize efficient GNN acceleration.
arXiv Detail & Related papers (2021-04-13T14:09:22Z) - A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z) - Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z)