Accelerating Sampling and Aggregation Operations in GNN Frameworks with
GPU Initiated Direct Storage Accesses
- URL: http://arxiv.org/abs/2306.16384v2
- Date: Wed, 6 Mar 2024 22:41:30 GMT
- Title: Accelerating Sampling and Aggregation Operations in GNN Frameworks with
GPU Initiated Direct Storage Accesses
- Authors: Jeongmin Brian Park and Vikram Sharma Mailthody and Zaid Qureshi and
Wen-mei Hwu
- Abstract summary: Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data.
Training GNNs on large-scale graphs remains a significant challenge due to the lack of efficient data access and data movement methods.
We propose the GPU Initiated Direct Storage Access (GIDS) dataloader to enable GPU-oriented GNN training for large-scale graphs.
- Score: 9.773813896475264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Neural Networks (GNNs) are emerging as a powerful tool for learning
from graph-structured data and performing sophisticated inference tasks in
various application domains. Although GNNs have been shown to be effective on
modest-sized graphs, training them on large-scale graphs remains a significant
challenge due to the lack of efficient data access and data movement methods.
Existing frameworks for training GNNs use CPUs for graph sampling and feature
aggregation, while the training and updating of model weights are executed on
GPUs. However, our in-depth profiling shows that the CPUs cannot achieve the
throughput required to saturate GNN model training, causing gross
under-utilization of expensive GPU resources. Furthermore, when the graph and
its embeddings do not fit in CPU memory, the overhead introduced by the
operating system, for example in handling page faults, falls on the critical
path of execution.
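To make the bottleneck concrete, here is a minimal sketch of the CPU-centric pipeline just described. It follows the dataloading API of recent DGL releases (dgl.dataloading.NeighborSampler and DataLoader; older releases name the equivalent classes MultiLayerNeighborSampler and NodeDataLoader), and the graph size, fan-outs, batch size, and feature dimension are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of the CPU-centric pipeline: CPU workers sample neighborhoods
# and gather feature vectors; only the assembled mini-batch reaches the GPU.
import torch
import dgl
from dgl.dataloading import NeighborSampler, DataLoader  # recent DGL releases

g = dgl.rand_graph(100_000, 1_000_000)             # toy stand-in for a large graph
g.ndata["feat"] = torch.randn(g.num_nodes(), 128)  # features resident in CPU memory
train_nids = torch.arange(10_000)                  # illustrative training node IDs

sampler = NeighborSampler([15, 10, 5])             # 3-hop fan-outs (illustrative)
loader = DataLoader(g, train_nids, sampler,
                    batch_size=1024, shuffle=True,
                    num_workers=4)                 # CPU worker processes do the sampling

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for input_nodes, output_nodes, blocks in loader:
    # The CPU has already done the sampling; it now also does the feature gather,
    # so the GPU waits on this host-side work every iteration.
    batch_feats = g.ndata["feat"][input_nodes]          # CPU-side gather
    batch_feats = batch_feats.to(device, non_blocking=True)
    blocks = [b.to(device) for b in blocks]
    # ... forward/backward pass of the GNN model on the GPU ...
```

Every iteration pays for CPU-side sampling and a CPU-side feature gather before anything reaches the GPU, which is the mismatch the profiling above identifies.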
To address these issues, we propose the GPU Initiated Direct Storage Access
(GIDS) dataloader to enable GPU-oriented GNN training for large-scale graphs
while efficiently utilizing all hardware resources, such as CPU memory,
storage, and GPU memory, with a hybrid data placement strategy. By enabling
GPU threads to fetch feature vectors directly from storage, the GIDS
dataloader solves the memory capacity problem for GPU-oriented GNN training.
Moreover, the GIDS dataloader leverages GPU parallelism to tolerate storage
latency and eliminates expensive page-fault overhead. Doing so enables us to
design novel optimizations for exploiting locality and increasing effective
bandwidth for GNN training. Our evaluation using a single GPU on
terabyte-scale GNN datasets shows that the GIDS dataloader accelerates the
overall DGL GNN training pipeline by up to 392X compared to the current
state-of-the-art DGL dataloader.
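The hybrid data placement idea can be illustrated with a short sketch. This is not the GIDS implementation, which lets GPU threads issue the storage reads directly; HybridFeatureStore and all of its parameters are hypothetical, and the miss path below deliberately goes through the CPU and a memory map to show exactly the hop that GIDS removes. hot_ids is assumed to be a CPU tensor of node IDs chosen for GPU residency, for example by degree.

```python
# Hypothetical hybrid feature store: hot features cached in GPU memory,
# cold features served from an SSD-backed memory map.
import numpy as np
import torch

class HybridFeatureStore:
    def __init__(self, mmap_path, num_nodes, dim, hot_ids, device="cuda"):
        # Cold tier: the full feature table lives on SSD behind a memory map.
        self.cold = np.memmap(mmap_path, dtype=np.float32, mode="r",
                              shape=(num_nodes, dim))
        # Hot tier: features of frequently accessed nodes are cached on the GPU.
        self.device = device
        self.hot_feats = torch.from_numpy(
            np.asarray(self.cold[hot_ids.numpy()])).to(device)
        # Map node id -> slot in the hot cache (-1 means "not cached").
        self.hot_pos = torch.full((num_nodes,), -1, dtype=torch.long, device=device)
        self.hot_pos[hot_ids.to(device)] = torch.arange(len(hot_ids), device=device)

    def gather(self, node_ids):
        """Gather feature vectors for a mini-batch of node ids."""
        node_ids = node_ids.to(self.device)
        pos = self.hot_pos[node_ids]
        hit = pos >= 0
        out = torch.empty(len(node_ids), self.hot_feats.shape[1], device=self.device)
        out[hit] = self.hot_feats[pos[hit]]            # served from GPU memory
        miss = node_ids[~hit].cpu().numpy()
        if len(miss):
            # In this sketch the CPU reads the missing rows through the memory map.
            # GIDS removes exactly this hop: GPU threads issue the storage reads
            # themselves, keeping the CPU and page-fault handling off the path.
            out[~hit] = torch.from_numpy(np.asarray(self.cold[miss])).to(self.device)
        return out
```

The design point is that the hot tier turns the most frequently touched feature rows into cheap GPU-memory reads, while GIDS makes the remaining reads tolerable by issuing them from thousands of GPU threads, so storage latency is hidden behind parallelism rather than paid on the critical path.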
Related papers
- FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale [29.272368697268433]
Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data.
We propose FastGL, a GPU-efficient framework for accelerating sampling-based training of GNNs at large scale.
FastGL can achieve an average speedup of 11.8x, 2.2x and 1.5x over the state-of-the-art frameworks PyG, DGL, and GNNLab, respectively.
arXiv Detail & Related papers (2024-09-23T11:45:47Z)
- LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme [12.64360444043247]
Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks.
To address limited memory capacities, traditional GNN training approaches use graph partitioning and sharding techniques.
We propose the Large-scale Storage-based Multi-GPU GNN framework (LSM-GNN).
LSM-GNN incorporates a hybrid eviction policy that intelligently manages cache space by using both static and dynamic node information; a sketch of one such policy follows this entry.
arXiv Detail & Related papers (2024-07-21T20:41:39Z)
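The hybrid eviction policy in the LSM-GNN entry above is described only at a high level, so the following is one plausible reading rather than the paper's actual policy: a cache whose keep-score mixes a static per-node term (for example, normalized degree, known before training) with a dynamic recency term. The class name, the scoring formula, and the alpha weighting are all assumptions for illustration.

```python
# Illustrative cache whose eviction score mixes static and dynamic node information.
class HybridCache:
    def __init__(self, capacity, static_score, alpha=0.5):
        self.capacity = capacity
        self.static_score = static_score  # node_id -> score in [0, 1], e.g. normalized degree
        self.alpha = alpha                # weight between the static and dynamic terms
        self.store = {}                   # node_id -> cached feature vector
        self.last_used = {}               # node_id -> logical timestamp of last access
        self.clock = 0

    def _keep_score(self, nid):
        # Higher means "more worth keeping": structurally important and recently used.
        recency = self.last_used[nid] / max(self.clock, 1)
        return self.alpha * self.static_score[nid] + (1 - self.alpha) * recency

    def get(self, nid, fetch_fn):
        """Return the feature of `nid`, fetching from storage via `fetch_fn` on a miss."""
        self.clock += 1
        if nid not in self.store:
            if len(self.store) >= self.capacity:
                victim = min(self.store, key=self._keep_score)  # evict lowest-scored node
                del self.store[victim]
                del self.last_used[victim]
            self.store[nid] = fetch_fn(nid)
        self.last_used[nid] = self.clock
        return self.store[nid]
```

The point being illustrated is that the static term keeps structurally important nodes resident across many mini-batches, while the dynamic term lets the cache adapt to the current sampling pattern.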
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training [18.52206409432894]
DistTGL is an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters.
In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
arXiv Detail & Related papers (2023-07-14T22:52:27Z)
- Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching [59.8522166385372]
Training and inference with graph neural networks (GNNs) on massive graphs has been actively studied since the inception of GNNs.
This paper is concerned with minibatch training and inference with GNNs that employ node-wise sampling in distributed settings.
We present SALIENT++, which extends the prior state-of-the-art SALIENT system to work with partitioned feature data.
arXiv Detail & Related papers (2023-05-04T21:04:01Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensemble-style training scheme, named EnGCN, to address these issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Scaling R-GCN Training with Graph Summarization [71.06855946732296]
Training of Relational Graph Convolutional Networks (R-GCN) does not scale well with the size of the graph.
In this work, we experiment with the use of graph summarization techniques to compress the graph.
We obtain reasonable results on the AIFB, MUTAG and AM datasets.
arXiv Detail & Related papers (2022-03-05T00:28:43Z)
- BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing [0.0]
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data.
Existing systems are inefficient at training large graphs with billions of nodes and edges on GPUs.
This paper proposes BGL, a distributed GNN training system designed to address the bottlenecks with a few key ideas.
arXiv Detail & Related papers (2021-12-16T00:37:37Z)
- MG-GCN: Scalable Multi-GPU GCN Training Framework [1.7188280334580197]
Full batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs.
MG-GCN employs multiple High-Performance Computing optimizations, including efficient re-use of memory buffers.
MG-GCN achieves super-linear speedup with respect to DGL, on the Reddit graph on both DGX-1 (V100) and DGX-A100.
arXiv Detail & Related papers (2021-10-17T00:41:43Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights; a toy sketch of the joint pruning step follows this entry.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
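The UGS entry above prunes the graph adjacency matrix and the model weights together; the toy code below shows only the joint magnitude-pruning step, under the simplifying assumptions of dense masks and a single layer. The ratios, shapes, and the random scores used for the adjacency mask are illustrative stand-ins, and the sketch does not cover how UGS trains or schedules the masks.

```python
# Toy joint magnitude-pruning step over an adjacency mask and a weight mask.
import torch

def magnitude_prune(mask, scores, ratio):
    """Zero out the `ratio` fraction of still-active mask entries with the smallest |score|."""
    active = mask.bool()
    k = int(ratio * active.sum().item())
    if k == 0:
        return mask
    ranked = scores.abs().masked_fill(~active, float("inf"))
    drop = torch.topk(ranked.flatten(), k, largest=False).indices
    new_mask = mask.clone().flatten()
    new_mask[drop] = 0.0
    return new_mask.view_as(mask)

# One (dense) adjacency mask and one layer's weight matrix, pruned jointly.
adj_mask = torch.ones(1000, 1000)        # mask over edges; dense here only for simplicity
adj_scores = torch.rand_like(adj_mask)   # stand-in for learned edge-mask magnitudes
weight = torch.randn(128, 64)            # a GNN layer's weight matrix
weight_mask = torch.ones_like(weight)

adj_mask = magnitude_prune(adj_mask, adj_scores, ratio=0.05)     # drop 5% of edges
weight_mask = magnitude_prune(weight_mask, weight, ratio=0.20)   # drop 20% of weights
sparse_weight = weight * weight_mask
# The "graph lottery ticket" is the pair (pruned sub-graph given by adj_mask,
# sparse sub-network given by weight_mask).
```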
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.