Reducing Memory Contention and I/O Congestion for Disk-based GNN Training
- URL: http://arxiv.org/abs/2406.13984v1
- Date: Thu, 20 Jun 2024 04:24:51 GMT
- Title: Reducing Memory Contention and I/O Congestion for Disk-based GNN Training
- Authors: Qisheng Jiang, Lei Jia, Chundong Wang
- Abstract summary: Graph neural networks (GNNs) have gained wide popularity. Large graphs with high-dimensional features have become common, and training GNNs on them is non-trivial.
Given a gigantic graph, even sample-based GNN training cannot work efficiently, since it is difficult to keep the graph's entire data in memory during training.
Memory and I/O are hence critical for effective disk-based training.
- Score: 6.492879435794228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs) have gained wide popularity. Large graphs with high-dimensional features have become common, and training GNNs on them is non-trivial on an ordinary machine. Given a gigantic graph, even sample-based GNN training cannot work efficiently, since it is difficult to keep the graph's entire data in memory during training. Leveraging a solid-state drive (SSD) or other storage devices to extend the memory space has been studied for training GNNs. Memory and I/O are hence critical for effective disk-based training. We find that state-of-the-art (SoTA) disk-based GNN training systems severely suffer from issues such as memory contention between a graph's topological and feature data, and severe I/O congestion when loading data from the SSD for training. We accordingly develop GNNDrive. GNNDrive 1) minimizes the memory footprint with holistic buffer management across sampling and extracting, and 2) avoids I/O congestion through a strategy of asynchronous feature extraction. It also avoids costly data preparation on the critical path and makes the most of software and hardware resources. Experiments show that GNNDrive achieves superior performance. For example, when training with the Papers100M dataset and the GraphSAGE model, GNNDrive is faster than the SoTA systems PyG+, Ginex, and MariusGNN by 16.9x, 2.6x, and 2.7x, respectively.
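The abstract does not spell out how asynchronous feature extraction is implemented; the sketch below is only a minimal illustration of the general idea it names: overlapping SSD feature reads with training via a background thread and a bounded queue. All names here (`FeatureStore`, `extract_async`, the queue depth) are illustrative assumptions, not GNNDrive's actual API.

```python
# Minimal sketch (assumptions, not GNNDrive's API): pipeline batches
# through a bounded queue so the trainer never blocks on a cold SSD read,
# while the bounded depth caps the extraction buffer's memory footprint.
import queue
import threading

import numpy as np

class FeatureStore:
    """Hypothetical disk-backed feature matrix (memory-mapped)."""
    def __init__(self, path, num_nodes, dim):
        self.feats = np.memmap(path, dtype=np.float32, mode="r",
                               shape=(num_nodes, dim))

    def gather(self, node_ids):
        # Fancy indexing pages the needed feature rows in from the SSD.
        return np.asarray(self.feats[node_ids])

def extract_async(store, batches, depth=4):
    """Yield (node_ids, features) while a worker prefetches ahead."""
    q = queue.Queue(maxsize=depth)  # bounded: limits in-flight batches

    def worker():
        for ids in batches:
            q.put((ids, store.gather(ids)))  # blocks when queue is full
        q.put(None)                          # end-of-stream marker

    threading.Thread(target=worker, daemon=True).start()
    while (item := q.get()) is not None:
        yield item

# Usage: the training loop consumes ready batches; extraction I/O for
# batches i+1..i+depth proceeds while batch i is being trained on.
# for ids, x in extract_async(store, sampled_batches):
#     train_step(ids, x)
```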
Related papers
- DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training [12.945647145403438]
Graph neural networks (GNNs) are machine learning models specialized for graph data and widely used in many applications.
DiskGNN achieves high I/O efficiency and thus fast training without hurting model accuracy.
We compare DiskGNN with Ginex and MariusGNN, which are state-of-the-art systems for out-of-core GNN training.
arXiv Detail & Related papers (2024-05-08T17:27:11Z)
- CATGNN: Cost-Efficient and Scalable Distributed Training for Graph Neural Networks [7.321893519281194]
Existing distributed systems load the entire graph in memory for graph partitioning.
We propose CATGNN, a cost-efficient and scalable distributed GNN training system.
We also propose a novel streaming partitioning algorithm named SPRING for distributed GNN training.
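SPRING's algorithmic details are not given in this summary; the snippet below sketches the general streaming-partitioning family it belongs to (here, a linear deterministic greedy heuristic): vertices are assigned on first sight to the partition holding the most of their already-placed neighbors, discounted by a balance penalty, so the full graph never has to reside in memory. Function and parameter names are assumptions, not CATGNN's code.

```python
# Generic streaming vertex partitioner (linear deterministic greedy),
# in the spirit of SPRING but not CATGNN's actual algorithm.
from collections import defaultdict

def stream_partition(edges_stream, num_parts, capacity):
    """Assign each vertex on first sight; edges arrive as (u, v) pairs."""
    assignment = {}                    # vertex -> partition id
    load = [0] * num_parts             # vertices per partition
    neigh = defaultdict(set)           # neighbors seen so far (toy state)

    def place(v):
        if v in assignment:
            return
        def score(p):
            # Placed neighbors in p, damped as p fills; tie-break on load.
            common = sum(1 for u in neigh[v] if assignment.get(u) == p)
            return (common * (1.0 - load[p] / capacity), -load[p])
        best = max(range(num_parts), key=score)
        assignment[v] = best
        load[best] += 1

    for u, v in edges_stream:          # single pass over the edge stream
        neigh[u].add(v)
        neigh[v].add(u)
        place(u)
        place(v)
    return assignment

# Usage: parts = stream_partition(read_edges("graph.el"), num_parts=4,
#                                 capacity=num_nodes // 4 + 1)
```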
arXiv Detail & Related papers (2024-04-02T20:55:39Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
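The summary does not detail the vertex-cut construction; below is a minimal sketch of the general idea: edges are split across workers, and vertices on the cut are duplicated into every partition that touches them, so each worker holds a self-contained subgraph and can train without inter-worker communication. All names and the edge-assignment rule are illustrative assumptions.

```python
# Minimal vertex-cut sketch (illustrative, not CoFree-GNN's code):
# partition *edges*, then duplicate endpoint vertices into every
# partition that owns one of their edges.
from collections import defaultdict

def vertex_cut(edges, num_parts):
    part_edges = defaultdict(list)   # partition -> local edge list
    part_nodes = defaultdict(set)    # partition -> replicated vertices

    for i, (u, v) in enumerate(edges):
        p = i % num_parts            # toy edge assignment (round-robin)
        part_edges[p].append((u, v))
        part_nodes[p].update((u, v)) # duplicate cut vertices locally

    return part_edges, part_nodes

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
e, n = vertex_cut(edges, num_parts=2)
# Each worker now trains on (e[p], n[p]) with no cross-worker messages;
# duplicated vertices trade extra memory for zero communication.
```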
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses [9.773813896475264]
Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data.
Training GNNs on large-scale graphs remains a significant challenge due to lack of efficient data access and data movement methods.
We propose the GPU Initiated Direct Storage Access (GIDS) dataloader to enable GPU-oriented GNN training for large-scale graphs.
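GIDS's real interface builds on GPU-initiated storage APIs that aren't shown in this summary; the sketch below only illustrates the contrast it targets, a CPU-staged gather versus letting the accelerator issue its own reads. `gpu_direct_gather` and `to_device` are hypothetical stand-ins, not GIDS's API.

```python
# Illustrative contrast only; `gpu_direct_gather` is a hypothetical
# stand-in for a GPU-initiated storage access, where GPU threads issue
# NVMe reads into device memory without CPU staging.
import numpy as np

def to_device(arr):
    return arr  # placeholder for a host-to-GPU copy (e.g., .cuda())

def cpu_bounce_gather(mmap_feats, node_ids):
    # Conventional path: the CPU pages features from SSD into host RAM,
    # then copies them to the GPU -- two hops per feature batch.
    host_buf = np.asarray(mmap_feats[node_ids])   # SSD -> host
    return to_device(host_buf)                    # host -> GPU

def gpu_direct_gather(storage_handle, node_ids):
    # Hypothetical: GPU threads map node_ids to disk offsets and fetch
    # feature pages straight into device memory, keeping thousands of
    # NVMe requests in flight and overlapping them with compute.
    raise NotImplementedError("stand-in for a GPU-initiated read API")
```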
arXiv Detail & Related papers (2023-06-28T17:22:15Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching [3.0479527348064197]
Graph Neural Networks (GNNs) have been in the spotlight as a powerful tool that can effectively serve various graph tasks on structured data.
As the size of real-world graphs continues to scale, the GNN training system faces a scalability challenge.
We propose Ginex, the first SSD-based GNN training system that can process billion-scale graph datasets on a single machine.
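The "provably optimal in-memory caching" in the title refers to the fact that sample-based training can fix the node-access sequence ahead of time, which makes Belady's MIN policy (evict the entry whose next use lies furthest in the future) applicable. The snippet below is a toy Belady simulator under that assumption, not Ginex's implementation.

```python
# Toy Belady (MIN) cache simulator: with the full future access trace
# known -- as after Ginex-style ahead-of-time sampling -- evict the
# cached node whose next use lies furthest in the future.
from collections import defaultdict, deque

def belady_hits(trace, capacity):
    next_use = defaultdict(deque)
    for i, node in enumerate(trace):          # index future positions
        next_use[node].append(i)

    cache, hits = set(), 0
    for node in trace:
        next_use[node].popleft()              # consume current access
        if node in cache:
            hits += 1
        elif len(cache) < capacity:
            cache.add(node)
        else:
            # Evict the resident node with the furthest (or no) next use.
            victim = max(cache, key=lambda n: next_use[n][0]
                         if next_use[n] else float("inf"))
            cache.remove(victim)
            cache.add(node)
    return hits

print(belady_hits([1, 2, 3, 1, 2, 4, 1], capacity=2))  # -> 2
```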
arXiv Detail & Related papers (2022-08-19T04:57:18Z)
- Benchmarking GNN-Based Recommender Systems on Intel Optane Persistent Memory [9.216391057418566]
Graph neural networks (GNNs) have emerged as an effective method for handling machine learning tasks on graphs.
Training GNN-based recommender systems (GNNRecSys) on large graphs incurs a large memory footprint.
We show that single-machine Optane-based GNNRecSys training outperforms distributed training by a large margin.
arXiv Detail & Related papers (2022-07-25T06:08:24Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
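As a rough illustration of joint graph-and-weight sparsification in the UGS spirit (the details here are assumptions, not the paper's exact procedure), one can keep masks over both the adjacency matrix and a weight matrix and iteratively prune the lowest-magnitude surviving entries of each.

```python
# Rough sketch of joint adjacency + weight magnitude pruning in the
# UGS spirit (assumed details, not the paper's exact procedure).
import torch

def prune_lowest(mask, scores, frac):
    """Zero out the smallest-|score| fraction of the surviving entries."""
    alive = mask.bool()
    k = int(frac * alive.sum())
    if k > 0:
        thresh = scores[alive].abs().kthvalue(k).values
        mask[scores.abs() <= thresh] = 0.0
    return mask

n, d = 6, 4
adj_mask = torch.ones(n, n)        # mask over graph edges
w_mask = torch.ones(d, d)          # mask over a GNN weight matrix
adj_scores = torch.rand(n, n)      # e.g., learned edge-mask values
w = torch.randn(d, d)              # e.g., trained layer weights

adj_mask = prune_lowest(adj_mask, adj_scores, frac=0.05)  # sparser graph
w_mask = prune_lowest(w_mask, w, frac=0.20)               # sparser model
# The surviving (adj_mask, w_mask) pair plays the role of a graph
# lottery ticket: retrain the masked subnetwork on the masked subgraph.
```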
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
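The binarization strategies themselves are not described in this summary; the snippet below shows the standard ingredient such approaches typically build on, a sign quantizer with a straight-through estimator (STE) so gradients can flow through the binarized weights. This is an assumption about the general family, not this paper's exact design.

```python
# Sign binarization with a straight-through estimator (STE): a common
# building block for binary GNNs, not this paper's exact design.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                 # weights in {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # STE: pass gradients through, clipped where |w| > 1.
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(4, 4, requires_grad=True)
x = torch.randn(8, 4)
y = x @ BinarizeSTE.apply(w)                 # binarized linear transform
y.sum().backward()                           # gradients reach w via STE
print(w.grad.shape)                          # torch.Size([4, 4])
```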
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
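EGRL combines evolutionary search with graph reinforcement learning; as a loose, toy illustration of the search-space side only (not the paper's method), the sketch below evolves a placement vector mapping tensors to memory tiers under an assumed cost model.

```python
# Toy evolutionary search over memory placements (loose illustration of
# the search-space idea, not EGRL's actual policy or algorithm).
import random

TIERS = [0, 1, 2]                 # e.g., SRAM / LLC / DRAM, fastest first
COST = {0: 1.0, 1: 3.0, 2: 10.0}  # assumed per-access latencies
CAP = {0: 2, 1: 4, 2: 100}        # assumed per-tier capacity (tensors)

def fitness(placement):
    used = {t: placement.count(t) for t in TIERS}
    if any(used[t] > CAP[t] for t in TIERS):
        return float("-inf")                  # infeasible placement
    return -sum(COST[t] for t in placement)   # minimize total latency

def evolve(num_tensors, pop=32, gens=50, mut=0.2):
    popu = [[random.choice(TIERS) for _ in range(num_tensors)]
            for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=fitness, reverse=True)
        elite = popu[: pop // 4]              # keep the best quarter
        popu = elite + [
            [g if random.random() > mut else random.choice(TIERS)
             for g in random.choice(elite)]   # mutate an elite parent
            for _ in range(pop - len(elite))
        ]
    return max(popu, key=fitness)

print(evolve(num_tensors=10))   # best-found tier per tensor
```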
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- GPT-GNN: Generative Pre-Training of Graph Neural Networks [93.35945182085948]
Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data.
We present the GPT-GNN framework to initialize GNNs by generative pre-training.
We show that GPT-GNN significantly outperforms state-of-the-art GNN models without pre-training by up to 9.1% across various downstream tasks.
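Generative pre-training here means training the GNN to reconstruct held-out parts of the input graph before fine-tuning on labels. The toy objective below (masked-attribute reconstruction with an MSE loss) is one simple instance of that idea; all specifics are assumed rather than taken from GPT-GNN, whose pre-training uses attribute and edge generation tasks.

```python
# Toy generative pre-training step: mask some node features and train
# an encoder to reconstruct them (simplified stand-in for GPT-GNN's
# attribute/edge generation; specifics are assumed).
import torch
import torch.nn as nn

num_nodes, dim, hidden = 100, 16, 32
x = torch.randn(num_nodes, dim)               # node features
mask = torch.zeros(num_nodes, dtype=torch.bool)
mask[::7] = True                              # nodes to reconstruct

encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())  # GNN stand-in
decoder = nn.Linear(hidden, dim)
opt = torch.optim.Adam([*encoder.parameters(),
                        *decoder.parameters()], lr=1e-3)

x_in = x.clone()
x_in[mask] = 0.0                              # hide masked attributes
h = encoder(x_in)                             # encode corrupted input
loss = nn.functional.mse_loss(decoder(h)[mask], x[mask])
loss.backward()
opt.step()                                    # one pre-training step
# After pre-training, `encoder` initializes the GNN for downstream tasks.
```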
arXiv Detail & Related papers (2020-06-27T20:12:33Z)