Benchmarking GNN-Based Recommender Systems on Intel Optane Persistent Memory
- URL: http://arxiv.org/abs/2207.11918v1
- Date: Mon, 25 Jul 2022 06:08:24 GMT
- Title: Benchmarking GNN-Based Recommender Systems on Intel Optane Persistent Memory
- Authors: Yuwei Hu, Jiajie Li, Zhongming Yu, Zhiru Zhang
- Abstract summary: Graph neural networks (GNNs) have emerged as an effective method for handling machine learning tasks on graphs.
Training GNN-based recommender systems (GNNRecSys) on large graphs incurs a large memory footprint.
We show that single-machine Optane-based GNNRecSys training outperforms distributed training by a large margin.
- Score: 9.216391057418566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs), which have emerged as an effective method for
handling machine learning tasks on graphs, bring a new approach to building
recommender systems, where the task of recommendation can be formulated as the
link prediction problem on user-item bipartite graphs. Training GNN-based
recommender systems (GNNRecSys) on large graphs incurs a large memory
footprint, easily exceeding the DRAM capacity on a typical server. Existing
solutions resort to distributed subgraph training, which is inefficient due to
the high cost of dynamically constructing subgraphs and significant redundancy
across subgraphs.
The emerging Intel Optane persistent memory allows a single machine to have
up to 6 TB of memory at an affordable cost, thus making single-machine
GNNRecSys training feasible, which eliminates the inefficiencies in distributed
training. One major concern with using Optane for GNNRecSys is its
relatively low bandwidth compared with DRAM. This limitation can be
particularly detrimental to achieving high performance for GNNRecSys workloads
since their dominant compute kernels are sparse and memory access intensive. To
understand whether Optane is a good fit for GNNRecSys training, we perform an
in-depth characterization of GNNRecSys workloads and a comprehensive
benchmarking study. Our benchmarking results show that when properly
configured, Optane-based single-machine GNNRecSys training outperforms
distributed training by a large margin, especially when handling deep GNN
models. We analyze where the speedup comes from, provide guidance on how to
configure Optane for GNNRecSys workloads, and discuss opportunities for further
optimizations.
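
To see why the footprint concern in the abstract is plausible, here is a rough back-of-the-envelope estimate in Python. All graph and model sizes below are illustrative assumptions, not figures reported in the paper; even so, a full-graph training setup lands near a terabyte, beyond the DRAM of a typical server but within reach of a 6 TB Optane configuration.

```python
# Rough, illustrative estimate of full-graph GNNRecSys training memory.
# All sizes are assumptions for the sketch, not numbers from the paper.
num_users  = 100_000_000      # assumed user count
num_items  = 10_000_000       # assumed item count
num_edges  = 2_000_000_000    # assumed user-item interactions
hidden_dim = 256              # assumed embedding width
num_layers = 3                # assumed GNN depth
bytes_fp32 = 4

nodes = num_users + num_items
# Node feature/activation matrices for the input plus each GNN layer
# (kept for backpropagation).
activations = nodes * hidden_dim * bytes_fp32 * (num_layers + 1)
# Learnable embedding table plus two Adam moment buffers.
emb_table = nodes * hidden_dim * bytes_fp32 * 3
# CSR adjacency: int64 edge indices stored in both directions plus row pointers.
graph = 2 * num_edges * 8 + nodes * 8

total_tb = (activations + emb_table + graph) / 1e12
print(f"~{total_tb:.1f} TB")   # about 0.8 TB with these numbers; activation
# gradients, deeper models, or wider features push it higher still.
```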
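As a concrete picture of the workload being benchmarked, the sketch below shows the standard way recommendation is cast as link prediction on a user-item bipartite graph, with neighborhood aggregation expressed as a sparse-dense matrix multiply (SpMM). This is a minimal PyTorch mock-up under assumed shapes, not the authors' implementation; the point it illustrates is that the dominant kernel streams over every edge while doing only a few floating-point operations per byte moved, so its throughput is bounded by memory bandwidth, which is exactly why Optane's lower bandwidth is the central question.

```python
import torch
import torch.nn.functional as F

# Toy bipartite graph: edges run between users and items (sizes are illustrative).
num_users, num_items, dim = 1000, 500, 64
src = torch.randint(0, num_users, (20_000,))   # user ids of observed interactions
dst = torch.randint(0, num_items, (20_000,))   # item ids of observed interactions

# Sparse adjacency of the bipartite graph in COO form (items x users) and its transpose.
adj_iu = torch.sparse_coo_tensor(
    torch.stack([dst, src]), torch.ones(src.numel()),
    size=(num_items, num_users)).coalesce()
adj_ui = adj_iu.t().coalesce()                 # users x items

user_emb = torch.nn.Parameter(torch.randn(num_users, dim) * 0.01)
item_emb = torch.nn.Parameter(torch.randn(num_items, dim) * 0.01)
w_u = torch.nn.Linear(dim, dim)
w_i = torch.nn.Linear(dim, dim)

def encode():
    # One round of message passing on the bipartite graph.
    # torch.sparse.mm is the SpMM kernel: it touches every edge but performs
    # little arithmetic per byte moved, so it is memory-bandwidth bound.
    h_item = F.relu(w_i(torch.sparse.mm(adj_iu, user_emb)))
    h_user = F.relu(w_u(torch.sparse.mm(adj_ui, item_emb)))
    return h_user, h_item

# Link prediction: score a user-item pair by the dot product of its embeddings,
# trained with BCE against observed (positive) and random (negative) pairs.
h_user, h_item = encode()
neg_dst = torch.randint(0, num_items, (src.numel(),))
pos = (h_user[src] * h_item[dst]).sum(-1)
neg = (h_user[src] * h_item[neg_dst]).sum(-1)
loss = F.binary_cross_entropy_with_logits(
    torch.cat([pos, neg]),
    torch.cat([torch.ones_like(pos), torch.zeros_like(neg)]))
loss.backward()
```

In a real GNNRecSys model the aggregation is repeated per layer and per epoch over billions of edges, so the SpMM and the embedding-gather traffic dominate end-to-end training time.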
Related papers
- Reducing Memory Contention and I/O Congestion for Disk-based GNN Training [6.492879435794228]
Graph neural networks (GNNs) have gained wide popularity. Large graphs with high-dimensional features have become common, and training GNNs on them is non-trivial.
Given a gigantic graph, even sample-based GNN training cannot work efficiently, since it is difficult to keep the graph's entire data in memory during the training process.
Memory and I/O are therefore critical for efficient disk-based training.
arXiv Detail & Related papers (2024-06-20T04:24:51Z)
- Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks [5.514795777097036]
We introduce the concept of the Graph Winning Ticket (GWT), derived from the Lottery Ticket Hypothesis (LTH).
By adopting a pre-determined star topology as a GWT prior to training, we balance edge reduction with efficient information propagation.
Our approach enables training ASTGNNs on the largest scale spatial-temporal dataset using a single A6000 equipped with 48 GB of memory.
arXiv Detail & Related papers (2024-06-12T14:53:23Z)
- Cached Operator Reordering: A Unified View for Fast GNN Training [24.917363701638607]
Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering.
However, the sparse nature of GNN computation poses new challenges for performance optimization compared to traditional deep neural networks.
We address these challenges by providing a unified view of GNN computation, I/O, and memory.
arXiv Detail & Related papers (2023-08-23T12:27:55Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching [3.0479527348064197]
Graph Neural Networks (GNNs) have been receiving a spotlight as a powerful tool that can effectively serve various graph tasks on structured data.
As the size of real-world graphs continues to scale, the GNN training system faces a scalability challenge.
We propose Ginex, the first SSD-based GNN training system that can process billion-scale graph datasets on a single machine.
arXiv Detail & Related papers (2022-08-19T04:57:18Z)
- Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs [7.549360351036771]
We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs.
SAR is a distributed technique that can train any GNN type directly on an entire large graph.
We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models.
arXiv Detail & Related papers (2021-11-11T22:27:59Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification [70.50751397870972]
FastGAT is a method to make attention-based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.