Benchmarking GNN-Based Recommender Systems on Intel Optane Persistent Memory
- URL: http://arxiv.org/abs/2207.11918v1
- Date: Mon, 25 Jul 2022 06:08:24 GMT
- Title: Benchmarking GNN-Based Recommender Systems on Intel Optane Persistent Memory
- Authors: Yuwei Hu, Jiajie Li, Zhongming Yu, Zhiru Zhang
- Abstract summary: Graph neural networks (GNNs) have emerged as an effective method for handling machine learning tasks on graphs.
Training GNN-based recommender systems (GNNRecSys) on large graphs incurs a large memory footprint.
We show that single-machine Optane-based GNNRecSys training outperforms distributed training by a large margin.
- Score: 9.216391057418566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs), which have emerged as an effective method for
handling machine learning tasks on graphs, bring a new approach to building
recommender systems, where the task of recommendation can be formulated as the
link prediction problem on user-item bipartite graphs. Training GNN-based
recommender systems (GNNRecSys) on large graphs incurs a large memory
footprint, easily exceeding the DRAM capacity on a typical server. Existing
solutions resort to distributed subgraph training, which is inefficient due to
the high cost of dynamically constructing subgraphs and significant redundancy
across subgraphs.
The emerging Intel Optane persistent memory allows a single machine to have
up to 6 TB of memory at an affordable cost, thus making single-machine
GNNRecSys training feasible, which eliminates the inefficiencies in distributed
training. One major concern with using Optane for GNNRecSys is its
relatively low bandwidth compared with DRAM. This limitation can be
particularly detrimental to achieving high performance for GNNRecSys workloads
since their dominant compute kernels are sparse and memory access intensive. To
understand whether Optane is a good fit for GNNRecSys training, we perform an
in-depth characterization of GNNRecSys workloads and a comprehensive
benchmarking study. Our benchmarking results show that when properly
configured, Optane-based single-machine GNNRecSys training outperforms
distributed training by a large margin, especially when handling deep GNN
models. We analyze where the speedup comes from, provide guidance on how to
configure Optane for GNNRecSys workloads, and discuss opportunities for further
optimizations.
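
To see why the footprint concern in the abstract is plausible, here is a rough back-of-the-envelope estimate in Python. All graph and model sizes below are illustrative assumptions, not figures reported in the paper; even so, a full-graph training setup lands near a terabyte, beyond the DRAM of a typical server but within reach of a 6 TB Optane configuration.

```python
# Rough, illustrative estimate of full-graph GNNRecSys training memory.
# All sizes are assumptions for the sketch, not numbers from the paper.
num_users  = 100_000_000      # assumed user count
num_items  = 10_000_000       # assumed item count
num_edges  = 2_000_000_000    # assumed user-item interactions
hidden_dim = 256              # assumed embedding width
num_layers = 3                # assumed GNN depth
bytes_fp32 = 4

nodes = num_users + num_items
# Node feature/activation matrices for the input plus each GNN layer
# (kept for backpropagation).
activations = nodes * hidden_dim * bytes_fp32 * (num_layers + 1)
# Learnable embedding table plus two Adam moment buffers.
emb_table = nodes * hidden_dim * bytes_fp32 * 3
# CSR adjacency: int64 edge indices stored in both directions plus row pointers.
graph = 2 * num_edges * 8 + nodes * 8

total_tb = (activations + emb_table + graph) / 1e12
print(f"~{total_tb:.1f} TB")   # about 0.8 TB with these numbers; activation
# gradients, deeper models, or wider features push it higher still.
```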
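As a concrete picture of the workload being benchmarked, the sketch below shows the standard way recommendation is cast as link prediction on a user-item bipartite graph, with neighborhood aggregation expressed as a sparse-dense matrix multiply (SpMM). This is a minimal PyTorch mock-up under assumed shapes, not the authors' implementation; the point it illustrates is that the dominant kernel streams over every edge while doing only a few floating-point operations per byte moved, so its throughput is bounded by memory bandwidth, which is exactly why Optane's lower bandwidth is the central question.

```python
import torch
import torch.nn.functional as F

# Toy bipartite graph: edges run between users and items (sizes are illustrative).
num_users, num_items, dim = 1000, 500, 64
src = torch.randint(0, num_users, (20_000,))   # user ids of observed interactions
dst = torch.randint(0, num_items, (20_000,))   # item ids of observed interactions

# Sparse adjacency of the bipartite graph in COO form (items x users) and its transpose.
adj_iu = torch.sparse_coo_tensor(
    torch.stack([dst, src]), torch.ones(src.numel()),
    size=(num_items, num_users)).coalesce()
adj_ui = adj_iu.t().coalesce()                 # users x items

user_emb = torch.nn.Parameter(torch.randn(num_users, dim) * 0.01)
item_emb = torch.nn.Parameter(torch.randn(num_items, dim) * 0.01)
w_u = torch.nn.Linear(dim, dim)
w_i = torch.nn.Linear(dim, dim)

def encode():
    # One round of message passing on the bipartite graph.
    # torch.sparse.mm is the SpMM kernel: it touches every edge but performs
    # little arithmetic per byte moved, so it is memory-bandwidth bound.
    h_item = F.relu(w_i(torch.sparse.mm(adj_iu, user_emb)))
    h_user = F.relu(w_u(torch.sparse.mm(adj_ui, item_emb)))
    return h_user, h_item

# Link prediction: score a user-item pair by the dot product of its embeddings,
# trained with BCE against observed (positive) and random (negative) pairs.
h_user, h_item = encode()
neg_dst = torch.randint(0, num_items, (src.numel(),))
pos = (h_user[src] * h_item[dst]).sum(-1)
neg = (h_user[src] * h_item[neg_dst]).sum(-1)
loss = F.binary_cross_entropy_with_logits(
    torch.cat([pos, neg]),
    torch.cat([torch.ones_like(pos), torch.zeros_like(neg)]))
loss.backward()
```

In a real GNNRecSys model the aggregation is repeated per layer and per epoch over billions of edges, so the SpMM and the embedding-gather traffic dominate end-to-end training time.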
Related papers
- Reducing Memory Contention and I/O Congestion for Disk-based GNN Training [6.492879435794228]
Graph neural networks (GNNs) have gained wide popularity. Large graphs with high-dimensional features have become common, and training GNNs on them is non-trivial.
Given a gigantic graph, even sample-based GNN training cannot work efficiently, since it is difficult to keep the graph's entire data in memory during the training process.
Memory and I/O are therefore critical for efficient disk-based training.
arXiv Detail & Related papers (2024-06-20T04:24:51Z)
- Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks [5.514795777097036]
We introduce the concept of the Graph Winning Ticket (GWT), derived from the Lottery Ticket Hypothesis (LTH).
By adopting a pre-determined star topology as a GWT prior to training, we balance edge reduction with efficient information propagation.
Our approach enables training ASTGNNs on the largest scale spatial-temporal dataset using a single A6000 equipped with 48 GB of memory.
arXiv Detail & Related papers (2024-06-12T14:53:23Z)
- Cached Operator Reordering: A Unified View for Fast GNN Training [24.917363701638607]
Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering.
However, the sparse nature of GNN computation poses new challenges for performance optimization compared to traditional deep neural networks.
We address these challenges by providing a unified view of GNN computation, I/O, and memory.
arXiv Detail & Related papers (2023-08-23T12:27:55Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching [3.0479527348064197]
Graph Neural Networks (GNNs) have been receiving a spotlight as a powerful tool that can effectively serve various graph tasks on structured data.
As the size of real-world graphs continues to scale, the GNN training system faces a scalability challenge.
We propose Ginex, the first SSD-based GNN training system that can process billion-scale graph datasets on a single machine.
arXiv Detail & Related papers (2022-08-19T04:57:18Z)
- Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs [7.549360351036771]
We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs.
SAR is a distributed technique that can train any GNN type directly on an entire large graph.
We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models.
arXiv Detail & Related papers (2021-11-11T22:27:59Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification [70.50751397870972]
FastGAT is a method to make attention-based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.