Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
- URL: http://arxiv.org/abs/2301.06284v3
- Date: Tue, 9 Apr 2024 04:17:48 GMT
- Title: Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
- Authors: Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-mei Hwu
- Abstract summary: RGNNs are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs.
We propose Hector, a novel two-level intermediate representation and its code generator framework, to capture the key properties of RGNN models.
Hector achieves up to 9.9x speed-up in inference and 43.7x speed-up in training compared with the state-of-the-art public systems.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relational graph neural networks (RGNNs) are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs. While RGNNs have been increasingly adopted in many real-world applications due to their versatility and accuracy, they pose performance and system design challenges: inherent memory-intensive computation patterns, the gap between the programming interface and kernel APIs, and heavy programming effort in optimizing kernels caused by their coupling with data layout and heterogeneity. To systematically address these challenges, we propose Hector, a novel two-level intermediate representation and its code generator framework, that (a) captures the key properties of RGNN models and opportunities to reduce memory accesses in inter-operator scheduling and materialization, (b) generates code with a flexible data access scheme to eliminate redundant data copies, and (c) decouples model semantics, data layout, and operator-specific optimization from each other to reduce programming effort. By building on one general matrix multiply (GEMM) template and a node/edge traversal template, Hector achieves up to 9.9x speed-up in inference and 43.7x speed-up in training compared with the state-of-the-art public systems on select models, i.e., RGCN, RGAT and HGT, when running heterogeneous graphs provided by Deep Graph Library (DGL) and Open Graph Benchmark (OGB). In addition, Hector does not trigger any out-of-memory (OOM) exception in these tests. We also propose linear operator reordering and compact materialization to further accelerate the system by up to 3.8x. As an indicator of programming effort reduction, Hector takes in 51 lines of code expressing the three models and generates a total of 8K lines of CUDA and C++ code.
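Hector itself emits CUDA and C++ from its two-level IR, but the per-relation GEMM pattern it builds on is easy to sketch in plain Python. The following NumPy sketch (function and variable names are ours for illustration, not Hector's API) shows one RGCN-style layer in which source features are gathered per edge type and multiplied by that relation's weight in a single GEMM, the batching opportunity a GEMM template can exploit; normalization and self-loops are omitted.

```python
import numpy as np

def rgcn_layer(x, weights, edges_by_rel):
    """One dense RGCN-style message-passing layer (illustrative only).

    x:            (num_nodes, in_dim) node features
    weights:      dict: relation -> (in_dim, out_dim) weight matrix
    edges_by_rel: dict: relation -> (src_ids, dst_ids) index arrays
    """
    num_nodes = x.shape[0]
    out_dim = next(iter(weights.values())).shape[1]
    out = np.zeros((num_nodes, out_dim))
    for rel, (src, dst) in edges_by_rel.items():
        # Gather sources for this relation, then one GEMM per relation
        # rather than one small matmul per edge -- the regularity a
        # GEMM template can exploit on GPUs.
        msgs = x[src] @ weights[rel]      # (num_edges_rel, out_dim)
        np.add.at(out, dst, msgs)         # scatter-add into destinations
    return out

# Toy heterogeneous graph: 4 nodes, two edge types.
x = np.random.rand(4, 8)
weights = {"cites": np.random.rand(8, 16), "writes": np.random.rand(8, 16)}
edges = {"cites": (np.array([0, 1]), np.array([2, 3])),
         "writes": (np.array([2]), np.array([0]))}
h = rgcn_layer(x, weights, edges)         # shape (4, 16)
```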
Related papers
- CONCORD: Towards a DSL for Configurable Graph Code Representation
We introduce CONCORD, a domain-specific language to build customizable graph representations.
We demonstrate its effectiveness in code smell detection as an illustrative use case.
CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
- Connectivity Optimized Nested Graph Networks for Crystal Structures
Graph neural networks (GNNs) have been applied to a large variety of applications in materials science and chemistry.
We show that our suggested models systematically improve state-of-the-art results across all tasks within the MatBench benchmark.
arXiv Detail & Related papers (2023-02-27T19:26:48Z)
- Scalable Graph Convolutional Network Training on Distributed-Memory Systems
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges.
We propose a highly parallel training algorithm that scales to large processor counts.
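To make the irregular-access point concrete, here is a minimal NumPy sketch (ours, not from the paper) of the aggregation step A @ X with the adjacency matrix in CSR form; the data-dependent row gathers are exactly the accesses that make memory- and communication-efficient parallelization hard.

```python
import numpy as np

def csr_aggregate(indptr, indices, x):
    """GCN neighbor aggregation out = A @ X, with A stored in CSR form.

    indptr, indices: CSR structure of the (unweighted) adjacency matrix
    x:               (num_nodes, dim) dense feature matrix
    """
    out = np.zeros_like(x)
    for v in range(len(indptr) - 1):
        nbrs = indices[indptr[v]:indptr[v + 1]]   # neighbors of node v
        # x[nbrs] jumps through memory following the graph's wiring:
        # the irregular, data-dependent access pattern the summary cites.
        out[v] = x[nbrs].sum(axis=0)
    return out

# Toy graph: node 0 aggregates from nodes 1 and 2, and so on.
indptr = np.array([0, 2, 3, 4])
indices = np.array([1, 2, 0, 1])
x = np.random.rand(3, 4)
print(csr_aggregate(indptr, indices, x).shape)    # (3, 4)
```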
arXiv Detail & Related papers (2022-12-09T17:51:13Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training scheme, named EnGCN, to address these issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike methods based on the lottery ticket hypothesis (LTH), the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
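The summary gives only the headline claims; as a generic illustration of what gradual pruning (as opposed to prune-then-retrain) can look like, here is a short NumPy sketch of magnitude-based masking whose sparsity rises over training rounds. This is a standard gradual-magnitude-pruning recipe, not necessarily CGP's actual criterion.

```python
import numpy as np

def magnitude_mask(w, sparsity):
    """Boolean mask keeping the (1 - sparsity) largest-magnitude weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return np.ones(w.shape, dtype=bool)
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.abs(w) > thresh

w = np.random.randn(16, 16)
for sparsity in (0.2, 0.5, 0.8):       # sparsity grows during training
    w = w * magnitude_mask(w, sparsity)
    # ...ordinary training steps would update the surviving weights here,
    # so no separate re-training phase is needed after pruning...
```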
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning
We propose a unified, end-to-end, programmable graph representation learning framework.
It mines high-level programs down to a universal intermediate representation, extracts their specific computational patterns, and predicts which code segments would run best on a specific core.
In the evaluation, we demonstrate a maximum speedup of 6.42x compared to the thread-based execution, and 2.02x compared to the state-of-the-art technique.
arXiv Detail & Related papers (2022-04-25T22:13:13Z)
- A Robust and Generalized Framework for Adversarial Graph Embedding
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates fake neighbor nodes as enhanced negative samples drawn from an implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z)
- Binary Graph Neural Networks
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
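The summary does not spell out a particular binarization; a common baseline (XNOR-Net-style, our choice for illustration rather than the strategy the paper settles on) keeps only the weight signs plus one real-valued scale per column:

```python
import numpy as np

def binarize(w):
    """Replace each weight with its sign times a per-column scale chosen
    to stay close to the original weights in L1 norm (XNOR-Net style)."""
    scale = np.abs(w).mean(axis=0, keepdims=True)   # one scale per column
    return np.sign(w) * scale                        # entries in {-s, 0, +s}

w = np.random.randn(8, 16)
w_bin = binarize(w)   # 1-bit signs plus a small vector of scales to store
```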
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Ramanujan Bipartite Graph Products for Efficient Block Sparse Neural Networks
We propose a framework for generating structured multi-level block-sparse neural networks using the theory of graph products.
We also propose using products of Ramanujan graphs, which give the best connectivity for a given level of sparsity.
We benchmark our approach on an image classification task over the CIFAR dataset using VGG19 and WideResNet-40-4 networks.
arXiv Detail & Related papers (2020-06-24T05:08:17Z)
- Graph Highway Networks
Graph Convolutional Networks (GCNs) are widely used in learning graph representations due to their effectiveness and efficiency.
They suffer from the notorious over-smoothing problem, in which the learned representations converge to nearly identical vectors when many layers are stacked.
We propose Graph Highway Networks (GHNet) which utilize gating units to balance the trade-off between homogeneity and heterogeneity in the GCN learning process.
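One way to read the gating idea: each node learns how much of its convolved (smoothed) representation to accept versus how much of its own features to keep, in the spirit of highway networks. A minimal sketch follows, with our formulation rather than GHNet's exact equations:

```python
import numpy as np

def highway_gate(h_in, h_conv, w_gate, b_gate):
    """Blend a node's own features (h_in) with its graph-convolved
    features (h_conv) through a learned sigmoid gate, so stacking layers
    need not drive all representations toward the same vector."""
    g = 1.0 / (1.0 + np.exp(-(h_in @ w_gate + b_gate)))  # per-channel gate
    return g * h_conv + (1.0 - g) * h_in                 # convex mix

h_in = np.random.rand(5, 8)
h_conv = np.random.rand(5, 8)          # e.g., the output of a GCN layer
out = highway_gate(h_in, h_conv, np.random.randn(8, 8), np.zeros(8))
```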
arXiv Detail & Related papers (2020-04-09T16:26:43Z)
- Embedding Graph Auto-Encoder for Graph Clustering
Graph auto-encoder (GAE) models are based on semi-supervised graph convolutional networks (GCNs).
We design a specific GAE-based model for graph clustering that is consistent with the theory, namely the Embedding Graph Auto-Encoder (EGAE).
EGAE consists of one encoder and dual decoders.
arXiv Detail & Related papers (2020-02-20T09:53:28Z)