Model-Architecture Co-Design for High Performance Temporal GNN Inference
on FPGA
- URL: http://arxiv.org/abs/2203.05095v1
- Date: Thu, 10 Mar 2022 00:24:47 GMT
- Title: Model-Architecture Co-Design for High Performance Temporal GNN Inference
on FPGA
- Authors: Hongkuan Zhou, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl
Busart
- Abstract summary: Real-world applications require high performance inference on real-time streaming dynamic graphs.
We present a novel model-architecture co-design for inference in memory-based TGNNs on FPGAs.
We train our simplified models using knowledge distillation to ensure similar accuracy vis-à-vis the original model.
- Score: 5.575293536755127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal Graph Neural Networks (TGNNs) are powerful models to capture
temporal, structural, and contextual information on temporal graphs. The
generated temporal node embeddings outperform other methods in many downstream
tasks. Real-world applications require high performance inference on real-time
streaming dynamic graphs. However, these models usually rely on complex
attention mechanisms to capture relationships between temporal neighbors. In
addition, maintaining vertex memory suffers from intrinsic temporal data
dependency that hinders task-level parallelism, making it inefficient on
general-purpose processors. In this work, we present a novel model-architecture
co-design for inference in memory-based TGNNs on FPGAs. The key modeling
optimizations we propose include a light-weight method to compute attention
scores and a related temporal neighbor pruning strategy to further reduce
computation and memory accesses. These are holistically coupled with key
hardware optimizations that leverage FPGA hardware. We replace the temporal
sampler with an on-chip FIFO based hardware sampler and the time encoder with a
look-up-table. We train our simplified models using knowledge distillation to
ensure similar accuracy vis-à-vis the original model. Taking advantage of the
model optimizations, we propose a principled hardware architecture using
batching, pipelining, and prefetching techniques to further improve the
performance. We also propose a hardware mechanism to ensure the chronological
vertex updating without sacrificing the computation parallelism. We evaluate
the performance of the proposed hardware accelerator on three real-world
datasets.
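As a rough illustration of two of the hardware-oriented ideas named in the abstract (a FIFO-based temporal sampler and a look-up-table time encoder), here is a minimal software sketch in Python. The abstract does not describe the actual data structures, so everything here is an assumption made for illustration: the class names `FifoNeighborSampler` and `LutTimeEncoder`, the parameters `k`, `max_dt`, and `num_bins`, and the cosine time encoding of the form cos(w * dt + b), which is common in memory-based TGNNs but is not confirmed by the text above.

```python
import math
from collections import deque


class FifoNeighborSampler:
    """Hypothetical sampler: keeps the k most recent temporal neighbors per vertex."""

    def __init__(self, k):
        self.k = k
        self.fifos = {}  # vertex id -> bounded FIFO of (neighbor, timestamp)

    def insert_edge(self, src, dst, t):
        # Assumption: edges arrive in chronological order, so appending
        # keeps each FIFO sorted by time and evicts the oldest entry.
        for u, v in ((src, dst), (dst, src)):
            self.fifos.setdefault(u, deque(maxlen=self.k)).append((v, t))

    def sample(self, v):
        # Returns up to k neighbors, oldest first, most recent last.
        return list(self.fifos.get(v, ()))


class LutTimeEncoder:
    """Hypothetical encoder: replaces cos(w * dt + b) with a precomputed table."""

    def __init__(self, weights, biases, max_dt, num_bins):
        self.bin_width = max_dt / num_bins
        self.num_bins = num_bins
        # One table row per time bin, evaluated at the bin center.
        self.table = [
            [math.cos(w * (i + 0.5) * self.bin_width + b)
             for w, b in zip(weights, biases)]
            for i in range(num_bins)
        ]

    def encode(self, dt):
        # Quantize dt to a bin index and clamp it to the table range.
        idx = min(max(int(dt / self.bin_width), 0), self.num_bins - 1)
        return self.table[idx]


sampler = FifoNeighborSampler(k=2)
for dst, t in ((1, 1.0), (2, 2.0), (3, 3.0)):
    sampler.insert_edge(0, dst, t)
recent = sampler.sample(0)  # the neighbor from t=1.0 has been evicted

encoder = LutTimeEncoder(weights=[1.0], biases=[0.0], max_dt=10.0, num_bins=10)
code = encoder.encode(0.2)  # falls into the first bin
```

In this sketch the bounded FIFO trades sampling flexibility for constant-time updates, and the table trades encoding precision (controlled by `num_bins`) for the removal of trigonometric evaluation at inference time. Whether the paper makes the same trade-offs cannot be determined from the abstract alone.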
Related papers
- Spatiotemporal Forecasting Meets Efficiency: Causal Graph Process Neural Networks [5.703629317205571]
Causal Graph Processes (CGPs) offer an alternative, using graph filters instead of relational field layers to reduce parameters and minimize memory consumption.
This paper introduces a non-linear model combining CGPs and GNNs for temporal forecasting. CGProNet employs higher-order graph filters, optimizing the model with fewer parameters, reducing memory usage, and improving runtime efficiency.
(2024-05-29T08:37:48Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
(2024-04-15T06:01:48Z)
- TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
(2024-01-06T06:26:49Z)
- Hierarchical Graph Pattern Understanding for Zero-Shot VOS [102.21052200245457]
This paper proposes a new hierarchical graph neural network (GNN) architecture for zero-shot video object segmentation (ZS-VOS).
Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames.
(2023-12-15T04:13:21Z)
- SPEED: Streaming Partition and Parallel Acceleration for Temporal Interaction Graph Embedding [22.68416593780539]
We introduce a novel training approach, namely Streaming Edge Partitioning and Parallel Acceleration for Temporal Interaction Graph Embedding.
Our method can achieve a good balance in computing resources, computing time, and downstream task performance.
Empirical validation across 7 real-world datasets demonstrates the potential to expedite training speeds by a factor of up to 19.29x.
(2023-08-27T15:11:44Z)
- ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels [1.304892050913381]
We introduce a new graph-based program representation for parallel applications that extends the Abstract Syntax Tree.
We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of an OpenMP code region.
Results show that our approach is effective, with a normalized RMSE ranging from 0.004 to at most 0.01 in its runtime predictions.
(2023-04-07T05:52:59Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
(2023-03-25T14:40:59Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
(2022-09-20T14:41:37Z)
- Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification [70.50751397870972]
FastGAT is a method to make attention-based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real-world graph datasets for node classification tasks.
(2020-06-15T22:07:54Z)
- GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs.
It is challenging to accelerate training of GCNs due to substantial and irregular data communication.
We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
(2019-12-31T21:19:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.