Model-Architecture Co-Design for High Performance Temporal GNN Inference
on FPGA
- URL: http://arxiv.org/abs/2203.05095v1
- Date: Thu, 10 Mar 2022 00:24:47 GMT
- Title: Model-Architecture Co-Design for High Performance Temporal GNN Inference
on FPGA
- Authors: Hongkuan Zhou, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl
Busart
- Abstract summary: Real-world applications require high performance inference on real-time streaming dynamic graphs.
We present a novel model-architecture co-design for inference in memory-based TGNNs on FPGAs.
We train our simplified models using knowledge distillation to ensure similar accuracy vis-à-vis the original model.
- Score: 5.575293536755127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal Graph Neural Networks (TGNNs) are powerful models to capture
temporal, structural, and contextual information on temporal graphs. The
generated temporal node embeddings outperform other methods in many downstream
tasks. Real-world applications require high performance inference on real-time
streaming dynamic graphs. However, these models usually rely on complex
attention mechanisms to capture relationships between temporal neighbors. In
addition, maintaining vertex memory suffers from intrinsic temporal data
dependency that hinders task-level parallelism, making it inefficient on
general-purpose processors. In this work, we present a novel model-architecture
co-design for inference in memory-based TGNNs on FPGAs. The key modeling
optimizations we propose include a light-weight method to compute attention
scores and a related temporal neighbor pruning strategy to further reduce
computation and memory accesses. These are holistically coupled with key
hardware optimizations that leverage FPGA hardware. We replace the temporal
sampler with an on-chip FIFO based hardware sampler and the time encoder with a
look-up-table. We train our simplified models using knowledge distillation to
ensure similar accuracy vis-à-vis the original model. Taking advantage of the
model optimizations, we propose a principled hardware architecture using
batching, pipelining, and prefetching techniques to further improve the
performance. We also propose a hardware mechanism to ensure the chronological
vertex updating without sacrificing the computation parallelism. We evaluate
the performance of the proposed hardware accelerator on three real-world
datasets.
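
As a rough software illustration of two of the hardware ideas named in the abstract (a FIFO-based temporal sampler and a look-up-table time encoder), the sketch below may help. It is not the authors' design: the class names, the bucket edges, and the cosine form of the encoder are assumptions made for illustration only, and the abstract does not specify how the paper sizes its FIFOs or builds its table.

```python
import bisect
import math
from collections import deque


class FifoNeighborSampler:
    """Keeps only the most recent temporal neighbors of each vertex.

    Hypothetical software analogue of an on-chip FIFO sampler: each vertex
    owns a bounded queue, and a new edge evicts the oldest entry when full.
    """

    def __init__(self, num_neighbors):
        self.num_neighbors = num_neighbors
        self.fifos = {}

    def insert_edge(self, src, dst, timestamp):
        # Edges are assumed to arrive in chronological order.
        for u, v in ((src, dst), (dst, src)):
            fifo = self.fifos.setdefault(u, deque(maxlen=self.num_neighbors))
            fifo.append((v, timestamp))

    def sample(self, vertex):
        # Returns (neighbor, timestamp) pairs, oldest first.
        return list(self.fifos.get(vertex, ()))


class LookupTimeEncoder:
    """Replaces an assumed cos(w * dt + b) encoder with a precomputed table.

    Time deltas are bucketed by sorted upper edges; each bucket stores the
    encoding evaluated at its edge, so encoding becomes a table read.
    """

    def __init__(self, freqs, phases, bucket_edges):
        self.bucket_edges = list(bucket_edges)
        self.table = [
            [math.cos(w * edge + b) for w, b in zip(freqs, phases)]
            for edge in self.bucket_edges
        ]

    def encode(self, dt):
        # Deltas beyond the last edge fall into the last bucket.
        idx = bisect.bisect_left(self.bucket_edges, dt)
        return self.table[min(idx, len(self.table) - 1)]


sampler = FifoNeighborSampler(num_neighbors=2)
for src, dst, t in [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0)]:
    sampler.insert_edge(src, dst, t)

encoder = LookupTimeEncoder(freqs=[1.0, 0.5], phases=[0.0, 0.0],
                            bucket_edges=[0.0, 1.0, 10.0])
query_time = 3.5
features = [encoder.encode(query_time - t) for _, t in sampler.sample(0)]
```

The bounded queue trades sampling flexibility for a fixed memory footprint per vertex, and the table trades encoding precision for the removal of trigonometric evaluation at inference time; whether those trade-offs match the paper's reported accuracy depends on details not given in the abstract.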
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging [3.502427552446068]
Deep learning models enable real-time inference, but can be computationally demanding due to complex architectures and large matrix operations.
This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware.
In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards.
arXiv Detail & Related papers (2024-10-01T17:23:26Z)
- POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator [4.09225917049674]
Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent.
This paper introduces POMONAG, extending DiffusionNAG via a many-objective diffusion process.
Results were validated on two search spaces -- NAS201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
arXiv Detail & Related papers (2024-09-30T16:05:29Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z)
- Hierarchical Graph Pattern Understanding for Zero-Shot VOS [102.21052200245457]
This paper proposes a new hierarchical graph neural network (GNN) architecture for zero-shot video object segmentation (ZS-VOS).
Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames.
arXiv Detail & Related papers (2023-12-15T04:13:21Z)
- Mending of Spatio-Temporal Dependencies in Block Adjacency Matrix [3.529869282529924]
We propose a novel end-to-end learning architecture designed to mend the temporal dependencies, resulting in a well-connected graph.
Our methodology demonstrates superior performance on benchmark datasets, such as SurgVisDom and C2D2.
arXiv Detail & Related papers (2023-10-04T06:42:33Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs.
It is challenging to accelerate training of GCNs due to substantial and irregular data communication.
We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
arXiv Detail & Related papers (2019-12-31T21:19:01Z)