LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy
Physics
- URL: http://arxiv.org/abs/2209.14065v5
- Date: Tue, 9 Jan 2024 10:05:38 GMT
- Title: LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy
Physics
- Authors: Zhiqiang Que, Hongxiang Fan, Marcus Loo, He Li, Michaela Blott,
Maurizio Pierini, Alexander Tapper and Wayne Luk
- Abstract summary: This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors.
The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
- Score: 45.666822327616046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a novel reconfigurable architecture for Low Latency Graph
Neural Network (LL-GNN) designs for particle detectors, delivering
unprecedented low latency performance. Incorporating FPGA-based GNNs into
particle detectors presents a unique challenge since it requires
sub-microsecond latency to deploy the networks for online event selection with
a data rate of hundreds of terabytes per second in the Level-1 triggers at the
CERN Large Hadron Collider experiments. This paper proposes a novel
outer-product based matrix multiplication approach, which is enhanced by
exploiting the structured adjacency matrix and a column-major data layout.
Moreover, a fusion step is introduced to further reduce the end-to-end design
latency by eliminating unnecessary boundaries. Furthermore, a GNN-specific
algorithm-hardware co-design approach is presented which not only finds a
design with a much better latency but also finds a high accuracy design under
given latency constraints. To facilitate this, a customizable template for this
low latency GNN hardware architecture has been designed and open-sourced, which
enables the generation of low-latency FPGA designs with efficient resource
utilization using a high-level synthesis tool. Evaluation results show that our
FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times
higher power efficiency than a GPU implementation. Compared to the previous
FPGA implementations, this work achieves 6.51 to 16.7 times lower latency.
Moreover, the latency of our FPGA design is sufficiently low to enable
deployment of GNNs in a sub-microsecond, real-time collider trigger system,
enabling it to benefit from improved accuracy. The proposed LL-GNN design
advances the next generation of trigger systems by enabling sophisticated
algorithms to process experimental data efficiently.
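The abstract's core idea, an outer-product formulation of matrix multiplication driven by a column-major data layout, can be illustrated with a short sketch. This is a hypothetical NumPy illustration of the general technique, not the paper's HLS/FPGA implementation; the function name and the zero-column skip (standing in for the structured-adjacency-matrix optimization) are assumptions for illustration only.

```python
import numpy as np

def outer_product_matmul(a, b):
    """Compute a @ b as a sum of rank-1 outer products.

    Each iteration consumes one column of `a` and one row of `b`,
    which matches a column-major layout for `a`: the data needed per
    step is contiguous and can be streamed.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n))
    for i in range(k):
        col = a[:, i]
        if not np.any(col):
            # A structured (sparse) matrix lets entire rank-1 updates
            # be skipped, reducing work and latency.
            continue
        c += np.outer(col, b[i, :])  # rank-1 update
    return c
```

In hardware, each rank-1 update maps naturally to a parallel array of multiply-accumulate units, which is one reason this formulation suits low-latency FPGA designs.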
Related papers
- Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology [2.968768532937366]
Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models.
We develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models.
arXiv Detail & Related papers (2024-10-07T05:04:13Z) - Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z) - Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs [0.815557531820863]
Event cameras find significant relevance for their integration into embedded real-time systems.
One effective approach to ensure the necessary throughput and latency for event processing systems is through the utilisation of graph convolutional networks (GCNs).
We introduce a series of hardware-aware optimisations tailored for PointNet++, a GCN architecture designed for point cloud processing.
arXiv Detail & Related papers (2024-06-11T14:47:36Z) - End-to-end codesign of Hessian-aware quantized neural networks for FPGAs
and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
arXiv Detail & Related papers (2023-04-13T18:00:01Z) - HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on
FPGA Devices [71.45672882756001]
This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs.
The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics.
The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
arXiv Detail & Related papers (2023-03-30T08:25:27Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - High-Performance FPGA-based Accelerator for Bayesian Recurrent Neural
Networks [2.0631735969348064]
We propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs.
Compared with GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency.
arXiv Detail & Related papers (2021-06-04T14:30:39Z) - NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function
Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms.
This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware
Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z) - Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle
Reconstruction in High Energy Physics [11.125632758828266]
We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1 μs on an FPGA.
We consider a representative task associated to particle reconstruction and identification in a next-generation calorimeter operating at a particle collider.
We convert the compressed models into firmware to be implemented on an FPGA.
arXiv Detail & Related papers (2020-08-08T21:26:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.