Sub-microsecond Transformers for Jet Tagging on FPGAs
- URL: http://arxiv.org/abs/2510.24784v1
- Date: Sun, 26 Oct 2025 23:13:00 GMT
- Title: Sub-microsecond Transformers for Jet Tagging on FPGAs
- Authors: Lauri Laatu, Chang Sun, Arianna Cox, Abhijith Gandrakota, Benedikt Maier, Jennifer Ngadiuba, Zhiqiang Que, Wayne Luk, Maria Spiropulu, Alexander Tapper
- Abstract summary: We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exceptional performance on multiple tasks in modern machine learning applications, including jet tagging at the CERN Large Hadron Collider (LHC). This work advances the next-generation trigger systems for the High Luminosity LHC, enabling the use of transformers for real-time applications in high-energy physics and beyond.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exceptional performance on multiple tasks in modern machine learning applications, including jet tagging at the CERN Large Hadron Collider (LHC). However, their computational complexity prohibits use in real-time applications, such as the hardware trigger system of the collider experiments up until now. In this work, we demonstrate the first application of transformers for jet tagging on FPGAs, achieving $\mathcal{O}(100)$ nanosecond latency with superior performance compared to alternative baseline models. We leverage high-granularity quantization and distributed arithmetic optimization to fit the entire transformer model on a single FPGA, achieving the required throughput and latency. Furthermore, we add multi-head attention and linear attention support to hls4ml, making our work accessible to the broader fast machine learning community. This work advances the next-generation trigger systems for the High Luminosity LHC, enabling the use of transformers for real-time applications in high-energy physics and beyond.
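The abstract notes that linear attention support was added to hls4ml. As a rough illustration of how linear attention reaches sub-quadratic cost, here is a minimal NumPy sketch using a positive feature map (the elu(x)+1 map from the kernelized-attention literature); the feature map, names, and shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Linear-time attention: apply a positive feature map phi to queries
    and keys, then reassociate the matrix products so the sequence
    dimension is summed out once, giving O(n) instead of O(n^2) cost."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    q, k = phi(q), phi(k)
    kv = k.T @ v                   # (d, d_v): one pass over the sequence
    z = q @ k.sum(axis=0)          # (n,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)
```

Reassociating `(q @ k.T) @ v` into `q @ (k.T @ v)` is what removes the quadratic attention matrix; the result matches the explicit normalized-weights computation exactly (up to `eps`).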
Related papers
- TrackCore-F: Deploying Transformer-Based Subatomic Particle Tracking on FPGAs [0.2851702684899107]
We aim to develop tools for monolithic or partitioned Transformer synthesis, specifically targeting inference. Our primary use-case involves two machine learning model designs for tracking, derived from the TrackFormers project.
arXiv Detail & Related papers (2025-09-30T14:44:43Z)
- Fast attention mechanisms: a tale of parallelism [52.7657529272906]
We introduce an efficient attention mechanism called Approximate Nearest Neighbor Attention (ANNA) with sub-quadratic time complexity. We prove that ANNA-transformers retain the expressive power previously established for standard attention in terms of matching the capabilities of MPC algorithms.
arXiv Detail & Related papers (2025-09-10T20:59:44Z)
- JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs [36.158374493924455]
Graph Neural Networks (GNNs) have shown exceptional performance for jet tagging at the CERN High Luminosity Large Hadron Collider (HL-LHC). We propose JEDI-linear, a novel GNN architecture with linear computational complexity that eliminates explicit pairwise interactions. This is the first interaction-based GNN to achieve less than 60 ns latency, and it currently meets the requirements for use in the HL-LHC CMS Level-1 trigger system.
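Eliminating explicit pairwise interactions typically rests on an algebraic rearrangement that is possible when the per-pair message is additive: the sum over all partners collapses into one global sum. A deliberately simplified NumPy toy of that trick (an illustrative stand-in, not the actual JEDI-linear architecture):

```python
import numpy as np

def pairwise_interactions(x, w):
    """O(n^2) baseline: one message per ordered pair (i, j), summed per node."""
    n, _ = x.shape
    msgs = np.zeros_like(x @ w)
    for i in range(n):
        for j in range(n):
            if i != j:
                msgs[i] += (x[i] + x[j]) @ w
    return msgs

def linear_interactions(x, w):
    """O(n) equivalent: sum_{j != i} (x_i + x_j) = (n-1)*x_i + (sum_j x_j - x_i),
    so one global sum replaces the explicit pairwise loop."""
    n = x.shape[0]
    s = x.sum(axis=0)
    return ((n - 1) * x + (s - x)) @ w
```

The two functions produce identical outputs; only the second avoids the quadratic loop, which is the property that makes such models fit low-latency FPGA budgets.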
arXiv Detail & Related papers (2025-08-21T11:40:49Z)
- Design and Implementation of an FPGA-Based Hardware Accelerator for Transformer [0.0]
Transformer-based large language models rely heavily on matrix multiplications for attention and feed-forward layers. We introduce a highly optimized tiled matrix multiplication accelerator on a resource-constrained Xilinx KV260 FPGA. Our design exploits persistent on-chip storage, a robust two-level tiling strategy for maximal data reuse, and a systolic-like unrolled compute engine.
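Tiled matrix multiplication is the standard way to maximize on-chip data reuse: each block of the output is computed from small tiles that fit in fast local memory. A minimal single-level NumPy sketch of the blocking idea (illustrative only; the accelerator's two-level tiling, persistent storage, and systolic engine are not modeled here):

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Blocked matrix multiply: iterate over output tiles, accumulating
    partial products from matching tiles of a and b. On hardware, each
    tile would be staged into on-chip memory and reused many times."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c
```

The result is numerically equivalent to `a @ b`; the payoff of the blocked loop order only appears when tiles map to limited fast memory, as on an FPGA.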
arXiv Detail & Related papers (2025-03-20T22:15:42Z)
- Low Latency Transformer Inference on FPGAs for Physics Applications with hls4ml [2.6892725687961394]
This study presents an efficient implementation of transformer architectures on Field-Programmable Gate Arrays (FPGAs) using hls4ml.
Deployment on a VU13P FPGA achieved latencies below 2 µs, demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2024-09-08T19:50:25Z)
- Ultra Fast Transformers on FPGAs for Particle Physics Experiments [2.666074491398626]
This work introduces a highly efficient implementation of the transformer architecture on a Field-Programmable Gate Array (FPGA).
We have implemented critical components of a transformer model, such as multi-head attention and softmax layers.
We recorded latency under 2 $\mu$s on the Xilinx UltraScale+ FPGA, which is compatible with hardware trigger requirements at CERN.
arXiv Detail & Related papers (2024-02-01T22:32:39Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to non-experts in a single open-sourced workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
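Mixed-precision quantization of this kind assigns each layer a fixed-point format with a given number of integer and fractional bits. A hedged NumPy sketch of uniform signed fixed-point quantization (my own helper, loosely modeled on `ap_fixed`-style types; not the paper's Hessian-aware method):

```python
import numpy as np

def quantize_fixed_point(x, int_bits, frac_bits):
    """Round x to a signed fixed-point grid with `frac_bits` fractional
    bits, saturating at the representable range (total width
    int_bits + frac_bits, sign included in the integer bits)."""
    scale = 2.0 ** frac_bits
    total = int_bits + frac_bits
    lo, hi = -2 ** (total - 1), 2 ** (total - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi)   # integer code, saturated
    return q / scale                           # back to real-valued grid
```

Widening `frac_bits` shrinks the rounding error while widening `int_bits` extends the saturation range; Hessian-aware schemes choose these widths per layer according to loss sensitivity.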
arXiv Detail & Related papers (2023-04-13T18:00:01Z)
- LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors.
The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z)
- Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization [78.18328503396057]
Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks.
This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization.
arXiv Detail & Related papers (2022-08-10T05:54:46Z)
- VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit Vision Transformer [121.85581713299918]
We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs).
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
arXiv Detail & Related papers (2022-01-17T20:27:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.