Related papers: Enabling Low-Latency Machine learning on Radiation-Hard FPGAs with hls4ml

Enabling Low-Latency Machine learning on Radiation-Hard FPGAs with hls4ml

URL: http://arxiv.org/abs/2602.15751v1
Date: Tue, 17 Feb 2026 17:30:28 GMT
Title: Enabling Low-Latency Machine learning on Radiation-Hard FPGAs with hls4ml
Authors: Katya Govorkova, Julian Garcia Pardinas, Vladimir Loncar, Victoria Nguyen, Sebastian Schmitt, Marco Pizzichemi, Loris Martinazzoli, Eluned Anne Smith,
Abstract summary: This paper presents the first demonstration of a viable, ultra-fast, radiation-hard machine learning (ML) application on FPGAs.<n>We develop a lightweight autoencoder to compress a 32-sample timing readout.<n>We show that the model can be reduced to 10-bit weights with minimal performance loss.
Score: 0.6253022837963071
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents the first demonstration of a viable, ultra-fast, radiation-hard machine learning (ML) application on FPGAs, which could be used in future high-energy physics experiments. We present a three-fold contribution, with the PicoCal calorimeter, planned for the LHCb Upgrade II experiment, used as a test case. First, we develop a lightweight autoencoder to compress a 32-sample timing readout, representative of that of the PicoCal, into a two-dimensional latent space. Second, we introduce a systematic, hardware-aware quantization strategy and show that the model can be reduced to 10-bit weights with minimal performance loss. Third, as a barrier to the adoption of on-detector ML is the lack of support for radiation-hard FPGAs in the High-Energy Physics community's standard ML synthesis tool, hls4ml, we develop a new backend for this library. This new back-end enables the automatic translation of ML models into High-Level Synthesis (HLS) projects for the Microchip PolarFire family of FPGAs, one of the few commercially available and radiation hard FPGAs. We present the synthesis of the autoencoder on a target PolarFire FPGA, which indicates that a latency of 25 ns can be achieved. We show that the resources utilized are low enough that the model can be placed within the inherently protected logic of the FPGA. Our extension to hls4ml is a significant contribution, paving the way for broader adoption of ML on FPGAs in high-radiation environments.

Related papers

Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs [0.2294388534633318]
eFPGAs allow for the design of hardware accelerators of edge Machine Learning (ML) applications at a lower power budget.<n>The limited eFPGA logic and memory significantly constrain compute capabilities and model size.<n>The proposed eFPGA accelerator focuses on minimizing resource usage and allowing flexibility for on-field recalibration over throughput.
arXiv Detail & Related papers (2025-02-10T12:49:22Z)
Machine Learning Enhanced Quantum State Tomography on FPGA [0.12171407135748794]
Machine learning techniques have opened new avenues for real-time quantum state tomography (QST)<n>We demonstrate the deployment of machine learning-based QST onto edge devices, specifically utilizing field programmable gate arrays (FPGAs)
arXiv Detail & Related papers (2025-01-08T07:59:40Z)
Low Latency Transformer Inference on FPGAs for Physics Applications with hls4ml [2.6892725687961394]
This study presents an efficient implementation of transformer architectures in Field-Programmable Gate Arrays(FPGAs) using hls4ml. Their deployment on VU13P FPGA chip achieved less than 2us, demonstrating the potential for real-time applications.
arXiv Detail & Related papers (2024-09-08T19:50:25Z)
Investigating Resource-efficient Neutron/Gamma Classification ML Models Targeting eFPGAs [0.0]
Open-source embedded FPGA (eFPGA) frameworks provide an alternate, more flexible pathway for implementing machine learning models in hardware. We explore the parameter space for eFPGA implementations of fully-connected neural network (fcNN) and boosted decision tree (BDT) models. The results of the study will be used to aid the specification of an eFPGA fabric, which will be integrated as part of a test chip.
arXiv Detail & Related papers (2024-04-19T20:03:30Z)
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference [11.614722231006695]
Large language models (LLMs) boasting billions of parameters have generated a significant demand for efficient deployment in inference workloads. This paper investigates the feasibility and potential of model-specific spatial acceleration for LLM inference on FPGAs.
arXiv Detail & Related papers (2023-12-23T04:27:06Z)
SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying stateof-the-art object detection models onto FPGA devices for ultralow latency applications. We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion. We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
arXiv Detail & Related papers (2023-09-04T13:15:01Z)
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance [49.1574468325115]
accumulator-aware quantization (A2Q) is a novel weight quantization method designed to train quantized neural networks (QNNs) to avoid overflow during inference. A2Q introduces a unique formulation inspired by weight normalization that constrains the L1-norm of model weights according to accumulator bit width bounds. We show A2Q can train QNNs for low-precision accumulators while maintaining model accuracy competitive with a floating-point baseline.
arXiv Detail & Related papers (2023-08-25T17:28:58Z)
HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices [71.45672882756001]
This study introduces a novel streaming architecture based toolflow for mapping 3D Convolutional Neural Networks onto FPGAs. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX format and a description of the FPGA characteristics. The ability of the toolflow to support a broad range of models and devices is shown through a number of experiments on various 3D CNN and FPGA system pairs.
arXiv Detail & Related papers (2023-03-30T08:25:27Z)
LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Graph Neural Network (LL-GNN) designs for particle detectors. The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z)
A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle. We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
Gaze-Sensing LEDs for Head Mounted Displays [73.88424800314634]
We exploit the sensing capability of LEDs to create low-power gaze tracker for virtual reality (VR) applications. We show that our gaze estimation method does not require complex dimension reduction techniques.
arXiv Detail & Related papers (2020-03-18T23:03:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.