Related papers: Position: The Need for Ultrafast Training

URL: http://arxiv.org/abs/2602.02005v1
Date: Mon, 02 Feb 2026 12:04:11 GMT
Title: Position: The Need for Ultrafast Training
Authors: Duc Hoang
Abstract summary: Domain-specialized FPGAs have delivered unprecedented performance for low-latency inference across scientific and industrial workloads. I argue for a shift from inference-only accelerators to ultrafast on-chip learning, in which both inference and training execute directly within the FPGA fabric.
Score: 2.049249624501703
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Domain-specialized FPGAs have delivered unprecedented performance for low-latency inference across scientific and industrial workloads, yet nearly all existing accelerators assume static models trained offline, relegating learning and adaptation to slower CPUs or GPUs. This separation fundamentally limits systems that must operate in non-stationary, high-frequency environments, where model updates must occur at the timescale of the underlying physics. In this paper, I argue for a shift from inference-only accelerators to ultrafast on-chip learning, in which both inference and training execute directly within the FPGA fabric under deterministic, sub-microsecond latency constraints. Bringing learning into the same real-time datapath as inference would enable closed-loop systems that adapt as fast as the physical processes they control, with applications spanning quantum error correction, cryogenic qubit calibration, plasma and fusion control, accelerator tuning, and autonomous scientific experiments. Enabling such regimes requires rethinking algorithms, architectures, and toolflows jointly, but promises to transform FPGAs from static inference engines into real-time learning machines.

Related papers

Continual Quantum Architecture Search with Tensor-Train Encoding: Theory and Applications to Signal Processing [68.35481158940401]
CL-QAS is a continual quantum architecture search framework. It mitigates the challenges of costly amplitude encoding and forgetting in variational quantum circuits. It achieves controllable robustness expressivity, sample-efficient generalization, and smooth convergence without barren plateaus.
arXiv Detail & Related papers (2026-01-10T02:36:03Z)
Reinforcement Learning Control of Quantum Error Correction [108.70420561323692]
This work enables a new paradigm: a quantum computer that learns to self-improve directly from its errors and never stops computing.
arXiv Detail & Related papers (2025-11-11T17:32:25Z)
Efficient Online Learning with Predictive Coding Networks: Exploiting Temporal Correlations [26.073347035678342]
The Predictive Coding (PC) framework offers a biologically plausible alternative with local, Hebbian-like update rules. We present Predictive Coding Network with Temporal Amortization (PCN-TA), which preserves latent states across temporal frames. Experiments on the COIL-20 robotic perception dataset demonstrate that PCN-TA achieves 10% fewer weight updates compared to backpropagation.
arXiv Detail & Related papers (2025-10-29T22:09:53Z)
Hybrid Neural-MPM for Interactive Fluid Simulations in Real-Time [57.30651532625017]
We present a novel hybrid method that integrates numerical simulation, neural physics, and generative control. Our system demonstrates robust performance across diverse 2D/3D scenarios, material types, and obstacle interactions. We promise to release both models and data upon acceptance.
arXiv Detail & Related papers (2025-05-25T01:27:18Z)
Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging [3.502427552446068]
Deep learning models enable real-time inference, but can be computationally demanding due to complex architectures and large matrix operations. This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware. In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards.
arXiv Detail & Related papers (2024-10-01T17:23:26Z)
Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs. At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads. At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
DOCTOR: Dynamic On-Chip Temporal Variation Remediation Toward Self-Corrected Photonic Tensor Accelerators [5.873308516576125]
Photonic tensor accelerators offer unparalleled speed and energy efficiency. Off-chip noise-aware training and on-chip training have been proposed to enhance the variation tolerance of optical neural accelerators. We propose a lightweight dynamic on-chip framework, dubbed DOCTOR, providing adaptive, in-situ accuracy recovery against temporally drifting noise.
arXiv Detail & Related papers (2024-03-05T06:17:13Z)
Fast Neural Network Inference on FPGAs for Triggering on Long-Lived Particles at Colliders [0.0]
We present two machine-learning algorithms for selecting events where neutral long-lived particles decay within the detector volume. The proposed new algorithms are proven efficient for the considered benchmark physics scenario and their accuracy is found to not degrade when accelerated on the FPGA cards.
arXiv Detail & Related papers (2023-07-11T10:17:57Z)
ETLP: Event-based Three-factor Local Plasticity for online learning with neuromorphic hardware [105.54048699217668]
We show competitive accuracy with a clear advantage in computational complexity for Event-Based Three-factor Local Plasticity (ETLP). We also show that when using local plasticity, threshold adaptation in spiking neurons and a recurrent topology are necessary to learn temporal patterns with a rich temporal structure.
arXiv Detail & Related papers (2023-01-19T19:45:42Z)
Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for 5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity. One of the main challenges comes from the real-time implementation of these algorithms. This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z)
Fast and differentiable simulation of driven quantum systems [58.720142291102135]
We introduce a semi-analytic method based on the Dyson expansion that allows us to time-evolve driven quantum systems much faster than standard numerical methods. We show results of the optimization of a two-qubit gate using transmon qubits in the circuit QED architecture.
arXiv Detail & Related papers (2020-12-16T21:43:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.