Related papers: Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA

Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA

URL: http://arxiv.org/abs/2602.16442v1
Date: Wed, 18 Feb 2026 13:26:22 GMT
Title: Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA
Authors: Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Hiroshi Nakano, Manon Dampfhoffer, Thomas Mesquida, Hiroaki Nishi, Thomas Dalgaty, Tomasz Kryjak,
Abstract summary: FPGA implementation of event-graph neural networks for audio processing.<n>System achieves up to 95% word-end detection accuracy, with only 10.53 microsecond latency and 1.18 W power consumption.
Score: 12.261656409413753
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As the volume of data recorded by embedded edge sensors increases, particularly from neuromorphic devices producing discrete event streams, there is a growing need for hardware-aware neural architectures that enable efficient, low-latency, and energy-conscious local processing. We present an FPGA implementation of event-graph neural networks for audio processing. We utilise an artificial cochlea that converts time-series signals into sparse event data, reducing memory and computation costs. Our architecture was implemented on a SoC FPGA and evaluated on two open-source datasets. For classification task, our baseline floating-point model achieves 92.7% accuracy on SHD dataset - only 2.4% below the state of the art - while requiring over 10x and 67x fewer parameters. On SSC, our models achieve 66.9-71.0% accuracy. Compared to FPGA-based spiking neural networks, our quantised model reaches 92.3% accuracy, outperforming them by up to 19.3% while reducing resource usage and latency. For SSC, we report the first hardware-accelerated evaluation. We further demonstrate the first end-to-end FPGA implementation of event-audio keyword spotting, combining graph convolutional layers with recurrent sequence modelling. The system achieves up to 95% word-end detection accuracy, with only 10.53 microsecond latency and 1.18 W power consumption, establishing a strong benchmark for energy-efficient event-driven KWS.

Related papers

LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G [85.58816960936069]
Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Time (Near-RT) latency and computational constraints.<n>This paper investigates a post-Transformer paradigm for efficient radio telemetry forecasting.<n>We propose a quantum-inspired state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels.
arXiv Detail & Related papers (2026-01-18T12:08:38Z)
Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks [11.481972015296812]
This study proposes an energy-efficient solution deploying compact NNs on low-power Field-Programmable Gate Arrays (FPGAs)<n>We replace complex spectral preprocessing with raw waveform input, eliminating complex on-board preprocessing while reducing input size by 21x without sacrificing accuracy.<n>We design two lightweight architectures (1D-CNN and 1D-SepCNN) tailored for embedded FPGAs, reducing parameters from 369 million to as few as 216 while maintaining comparable accuracy.
arXiv Detail & Related papers (2025-10-27T09:30:36Z)
Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons [69.73249913506042]
This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons to process time-domain signals directly.<n>By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity.<n> Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs.
arXiv Detail & Related papers (2025-06-24T21:14:59Z)
Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats [0.5420492913071214]
We present the implementation of four FPGA-accelerated convolutional neural network (CNN) models for onboard cloud detection in resource-constrained CubeSat missions.<n>This study explores both pixel-wise (Pixel-Net and Patch-Net) and image-wise (U-Net and Scene-Net) models to benchmark trade-offs in accuracy, latency, and model complexity.<n>All models retained high accuracy post-FPGA integration, with a cumulative maximum accuracy drop of only 0.6% after quantization and pruning.
arXiv Detail & Related papers (2025-04-04T19:32:47Z)
Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA [0.043533652831655174]
We present a hardware implementation of an event-graph neural network for time-series classification.<n>We leverage an artificial cochlea model to convert the input time-series signals into a sparse event-data format.<n>Our method achieves a floating-point accuracy of 92.7% on the SHD dataset for the base model, which is only 2.4% and 2% less than the state-of-the-art models.
arXiv Detail & Related papers (2025-03-09T14:08:46Z)
Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs [0.815557531820863]
We introduce a custom EFGCN (Event-based FPGA-accelerated Graph Convolutional Network) designed with a series of hardware-aware optimisations tailored for PointNetConv.<n>The proposed techniques result in up to 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN)<n>Our approach achieves state-of-the-art performance across multiple event-based classification benchmarks while remaining highly scalable, customisable and resource-efficient.
arXiv Detail & Related papers (2024-06-11T14:47:36Z)
LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Graph Neural Network (LL-GNN) designs for particle detectors. The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z)
Seizure Detection and Prediction by Parallel Memristive Convolutional Neural Networks [2.0738462952016232]
We propose a low-latency parallel Convolutional Neural Network (CNN) architecture that has between 2-2,800x fewer network parameters compared to SOTA CNN architectures. Our network achieves cross validation accuracy of 99.84% for epileptic seizure detection and 97.54% for epileptic seizure prediction. The CNN component of our platform is estimated to consume approximately 2.791W of power while occupying an area of 31.255mm$2$ in a 22nm FDSOI CMOS process.
arXiv Detail & Related papers (2022-06-20T18:16:35Z)
Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally very and energy expensive. We propose a new benchmark for computing tactile pattern recognition at the edge through letters reading. We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihimorphic chip for efficient inference. Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy
arXiv Detail & Related papers (2022-05-30T14:30:45Z)
SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples. We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions. In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
arXiv Detail & Related papers (2021-06-10T04:21:20Z)
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware. The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks. Current networks often occupy large number of parameters and require heavy computation costs. Our proposed FastFlowNet works in the well-known coarse-to-fine manner with following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.