Neural-HAR: A Dimension-Gated CNN Accelerator for Real-Time Radar Human Activity Recognition
- URL: http://arxiv.org/abs/2510.22772v1
- Date: Sun, 26 Oct 2025 17:42:28 GMT
- Title: Neural-HAR: A Dimension-Gated CNN Accelerator for Real-Time Radar Human Activity Recognition
- Authors: Yizhuo Wu, Francesco Fioranelli, Chang Gao
- Abstract summary: We introduce a dimension-gated CNN accelerator tailored for real-time radar HAR on resource-constrained platforms. GateCNN attains 86.4% accuracy with only 2.7k parameters and 0.28M FLOPs per inference, comparable to CNN-BiGRU at a fraction of the complexity. Our FPGA prototype on Xilinx Zynq-7000 Z-7007S reaches 107.5 $\mu$s latency and 15 mW dynamic power using LUT-based ROM and distributed RAM only.
- Score: 5.400353553418959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radar-based human activity recognition (HAR) is attractive for unobtrusive and privacy-preserving monitoring, yet many CNN/RNN solutions remain too heavy for edge deployment, and even lightweight ViT/SSM variants often exceed practical compute and memory budgets. We introduce Neural-HAR, a dimension-gated CNN accelerator tailored for real-time radar HAR on resource-constrained platforms. At its core is GateCNN, a parameter-efficient Doppler-temporal network that (i) embeds Doppler vectors to emphasize frequency evolution over time and (ii) applies dual-path gated convolutions that modulate Doppler-aware content features with temporal gates, complemented by a residual path for stable training. On the University of Glasgow UoG2020 continuous radar dataset, GateCNN attains 86.4% accuracy with only 2.7k parameters and 0.28M FLOPs per inference, comparable to CNN-BiGRU at a fraction of the complexity. Our FPGA prototype on Xilinx Zynq-7000 Z-7007S reaches 107.5 $\mu$s latency and 15 mW dynamic power using LUT-based ROM and distributed RAM only (zero DSP/BRAM), demonstrating real-time, energy-efficient edge inference. Code and HLS conversion scripts are available at https://github.com/lab-emi/AIRHAR.
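The dual-path gating idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes and kernel sizes, not the released GateCNN (see the repository linked above): a content convolution is modulated element-wise by a sigmoid temporal gate, with a residual path added back for stable training.

```python
import numpy as np

def conv1d_same(x, w):
    """'Same'-padded 1D convolution over the time axis.
    x: (channels, time); w: (out_channels, in_channels, kernel)."""
    out_ch, in_ch, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    y = np.zeros((out_ch, x.shape[1]))
    for o in range(out_ch):
        for t in range(x.shape[1]):
            y[o, t] = np.sum(w[o] * xp[:, t:t + k])
    return y

def gated_block(x, w_content, w_gate):
    """Dual-path gated convolution with a residual connection:
    the content path is modulated by a sigmoid gate in (0, 1)."""
    content = conv1d_same(x, w_content)                   # content features
    gate = 1.0 / (1.0 + np.exp(-conv1d_same(x, w_gate)))  # temporal gate
    return x + content * gate                             # residual path

# Toy Doppler-time input: 8 Doppler bins embedded as channels, 32 time steps
# (illustrative sizes, not the paper's configuration).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32))
w_c = 0.1 * rng.standard_normal((8, 8, 3))
w_g = 0.1 * rng.standard_normal((8, 8, 3))
y = gated_block(x, w_c, w_g)
print(y.shape)  # (8, 32)
```

Because the gate is bounded in (0, 1), the modulation can attenuate but never amplify the content path, which is one reason gating tends to train stably.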
Related papers
- Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA [12.261656409413753]
FPGA implementation of event-graph neural networks for audio processing. The system achieves up to 95% word-end detection accuracy, with only 10.53 microsecond latency and 1.18 W power consumption.
arXiv Detail & Related papers (2026-02-18T13:26:22Z) - LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G [85.58816960936069]
Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Time (Near-RT) latency and computational constraints. This paper investigates a post-Transformer paradigm for efficient radio telemetry forecasting. We propose a quantum-inspired state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels.
arXiv Detail & Related papers (2026-01-18T12:08:38Z) - Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks [11.481972015296812]
This study proposes an energy-efficient solution deploying compact NNs on low-power Field-Programmable Gate Arrays (FPGAs). We replace complex spectral preprocessing with raw waveform input, eliminating on-board preprocessing while reducing input size by 21x without sacrificing accuracy. We design two lightweight architectures (1D-CNN and 1D-SepCNN) tailored for embedded FPGAs, reducing parameters from 369 million to as few as 216 while maintaining comparable accuracy.
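The parameter savings that depthwise-separable 1D convolutions offer over standard ones can be checked with simple arithmetic. The channel counts and kernel size below are illustrative assumptions, not the paper's actual configurations:

```python
def conv1d_params(in_ch, out_ch, k):
    # standard 1D conv: one k-wide filter per (input, output) channel pair
    return in_ch * out_ch * k

def sepconv1d_params(in_ch, out_ch, k):
    # depthwise pass (k weights per input channel) + 1x1 pointwise mixing
    return in_ch * k + in_ch * out_ch

dense = conv1d_params(64, 64, 9)     # 64 * 64 * 9  = 36864
sep = sepconv1d_params(64, 64, 9)    # 64*9 + 64*64 = 4672
print(dense, sep, round(dense / sep, 1))  # 36864 4672 7.9
```

The saving grows with the kernel width and channel count, which is why separable variants shrink so dramatically on larger layers.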
arXiv Detail & Related papers (2025-10-27T09:30:36Z) - Lightweight DNN for Full-Band Speech Denoising on Mobile Devices: Exploiting Long and Short Temporal Patterns [4.121578819979242]
We present a causal, low-latency, and lightweight deep neural network (DNN)-based method for full-band speech denoising. The method is based on a modified UNet architecture employing look-back frames, temporal spanning of convolutional kernels, and recurrent neural networks. The proposed method is evaluated using established speech denoising metrics and publicly available datasets.
arXiv Detail & Related papers (2025-09-05T13:18:25Z) - Lightweight LIF-only SNN accelerator using differential time encoding [0.3749861135832073]
Spiking Neural Networks (SNNs) offer a promising solution to the increasing computational and energy requirements of modern Machine Learning (ML) applications. Because of their unique data representation using spikes and spike trains, they rely mostly on additions and thresholding operations to achieve results approaching state-of-the-art (SOTA) Artificial Neural Networks (ANNs). This work introduces a hardware accelerator architecture capable of computing feedforward LIF-only SNNs, together with an encoding method that efficiently converts existing data into spike trains.
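The add-and-threshold arithmetic that lets LIF-only SNNs avoid multiplies (apart from the leak) is easy to see in a short sketch. The decay and threshold values here are illustrative assumptions, and this is not the paper's accelerator or its differential time encoding:

```python
def lif_forward(currents, decay=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron: accumulates input current,
    fires when the membrane potential crosses threshold, then resets."""
    v = 0.0
    spikes = []
    for i in currents:
        v = decay * v + i          # leaky integration
        if v >= threshold:         # threshold comparison, no multiply needed
            spikes.append(1)
            v = 0.0                # hard reset after a spike
        else:
            spikes.append(0)
    return spikes

print(lif_forward([0.6, 0.6, 0.6]))  # [0, 1, 0]
```

In hardware, the leak is often a cheap shift (e.g. decay = 1 - 2^-n), so the whole update reduces to adds and comparisons.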
arXiv Detail & Related papers (2025-05-16T13:42:39Z) - Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs [0.3440236962613469]
This paper presents a Dynamic Tsetlin Machine (DTM) training accelerator as an alternative to Deep Neural Networks (DNNs). The DTM trains with fewer multiply-accumulates and is devoid of derivative computation. The proposed accelerator offers 2.54x more Giga operations per second per Watt (GOP/s per W) and uses 6x less power than the next-best comparable design.
arXiv Detail & Related papers (2025-04-28T13:38:53Z) - Spiker+: a framework for the generation of efficient Spiking Neural
Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge.
Spiker+ is tested on two benchmark datasets: MNIST and the Spiking Heidelberg Digits (SHD).
arXiv Detail & Related papers (2024-01-02T10:42:42Z) - Data-Free Dynamic Compression of CNNs for Tractable Efficiency [46.498278084317704]
Structured pruning approaches have shown promise in lowering floating-point operations without substantial drops in accuracy. We propose HASTE (Hashing for Tractable Efficiency), a data-free, plug-and-play convolution module that instantly reduces a network's test-time inference cost without training or fine-tuning. We demonstrate our approach on the popular vision benchmarks CIFAR-10 and ImageNet, where we achieve a 46.72% reduction in FLOPs with only a 1.25% loss in accuracy.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy
Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors.
The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z) - Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern
Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached high accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally and energy expensive.
We propose a new benchmark for computing tactile pattern recognition at the edge through letter reading.
We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then deployed them on the Intel Loihi neuromorphic chip for efficient inference.
Our results show that the LSTM outperforms the recurrent SNN in accuracy by 14%; however, the recurrent SNN on Loihi is 237 times more energy-efficient.
arXiv Detail & Related papers (2022-05-30T14:30:45Z) - Vau da muntanialas: Energy-efficient multi-die scalable acceleration of
RNN inference [18.50014427283814]
We present Muntaniala, an RNN accelerator architecture for LSTM inference with a silicon-measured energy-efficiency of 3.25$TOP/s/W$.
The scalable design of Muntaniala allows running large RNN models by combining multiple tiles in a systolic array.
We show a phoneme error rate (PER) drop of approximately 3% with respect to floating-point (FP) on a 3L-384NH-123NI LSTM network.
arXiv Detail & Related papers (2022-02-14T09:21:16Z) - Two-Timescale End-to-End Learning for Channel Acquisition and Hybrid
Precoding [94.40747235081466]
We propose an end-to-end deep learning-based joint transceiver design algorithm for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems.
We develop a DNN architecture that maps the received pilots into feedback bits at the receiver, and then further maps the feedback bits into the hybrid precoder at the transmitter.
arXiv Detail & Related papers (2021-10-22T20:49:02Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech
Recognition [51.1437598405873]
RTMobile is the first work that can achieve real-time RNN inference on mobile platforms.
Compared with prior work on FPGA, RTMobile using the Adreno 640 embedded GPU on GRU can improve energy-efficiency by about 40$\times$ while maintaining the same inference time.
arXiv Detail & Related papers (2020-02-19T00:07:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.