Related papers: StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs

URL: http://arxiv.org/abs/2510.24738v1
Date: Tue, 14 Oct 2025 20:28:31 GMT
Title: StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs
Authors: Tianheng Ling, Chao Qian, Peter Zdankin, Torben Weis, Gregor Schiele
Abstract summary: Improper gait patterns can lead to injuries, particularly without expert feedback. Wrist-worn wearables offer a more practical and non-intrusive alternative. This paper introduces StrikeWatch, a compact wrist-worn system that performs entirely on-device, real-time gait recognition.
Score: 10.946464973530214
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Running offers substantial health benefits, but improper gait patterns can lead to injuries, particularly without expert feedback. While prior gait analysis systems based on cameras, insoles, or body-mounted sensors have demonstrated effectiveness, they are often bulky and limited to offline, post-run analysis. Wrist-worn wearables offer a more practical and non-intrusive alternative, yet enabling real-time gait recognition on such devices remains challenging due to noisy Inertial Measurement Unit (IMU) signals, limited computing resources, and dependence on cloud connectivity. This paper introduces StrikeWatch, a compact wrist-worn system that performs entirely on-device, real-time gait recognition using IMU signals. As a case study, we target the detection of heel versus forefoot strikes to enable runners to self-correct harmful gait patterns through visual and auditory feedback during running. We propose four compact DL architectures (1D-CNN, 1D-SepCNN, LSTM, and Transformer) and optimize them for energy-efficient inference on two representative embedded Field-Programmable Gate Arrays (FPGAs): the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. Using our custom-built hardware prototype, we collect a labeled dataset from outdoor running sessions and evaluate all models via a fully automated deployment pipeline. Our results reveal clear trade-offs between model complexity and hardware efficiency. Evaluated across 12 participants, 6-bit quantized 1D-SepCNN achieves the highest average F1 score of 0.847 while consuming just 0.350 µJ per inference with a latency of 0.140 ms on the iCE40UP5K running at 20 MHz. This configuration supports up to 13.6 days of continuous inference on a 320 mAh battery. All datasets and code are available in the GitHub repository https://github.com/tianheng-ling/StrikeWatch.
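As a rough sanity check on the figures quoted in the abstract, the short Python sketch below derives two quantities that the abstract does not state directly: the average power drawn during one inference (energy divided by latency) and the average battery current implied by the 13.6-day lifetime. The function names are hypothetical, and the abstract does not say which components (sensor, idle logic, feedback output) are included in the battery-life figure, so the implied current should be read only as a whole-budget average, not as the authors' own calculation.

```python
# Figures quoted in the abstract (6-bit 1D-SepCNN on the iCE40UP5K at 20 MHz).
ENERGY_PER_INFERENCE_J = 0.350e-6   # 0.350 microjoules
LATENCY_S = 0.140e-3                # 0.140 milliseconds
BATTERY_CAPACITY_MAH = 320.0
BATTERY_LIFE_DAYS = 13.6


def average_power_w(energy_j: float, duration_s: float) -> float:
    """Average power over an interval: energy divided by duration."""
    return energy_j / duration_s


def implied_average_current_ma(capacity_mah: float, lifetime_days: float) -> float:
    """Average current that drains the given capacity over the given lifetime."""
    return capacity_mah / (lifetime_days * 24.0)


# Hypothetical helper names; only the four constants come from the abstract.
inference_power_mw = average_power_w(ENERGY_PER_INFERENCE_J, LATENCY_S) * 1e3
implied_current_ma = implied_average_current_ma(BATTERY_CAPACITY_MAH, BATTERY_LIFE_DAYS)

print(f"Power during inference: {inference_power_mw:.2f} mW")
print(f"Implied average battery current: {implied_current_ma:.3f} mA")
```

Under these inputs the power during an inference works out to about 2.5 mW, and the stated battery life corresponds to an average draw of roughly 0.98 mA.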

Related papers

Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement [3.7765281299298015]
We study on-device time-series analysis for gait detection in Parkinson's disease (PD) from short windows of triaxial acceleration, targeting resource-constrained wearables and edge nodes. We compare magnitude thresholding to three 1D CNNs for time-series analysis: a literature baseline (separable convolutions) and two ultra-light models, one purely separable and one with residual connections.
arXiv Detail & Related papers (2025-11-29T08:52:41Z)
Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks [11.481972015296812]
This study proposes an energy-efficient solution deploying compact NNs on low-power Field-Programmable Gate Arrays (FPGAs). We replace complex spectral preprocessing with raw waveform input, eliminating complex on-board preprocessing while reducing input size by 21x without sacrificing accuracy. We design two lightweight architectures (1D-CNN and 1D-SepCNN) tailored for embedded FPGAs, reducing parameters from 369 million to as few as 216 while maintaining comparable accuracy.
arXiv Detail & Related papers (2025-10-27T09:30:36Z)
Neural-HAR: A Dimension-Gated CNN Accelerator for Real-Time Radar Human Activity Recognition [5.400353553418959]
We introduce a dimension-gated CNN accelerator tailored for real-time radar HAR on resource-constrained platforms. GateCNN attains 86.4% accuracy with only 2.7k parameters and 0.28M FLOPs per inference, comparable to CNN-BiGRU at a fraction of the complexity. Our FPGA prototype on Xilinx Zynq-7000 Z-7007S reaches 107.5 µs latency and 15 mW dynamic power using LUT-based ROM and distributed RAM only.
arXiv Detail & Related papers (2025-10-26T17:42:28Z)
Low-cost Embedded Breathing Rate Determination Using 802.15.4z IR-UWB Hardware for Remote Healthcare [2.6066253940276347]
We propose a convolutional neural network (CNN) specifically adapted to predict breathing rates from ultra-wideband (UWB) channel impulse response (CIR) data. We show it is feasible to deploy the algorithm on an nRF52840 system-on-chip requiring only 46 KB of memory and operating with an inference time of only 192 ms.
arXiv Detail & Related papers (2025-04-03T07:54:25Z)
Realtime Person Identification via Gait Analysis [1.3260363717086592]
We propose a small CNN model with 4 layers that is very amenable for edge AI deployment and realtime gait recognition. Our model achieves 96.7% accuracy and consumes only 5KB RAM with an inferencing time of 70 ms and 125mW power.
arXiv Detail & Related papers (2024-04-02T18:15:06Z)
Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking. DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget. Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses the monocular relative localization, through deep neural networks (DNNs), of two peer nano-drones. To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations. Experimental results show that our DNN can precisely localize a 10cm-size target nano-drone by employing only low-resolution monochrome images, up to 2m distance.
arXiv Detail & Related papers (2023-03-03T14:14:08Z)
FastPillars: A Deployment-friendly Pillar-based 3D Detector [63.0697065653061]
Existing BEV-based (i.e., Bird's Eye View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference. FastPillars delivers state-of-the-art accuracy on the Waymo Open Dataset with 1.8X speed up and 3.8 mAPH/L2 improvement over CenterPoint (SPConv-based).
arXiv Detail & Related papers (2023-02-05T12:13:27Z)
Q-PPG: Energy-Efficient PPG-based Heart Rate Monitoring on Wearable Devices [22.7371904884504]
We propose a design methodology to automatically generate a rich family of deep Temporal Convolutional Networks (TCNs) for HR monitoring. Our most accurate model sets a new state-of-the-art in Mean Absolute Error. We deploy our TCNs on an embedded platform featuring a STM32WB55 microcontroller, demonstrating their suitability for real-time execution.
arXiv Detail & Related papers (2022-03-24T10:50:33Z)
Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement. We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment. We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler. We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware. The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association. DMM-Net achieves PR-MOTA score of 12.80 @ 120+ fps for the popular UA-DETRAC challenge, which is better performance and orders of magnitude faster. We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z)
REST: Robust and Efficient Neural Networks for Sleep Monitoring in the Wild [62.36144064259933]
We propose REST, a new method that simultaneously tackles both issues via adversarial training and controlling the Lipschitz constant of the neural network. We demonstrate that REST produces highly-robust and efficient models that substantially outperform the original full-sized models in the presence of noise. By deploying these models to an Android application on a smartphone, we quantitatively observe that REST allows models to achieve up to 17x energy reduction and 9x faster inference.
arXiv Detail & Related papers (2020-01-29T17:23:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.