StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs
- URL: http://arxiv.org/abs/2510.24738v1
- Date: Tue, 14 Oct 2025 20:28:31 GMT
- Title: StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs
- Authors: Tianheng Ling, Chao Qian, Peter Zdankin, Torben Weis, Gregor Schiele,
- Abstract summary: Proper gait patterns can lead to injuries, particularly without expert feedback.<n>Wrist-worn wearables offer a more practical and non-intrusive alternative.<n>This paper introduces StrikeWatch, a compact wrist-worn system that performs entirely on-device, real-time gait recognition.
- Score: 10.946464973530214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Running offers substantial health benefits, but improper gait patterns can lead to injuries, particularly without expert feedback. While prior gait analysis systems based on cameras, insoles, or body-mounted sensors have demonstrated effectiveness, they are often bulky and limited to offline, post-run analysis. Wrist-worn wearables offer a more practical and non-intrusive alternative, yet enabling real-time gait recognition on such devices remains challenging due to noisy Inertial Measurement Unit (IMU) signals, limited computing resources, and dependence on cloud connectivity. This paper introduces StrikeWatch, a compact wrist-worn system that performs entirely on-device, real-time gait recognition using IMU signals. As a case study, we target the detection of heel versus forefoot strikes to enable runners to self-correct harmful gait patterns through visual and auditory feedback during running. We propose four compact DL architectures (1D-CNN, 1D-SepCNN, LSTM, and Transformer) and optimize them for energy-efficient inference on two representative embedded Field-Programmable Gate Arrays (FPGAs): the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. Using our custom-built hardware prototype, we collect a labeled dataset from outdoor running sessions and evaluate all models via a fully automated deployment pipeline. Our results reveal clear trade-offs between model complexity and hardware efficiency. Evaluated across 12 participants, 6-bit quantized 1D-SepCNN achieves the highest average F1 score of 0.847 while consuming just 0.350 {\mu}J per inference with a latency of 0.140 ms on the iCE40UP5K running at 20 MHz. This configuration supports up to 13.6 days of continuous inference on a 320 mAh battery. All datasets and code are available in the GitHub repository https://github.com/tianheng-ling/StrikeWatch.
Related papers
- Time-Series at the Edge: Tiny Separable CNNs for Wearable Gait Detection and Optimal Sensor Placement [3.7765281299298015]
We study on-device time-series analysis for gait detection in Parkinson's disease (PD) from short windows of triaxial acceleration, targeting resource-latency wearables and edge nodes.<n>We compare magnitude thresholding to three 1D CNNs for time-series analysis: a literature baseline (separable convolutions) and two ultra-light models - one purely separable and one with residual connections.
arXiv Detail & Related papers (2025-11-29T08:52:41Z) - Enabling Vibration-Based Gesture Recognition on Everyday Furniture via Energy-Efficient FPGA Implementation of 1D Convolutional Networks [11.481972015296812]
This study proposes an energy-efficient solution deploying compact NNs on low-power Field-Programmable Gate Arrays (FPGAs)<n>We replace complex spectral preprocessing with raw waveform input, eliminating complex on-board preprocessing while reducing input size by 21x without sacrificing accuracy.<n>We design two lightweight architectures (1D-CNN and 1D-SepCNN) tailored for embedded FPGAs, reducing parameters from 369 million to as few as 216 while maintaining comparable accuracy.
arXiv Detail & Related papers (2025-10-27T09:30:36Z) - Neural-HAR: A Dimension-Gated CNN Accelerator for Real-Time Radar Human Activity Recognition [5.400353553418959]
We introduce a dimension-gated CNN accelerator tailored for real-time radar HAR on resource-constrained platforms.<n>GateCNN attains 86.4% accuracy with only 2.7k parameters and 0.28M FLOPs per inference, comparable to CNN-BiGRU at a fraction of the complexity.<n>Our FPGA prototype on Xilinx Zynq-7000 Z-7007S reaches 107.5 $mu$s latency and 15 mW dynamic power using LUT-based ROM and distributed RAM only.
arXiv Detail & Related papers (2025-10-26T17:42:28Z) - Low-cost Embedded Breathing Rate Determination Using 802.15.4z IR-UWB Hardware for Remote Healthcare [2.6066253940276347]
We propose a convolutional neural network (CNN) specifically adapted to predict breathing rates from ultra-wideband (UWB) channel impulse response (CIR) data.<n>We show it is feasible to deploy the algorithm on an nRF52840 system-on-chip requiring only 46 KB of memory and operating with an inference time of only 192 ms.
arXiv Detail & Related papers (2025-04-03T07:54:25Z) - Realtime Person Identification via Gait Analysis [1.3260363717086592]
We propose a small CNN model with 4 layers that is very amenable for edge AI deployment and realtime gait recognition.
Our model achieves 96.7% accuracy and consumes only 5KB RAM with an inferencing time of 70 ms and 125mW power.
arXiv Detail & Related papers (2024-04-02T18:15:06Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.<n>DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.<n>Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Ultra-low Power Deep Learning-based Monocular Relative Localization
Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses the monocular relative localization, through deep neural networks (DNNs), of two peer nano-drones.
To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations.
Experimental results show that our DNN can precisely localize a 10cm-size target nano-drone by employing only low-resolution monochrome images, up to 2m distance.
arXiv Detail & Related papers (2023-03-03T14:14:08Z) - FastPillars: A Deployment-friendly Pillar-based 3D Detector [63.0697065653061]
Existing BEV-based (i.e., Bird Eye View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference.
FastPillars delivers state-of-the-art accuracy on Open dataset with 1.8X speed up and 3.8 mAPH/L2 improvement over CenterPoint (SPConv-based)
arXiv Detail & Related papers (2023-02-05T12:13:27Z) - Q-PPG: Energy-Efficient PPG-based Heart Rate Monitoring on Wearable
Devices [22.7371904884504]
We propose a design methodology to automatically generate a rich family of deep Temporal Convolutional Networks (TCNs) for HR monitoring.
Our most accurate model sets a new state-of-the-art in Mean Absolute Error.
We deploy our TCNs on an embedded platform featuring a STM32WB55 microcontroller, demonstrating their suitability for real-time execution.
arXiv Detail & Related papers (2022-03-24T10:50:33Z) - Accelerating Training and Inference of Graph Neural Networks with Fast
Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Simultaneous Detection and Tracking with Motion Modelling for Multiple
Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association.
DMM-Net achieves PR-MOTA score of 12.80 @ 120+ fps for the popular UA-DETRAC challenge, which is better performance and orders of magnitude faster.
We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z) - REST: Robust and Efficient Neural Networks for Sleep Monitoring in the
Wild [62.36144064259933]
We propose REST, a new method that simultaneously tackles both issues via adversarial training and controlling the Lipschitz constant of the neural network.
We demonstrate that REST produces highly-robust and efficient models that substantially outperform the original full-sized models in the presence of noise.
By deploying these models to an Android application on a smartphone, we quantitatively observe that REST allows models to achieve up to 17x energy reduction and 9x faster inference.
arXiv Detail & Related papers (2020-01-29T17:23:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.