Related papers: TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded Gesture Recognition with Short Range Radars

URL: http://arxiv.org/abs/2006.16281v3
Date: Tue, 16 Mar 2021 15:33:18 GMT
Title: TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded Gesture Recognition with Short Range Radars
Authors: Moritz Scherer, Michele Magno, Jonas Erb, Philipp Mayer, Manuel Eggimann, Luca Benini
Abstract summary: This work proposes a low-power high-accuracy embedded hand-gesture recognition algorithm targeting battery-operated wearable devices. A 2D Convolutional Neural Network (CNN) using range frequency Doppler features is combined with a Temporal Convolutional Neural Network (TCN) for time sequence prediction.
Score: 13.266626571886354
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This work proposes a low-power, high-accuracy embedded hand-gesture recognition algorithm based on low-power short-range radar sensors, targeting battery-operated wearable devices. A 2D Convolutional Neural Network (CNN) using range-frequency Doppler features is combined with a Temporal Convolutional Neural Network (TCN) for time-sequence prediction. The final algorithm has a model size of only 46 thousand parameters, yielding a memory footprint of only 92 KB. Two datasets containing 11 challenging hand gestures performed by 26 different people have been recorded, totaling 20,210 gesture instances. On the 11-gesture dataset, accuracies of 86.6% (26 users) and 92.4% (single user) have been achieved. These results are comparable to the state of the art, which achieves 87% (10 users) and 94% (single user), while the TCN-based network used here is 7500x smaller than the state-of-the-art model. Furthermore, the gesture recognition classifier has been implemented on a Parallel Ultra-Low Power Processor. This demonstrates that real-time prediction is feasible with only 21 mW of power consumption for the full TCN sequence prediction network, with a system-level power consumption of less than 100 mW. We provide open-source access to all the code and data collected and used in this work at tinyradar.ethz.ch.
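The abstract pairs a 2D CNN with a TCN for time-sequence prediction. The Python below is a minimal sketch of the generic TCN building block, not TinyRadarNN's actual layers. It implements a single-channel causal dilated 1D convolution and the receptive-field formula for a stack of such layers. The kernel size and dilations in the example are illustrative assumptions. The sketch also checks the footprint arithmetic: 46,000 parameters at 2 bytes each gives 92,000 bytes (92 KB). That is consistent with 16-bit weights, but the storage precision is an inference and is not stated in the abstract.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Single-channel causal dilated 1D convolution (the core TCN operation).

    kernel[0] weights the current sample x[t], kernel[1] weights x[t - dilation],
    and so on. Samples before t = 0 count as zero (implicit left padding), so
    y[t] never depends on future inputs.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples (including the current one) seen by a stack of
    causal convolutions, one layer per entry in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def footprint_bytes(num_params: int, bytes_per_param: int) -> int:
    """Weight memory only; activations and code size are ignored."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    print(causal_dilated_conv1d([1.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.5], dilation=2))
    # → [1.0, 0.0, 0.5, 0.0, 0.0]
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))  # → 16
    print(footprint_bytes(46_000, 2))  # → 92000 bytes, i.e. 92 KB if weights are 16-bit
```

Doubling the dilation at each layer grows the receptive field exponentially while the parameter count grows only linearly. This is the usual reason TCNs stay small for long gesture sequences.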

Related papers

TENNs-PLEIADES: Building Temporal Kernels with Orthogonal Polynomials [1.1970409518725493]
We focus on interfacing these networks with event-based data to perform online classification and detection with low latency. We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins with significantly smaller memory and compute costs.
arXiv (2024-05-20T17:06:24Z)
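The title refers to building temporal kernels from orthogonal polynomials. The sketch below illustrates the general idea using Legendre polynomials, which are one orthogonal family; the paper's exact basis and parameterization may differ. A kernel of any length is sampled from a short coefficient vector, so the number of trainable values depends on the polynomial degree rather than on the kernel length. The helper names are hypothetical.

```python
from typing import List, Sequence


def legendre(n_max: int, x: float) -> List[float]:
    """[P_0(x), ..., P_{n_max}(x)] via Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    values = [1.0, x]
    for n in range(1, n_max):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def polynomial_kernel(coeffs: Sequence[float], length: int) -> List[float]:
    """Sample k(x) = sum_n coeffs[n] * P_n(x) at `length` evenly spaced points
    on [-1, 1]. Only len(coeffs) values would be trained, whatever `length` is."""
    if length < 2 or not coeffs:
        raise ValueError("need length >= 2 and at least one coefficient")
    n_max = len(coeffs) - 1
    xs = [-1.0 + 2.0 * i / (length - 1) for i in range(length)]
    return [sum(c * p for c, p in zip(coeffs, legendre(n_max, x))) for x in xs]


if __name__ == "__main__":
    print(legendre(2, 0.5))  # → [1.0, 0.5, -0.125]
    print(polynomial_kernel([0.0, 1.0], length=3))  # → [-1.0, 0.0, 1.0]
```

The sampled kernel would then be used like any other temporal convolution kernel. Changing `length` changes the time resolution without adding parameters.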
Gesture Recognition for FMCW Radar on the Edge [0.0]
We show that gestures can be characterized efficiently by a set of five features. A recurrent neural network (RNN) based architecture exploits these features to jointly detect and classify five different gestures. The proposed system recognizes gestures with an F1 score of 98.4% on our hold-out test dataset.
arXiv (2023-10-13T06:03:07Z)
Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses monocular relative localization of two peer nano-drones through deep neural networks (DNNs). To cope with the ultra-constrained nano-drone platform, we propose a vertically integrated framework, including dataset augmentation, quantization, and system optimizations. Experimental results show that our DNN can precisely localize a 10 cm target nano-drone at distances of up to 2 m using only low-resolution monochrome images.
arXiv (2023-03-03T14:14:08Z)
Hand gesture recognition using 802.11ad mmWave sensor in the mobile device [2.5476515662939563]
We explore the feasibility of AI-assisted hand-gesture recognition using 802.11ad 60 GHz (mmWave) technology in smartphones. We built a prototype system in which radar sensing and communication waveforms coexist through time-division duplexing (TDD). It can gather sensing data and predict gestures within 100 milliseconds.
arXiv (2022-11-14T03:36:17Z)
Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x. We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv (2022-06-15T17:58:34Z)
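The dictionary idea can be illustrated with a simple vector-quantization sketch. Each feature-grid entry is stored as an index into a small codebook instead of as a full float vector. This is a simplification. Here the codebook is fixed and entries are assigned to their nearest codeword, whereas the summary describes learning the representation end-to-end as a vector-quantized auto-decoder. The grid size, feature width, and codebook size in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace every feature vector with the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k])) for f in features]


def decode(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Look the stored indices back up in the codebook."""
    return [codebook[i] for i in indices]


def storage_bits(num_entries: int, dim: int, codebook_size: int, float_bits: int = 32) -> Tuple[int, int]:
    """(dense bits, compressed bits) for num_entries feature vectors of length dim."""
    dense = num_entries * dim * float_bits
    index_bits = max(1, math.ceil(math.log2(codebook_size)))
    compressed = num_entries * index_bits + codebook_size * dim * float_bits
    return dense, compressed


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    print(quantize([[0.1, 0.2], [0.9, 1.1], [0.0, -0.1]], codebook))  # → [0, 1, 0]
    dense, compressed = storage_bits(65_536, 16, 256)
    print(dense, compressed, dense / compressed)  # → 33554432 655360 51.2
```

The compression comes from replacing a 16-float vector (512 bits) with an 8-bit index. The shared codebook is paid for only once.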
Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached high accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally and energetically very expensive. We propose a new benchmark for tactile pattern recognition at the edge through Braille letter reading. We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then deployed them on the Intel Loihi neuromorphic chip for efficient inference. Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy
arXiv (2022-05-30T14:30:45Z)
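For readers unfamiliar with spiking networks, the sketch below shows a leaky integrate-and-fire (LIF) neuron and a fast-sigmoid surrogate derivative. LIF dynamics and this surrogate are common choices assumed here for illustration. The summary does not state which neuron model or surrogate function the authors used, and the parameter values are hypothetical.

```python
from typing import List, Sequence


def lif_run(inputs: Sequence[float], beta: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    v[t] = beta * v[t-1] + I[t]; emit a spike when v >= threshold, then reset
    by subtracting the threshold.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = beta * v + current
        if v >= threshold:
            spikes.append(1)
            v -= threshold
        else:
            spikes.append(0)
    return spikes


def fast_sigmoid_grad(v: float, threshold: float = 1.0, slope: float = 10.0) -> float:
    """Surrogate for d(spike)/dv. The spike step function has zero gradient almost
    everywhere, so backprop through time substitutes this smooth bump instead."""
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2


if __name__ == "__main__":
    print(lif_run([0.6] * 6))  # → [0, 1, 0, 1, 0, 1]
    print(fast_sigmoid_grad(1.0))  # → 1.0 (largest at the threshold)
```

The forward pass keeps the hard spike. Only the backward pass uses the surrogate, which is what lets such networks be trained offline before being deployed on neuromorphic hardware.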
Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network [60.99112031408449]
We propose a real-time, single-channel, attention-guided Convolutional Neural Network (CNN) to estimate the number of active speakers in overlapping speech. The proposed system extracts higher-level information from the speech spectral content using a CNN model. Experiments on simulated overlapping speech using the WSJ corpus show that the attention solution improves performance by almost 3% absolute over conventional temporal average pooling.
arXiv (2021-10-30T19:24:57Z)
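The comparison against temporal average pooling can be made concrete with a small sketch. Average pooling weights every frame equally. Attention pooling learns a score per frame and takes a softmax-weighted sum. This is a generic attention-pooling formulation with a hypothetical scoring vector `w`, not necessarily the exact attention module used in the paper.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def average_pool(frames: Sequence[Frame]) -> List[float]:
    """Temporal average pooling: every frame contributes equally."""
    n = len(frames)
    return [sum(f[c] for f in frames) / n for c in range(len(frames[0]))]


def attention_pool(frames: Sequence[Frame], w: Sequence[float]) -> List[float]:
    """Attention pooling: score each frame with a (hypothetical) learned vector w,
    softmax the scores over time, and return the weighted sum of frames."""
    scores = [sum(a * b for a, b in zip(f, w)) for f in frames]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(a * f[c] for a, f in zip(alphas, frames)) for c in range(len(frames[0]))]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]
    print(average_pool(frames))  # → [0.5, 0.5]
    print(attention_pool(frames, [0.0, 0.0]))  # zero scores reduce to averaging: [0.5, 0.5]
    print(attention_pool(frames, [5.0, 0.0]))  # first frame dominates
```

With all-zero scores, attention pooling degenerates to average pooling. The learned scores therefore strictly generalize the baseline.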
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework that decouples the architectural search from the target hardware. The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models and statistical models with those of the roofline model and a refined roofline model.
arXiv (2021-05-07T11:39:05Z)
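The roofline model that ANNETTE is compared against bounds attainable throughput by min(peak compute, memory bandwidth × arithmetic intensity). The sketch below applies that classic model to a single layer. The peak-compute and bandwidth figures are hypothetical, and ANNETTE's own stacked models are not reproduced here.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Operations performed per byte transferred to or from memory."""
    return flops / bytes_moved


def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Roofline: a layer is capped by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, bandwidth * intensity)


def roofline_time(flops: float, bytes_moved: float, peak_flops: float, bandwidth: float) -> float:
    """Estimated execution time in seconds: work divided by attainable throughput,
    equivalently max(flops / peak_flops, bytes_moved / bandwidth)."""
    intensity = arithmetic_intensity(flops, bytes_moved)
    return flops / attainable_flops(peak_flops, bandwidth, intensity)


if __name__ == "__main__":
    peak, bw = 100e9, 10e9  # hypothetical: 100 GFLOP/s compute, 10 GB/s memory bandwidth
    print(roofline_time(1e9, 1e9, peak, bw))  # → 0.1 (memory-bound layer)
    print(roofline_time(1e9, 1e6, peak, bw))  # → 0.01 (compute-bound layer)
```

The roofline ignores effects such as cache behavior, kernel launch overhead, and imperfect data-layout mapping. This motivates refined and statistical models such as those compared in the summary.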
Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications. Existing approaches based on Deep Neural Networks are very effective but highly demanding in terms of memory, power, and throughput. In this paper, we explore the combination of extreme quantization into a small-footprint binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
arXiv (2021-01-12T12:38:23Z)
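In a binary neural network, weights and activations are constrained to ±1, so a dot product reduces to an XNOR followed by a population count. This is what makes BNNs cheap on microcontrollers. The bit-packing sketch below demonstrates the identity dot = 2·popcount(XNOR(a, b)) − n. It is the generic BNN trick, not the paper's GAP8 kernel, and the function names are hypothetical.

```python
from typing import Sequence


def pack_signs(values: Sequence[int]) -> int:
    """Pack a ±1 vector into an integer bit mask: bit i is set iff values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary weights/activations must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed ±1 vectors of length n.

    Matching bits contribute +1 and differing bits -1, so
    dot = matches - (n - matches) = 2 * popcount(XNOR(a, b)) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask  # keep only the n valid bit positions
    matches = bin(xnor).count("1")
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # → 0
    print(sum(x * y for x, y in zip(a, b)))  # → 0, same result with ordinary arithmetic
```

On real hardware, each machine word packs 32 weights, so one XNOR and one popcount replace 32 multiply-accumulates.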
Resource-efficient DNNs for Keyword Spotting using Neural Architecture Search and Quantization [23.850887499271842]
This paper introduces neural architecture search (NAS) for the automatic discovery of small models for keyword spotting. We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) to maximize the classification accuracy. Using NAS only, we were able to obtain a highly efficient model with 95.4% accuracy on the Google speech commands dataset.
arXiv (2020-12-18T09:53:55Z)
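Differentiable NAS typically relaxes the discrete choice between candidate operations into a softmax-weighted mixture. The architecture weights can then be trained by gradient descent, and the strongest operation is kept afterwards. The sketch below shows that continuous relaxation in the style of DARTS. Whether the paper uses exactly this formulation is an assumption, and the toy scalar operations stand in for real convolution choices.

```python
import math
from typing import Callable, List, Sequence

Op = Callable[[float], float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: float, ops: Sequence[Op], alphas: Sequence[float]) -> float:
    """Continuous relaxation: a softmax(alphas)-weighted sum of all candidate ops.
    Because this is differentiable in alphas, they can be trained by gradient descent."""
    return sum(w * op(x) for w, op in zip(softmax(alphas), ops))


def derive(op_names: Sequence[str], alphas: Sequence[float]) -> str:
    """After search, keep only the candidate with the largest architecture weight."""
    return op_names[max(range(len(alphas)), key=lambda i: alphas[i])]


if __name__ == "__main__":
    # Toy scalar stand-ins for candidate layers (e.g. different conv kernel sizes).
    ops = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.0]
    names = ["identity", "double", "zero"]
    print(mixed_op(3.0, ops, [0.0, 0.0, 0.0]))  # equal weights: about (3 + 6 + 0) / 3
    print(derive(names, [0.1, 2.0, -1.0]))  # → double
```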
End-to-End Multi-speaker Speech Recognition with Transformer [88.22355110349933]
We replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture. We also modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation.
arXiv (2020-02-10T16:29:26Z)
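Restricting self-attention to a segment can be expressed as a mask that lets each position attend only within its own segment. This cuts the number of scored pairs from L² to roughly L·S for sequence length L and segment length S. The sketch below assumes fixed, non-overlapping segments and single-head scaled dot-product attention; the summary does not say whether segments overlap or slide. It is illustrative, not the paper's implementation.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def segment_mask(seq_len: int, segment_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. when
    both fall in the same fixed-length, non-overlapping segment."""
    return [[i // segment_len == j // segment_len for j in range(seq_len)] for i in range(seq_len)]


def masked_attention(q: Matrix, k: Matrix, v: Matrix, mask: List[List[bool]]) -> List[List[float]]:
    """Single-head scaled dot-product attention with disallowed pairs skipped.

    Masked pairs get a score of -inf, so their softmax weight is exactly 0.
    Each position always sees itself, so every row has a finite maximum.
    A production kernel would compute only the in-segment block instead of
    scanning every j.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    mask = segment_mask(4, 2)
    qk = [[1.0]] * 4
    v = [[1.0], [1.0], [5.0], [5.0]]
    print(masked_attention(qk, qk, v, mask))  # → [[1.0], [1.0], [5.0], [5.0]]
```

Without the mask, every output here would mix in values from the other segment. With it, positions 0-1 and 2-3 stay independent.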

This list is automatically generated from the titles and abstracts of the papers on this site.