Low-latency hand gesture recognition with a low resolution thermal
imager
- URL: http://arxiv.org/abs/2004.11623v1
- Date: Fri, 24 Apr 2020 09:43:48 GMT
- Title: Low-latency hand gesture recognition with a low resolution thermal
imager
- Authors: Maarten Vandersteegen, Wouter Reusen, Kristof Van Beeck, Toon Goedeme
- Abstract summary: We propose an algorithm that predicts hand gestures using a cheap low-resolution thermal camera with only 32x24 pixels.
Our best model achieves 95.9% classification accuracy and 83% mAP detection accuracy while its processing pipeline has a latency of only one frame.
- Score: 4.063682271487617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using hand gestures to answer a call or to control the radio while driving a
car is nowadays an established feature in more expensive cars. High-resolution
time-of-flight cameras and powerful embedded processors usually form the heart
of these gesture recognition systems. This, however, comes with a price tag. We
therefore investigate the possibility to design an algorithm that predicts hand
gestures using a cheap low-resolution thermal camera with only 32x24 pixels,
which is light-weight enough to run on a low-cost processor. We recorded a new
dataset of over 1300 video clips for training and evaluation and propose a
light-weight low-latency prediction algorithm. Our best model achieves 95.9%
classification accuracy and 83% mAP detection accuracy while its processing
pipeline has a latency of only one frame.
Related papers
- Learning to Make Keypoints Sub-Pixel Accurate [80.55676599677824]
This work addresses the challenge of sub-pixel accuracy in detecting 2D local features.
We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for detected features.
arXiv Detail & Related papers (2024-07-16T12:39:56Z)
- Resource-Efficient Gesture Recognition using Low-Resolution Thermal Camera via Spiking Neural Networks and Sparse Segmentation [1.7758299835471887]
This work proposes a novel approach for hand gesture recognition using an inexpensive, low-resolution (24 x 32) thermal sensor.
Compared to the use of standard RGB cameras, the proposed system is insensitive to lighting variations.
This paper shows that the innovative use of the recently proposed Monostable Multivibrator (MMV) neural networks as a new class of SNN achieves more than one order of magnitude smaller memory and compute complexity.
arXiv Detail & Related papers (2024-01-12T13:20:01Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adopt the VTN for the sparse and fine-grained nature of event data, we design Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Agile gesture recognition for capacitive sensing devices: adapting on-the-job [55.40855017016652]
We demonstrate a hand gesture recognition system that uses signals from capacitive sensors embedded into the etee hand controller.
The controller generates real-time signals from each of the wearer's five fingers.
We use a machine learning technique to analyse the time series signals and identify three features that can represent 5 fingers within 500 ms.
arXiv Detail & Related papers (2023-05-12T17:24:02Z)
- PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition [52.78234467516168]
We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames.
We present an adaptive frame selection strategy using a shifted leaky ReLU and a cumulative distribution function.
Our method achieves a relative improvement of 2.2 - 13.8% in top-1 accuracy on UAV-Human, 6.8% on NEC Drone, and 9.0% on Diving48 datasets.
arXiv Detail & Related papers (2023-04-14T00:01:11Z)
- Person Detection Using an Ultra Low-resolution Thermal Imager on a Low-cost MCU [5.684789481921423]
We propose a novel ultra-lightweight CNN-based person detector that processes thermal video from a low-cost 32x24 pixel static imager.
Our model achieves up to 91.62% accuracy (F1-score), has less than 10k parameters, and runs as fast as 87ms and 46ms on low-cost microcontrollers.
arXiv Detail & Related papers (2022-12-16T11:27:50Z)
- Hand gesture recognition using 802.11ad mmWave sensor in the mobile device [2.5476515662939563]
We explore the feasibility of AI assisted hand-gesture recognition using 802.11ad 60GHz (mmWave) technology in smartphones.
We built a prototype system, where radar sensing and communication waveform can coexist by time-division duplex (TDD).
It can gather sensing data and predict gestures within 100 milliseconds.
arXiv Detail & Related papers (2022-11-14T03:36:17Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- Fast and Accurate Camera Scene Detection on Smartphones [51.424407411660376]
This paper proposes a novel Camera Scene Detection dataset (CamSDD) containing more than 11K manually crawled images.
We propose an efficient and NPU-friendly CNN model for this task that demonstrates a top-3 accuracy of 99.5% on this dataset.
arXiv Detail & Related papers (2021-05-17T14:06:21Z)
- FrameExit: Conditional Early Exiting for Efficient Video Recognition [11.92976432364216]
We propose a conditional early exiting framework for efficient video recognition.
Our model learns to process fewer frames for simpler videos and more frames for complex ones.
Our method sets a new state of the art for efficient video understanding on the HVU benchmark.
arXiv Detail & Related papers (2021-04-27T18:01:05Z)
- KutralNet: A Portable Deep Learning Model for Fire Recognition [4.886882441164088]
We propose a new deep learning architecture that requires fewer floating-point operations (flops) for fire recognition.
We also propose a portable approach for fire recognition and the use of modern techniques to reduce the model's computational cost.
One of our models presents 71% fewer parameters than FireNet, while still presenting competitive accuracy and AUROC performance.
arXiv Detail & Related papers (2020-08-16T09:35:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.