DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework
- URL: http://arxiv.org/abs/2003.00951v2
- Date: Tue, 19 Oct 2021 13:38:48 GMT
- Title: DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework
- Authors: Okan Köpüklü, Thomas Ledwon, Yao Rong, Neslihan Kose, Gerhard Rigoll
- Abstract summary: Real-time recognition of dynamic micro hand gestures from video streams is challenging for in-vehicle scenarios.
We propose a lightweight convolutional neural network (CNN) based architecture which operates online efficiently with a sliding window approach.
Online recognition of gestures has been performed with 3D-MobileNetV2, which provided the best offline accuracy.
- Score: 9.128828609564522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of hand gestures provides a natural alternative to cumbersome
interface devices for Human-Computer Interaction (HCI) systems. However,
real-time recognition of dynamic micro hand gestures from video streams is
challenging for in-vehicle scenarios since (i) the gestures should be performed
naturally without distracting the driver, (ii) micro hand gestures occur within
very short time intervals at spatially constrained areas, (iii) the performed
gesture should be recognized only once, and (iv) the entire architecture should
be lightweight, as it will be deployed on an embedded system. In this
work, we propose an HCI system for dynamic recognition of driver micro hand
gestures, which can have a crucial impact in the automotive sector, especially
for safety-related issues. For this purpose, we initially collected a dataset named
Driver Micro Hand Gestures (DriverMHG), which consists of RGB, depth and
infrared modalities. The challenges for dynamic recognition of micro hand
gestures have been addressed by proposing a lightweight convolutional neural
network (CNN) based architecture which operates online efficiently with a
sliding window approach. For the CNN model, several 3-dimensional resource
efficient networks are applied and their performances are analyzed. Online
recognition of gestures has been performed with 3D-MobileNetV2, which provided
the best offline accuracy among the applied networks with similar computational
complexities. The final architecture is deployed on a driver simulator
operating in real time. We make the DriverMHG dataset and our source code
publicly available.
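The online pipeline described in the abstract (a sliding window over the video stream, with the requirement that each gesture be recognized only once) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' released code: `online_recognize`, `toy_classify`, the window/stride/threshold defaults, and the rule of re-arming only after the classifier returns to a "no gesture" class are all hypothetical. In the paper's setting, `classify` would be a 3D CNN such as 3D-MobileNetV2 applied to each clip; here a toy scorer stands in for it.

```python
from collections import deque
from typing import Any, Callable, Deque, Iterable, List, Sequence, Tuple


def online_recognize(
    frames: Iterable[Any],
    classify: Callable[[List[Any]], Sequence[float]],
    window: int = 16,        # hypothetical clip length in frames
    stride: int = 1,         # evaluate the classifier every `stride` frames
    threshold: float = 0.7,  # hypothetical confidence needed to fire
    no_gesture: int = 0,     # index of the assumed "no gesture" class
) -> List[Tuple[int, int]]:
    """Slide a fixed-length window over `frames` and return one
    (frame_index, class_id) pair per detected gesture (hypothetical sketch)."""
    buf: Deque[Any] = deque(maxlen=window)  # keeps only the last `window` frames
    detections: List[Tuple[int, int]] = []
    armed = True  # True while the next gesture is allowed to fire
    for i, frame in enumerate(frames):
        buf.append(frame)
        # Skip until the window is full, then evaluate every `stride` frames.
        if len(buf) < window or (i + 1 - window) % stride != 0:
            continue
        scores = classify(list(buf))
        best = max(range(len(scores)), key=lambda c: scores[c])
        if best == no_gesture:
            armed = True  # gesture ended: re-arm for the next one
        elif armed and scores[best] >= threshold:
            detections.append((i, best))
            armed = False  # suppress repeats while the same gesture continues
    return detections


def toy_classify(win: List[int]) -> List[float]:
    # Stand-in for a 3D CNN: each "frame" is already a label, and the score
    # of class c is the fraction of frames in the window labeled c.
    return [win.count(c) / len(win) for c in range(3)]


stream = [0] * 6 + [1] * 6 + [0] * 6 + [2] * 6
print(online_recognize(stream, toy_classify, window=4, threshold=0.75))
# → [(8, 1), (20, 2)]
```

Each gesture in the toy stream fires exactly once, even though the classifier sees it in several consecutive windows. Re-arming only on the no-gesture class is one simple way to satisfy requirement (iii); a consequence is that two different gestures performed back to back without a pause would register only the first, so a deployed system might instead smooth scores over several windows before thresholding.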
Related papers
- N-DriverMotion: Driver motion learning and prediction using an event-based camera and directly trained spiking neural networks [2.3941497253612085]
This paper presents a novel system for learning and predicting driver motions and an event-based high-resolution (1280x720) dataset.
The proposed neuromorphic vision system achieves comparable accuracy, 94.04%, in recognizing driver motions with the CSNN architecture.
(2024-08-23T21:25:16Z)
- Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN [0.0]
Hand Gesture Recognition (HGR) enables intuitive human-computer interactions in various real-world contexts.
Existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications.
This study introduces a robust, skeleton-based framework for dynamic HGR that simplifies the recognition of dynamic hand gestures into a static image task.
(2024-06-21T09:30:59Z)
- G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving [71.9040410238973]
We focus on inferring the ego trajectory of a driver's vehicle using their gaze data.
Next, we develop G-MEMP, a novel multimodal ego-trajectory prediction network that combines GPS and video input with gaze data.
The results show that G-MEMP significantly outperforms state-of-the-art methods in both benchmarks.
(2023-12-13T23:06:30Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
(2023-08-25T23:51:07Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
(2023-07-17T21:22:17Z)
- A Wireless-Vision Dataset for Privacy Preserving Human Activity Recognition [53.41825941088989]
A new WiFi-based and video-based neural network (WiNN) is proposed to improve the robustness of activity recognition.
Our results show that the WiVi dataset satisfies the primary demand and that all three branches in the proposed pipeline keep more than 80% activity recognition accuracy.
(2022-05-24T10:49:11Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
(2021-04-23T17:59:28Z)
- Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device [53.323878851563414]
We propose a compiler-aware unified framework incorporating network enhancement and pruning search with reinforcement learning techniques.
Specifically, a generator Recurrent Neural Network (RNN) is employed to provide the unified scheme for both network enhancement and pruning search automatically.
The proposed framework achieves real-time 3D object detection on mobile devices with competitive detection performance.
(2020-12-26T19:41:15Z)
- DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis [54.198237164152786]
Vision is the richest and most cost-effective technology for Driver Monitoring Systems (DMS).
The lack of sufficiently large and comprehensive datasets is currently a bottleneck for the progress of DMS development.
In this paper, we introduce the Driver Monitoring dataset (DMD), an extensive dataset which includes real and simulated driving scenarios.
(2020-08-27T12:33:54Z)
- LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices [8.509059894058947]
We propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power.
We show that the proposed method achieves high accuracy and robustness, reaching high-end performance in a variety of complicated interaction environments.
(2020-01-16T05:23:24Z)