EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs
- URL: http://arxiv.org/abs/2501.13805v1
- Date: Thu, 23 Jan 2025 16:25:08 GMT
- Title: EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs
- Authors: Yizhe Lv, Tingting Zhang, Yunpeng Song, Han Ding, Jinsong Han, Fei Wang
- Abstract summary: Bottom-facing VR cameras can pose a risk of exposing sensitive information, such as private body parts or personal surroundings. We introduce EgoHand, a system that integrates millimeter-wave radar and IMUs for hand gesture recognition. In experiments, EgoHand can detect hand gestures with 90.8% accuracy.
- Score: 15.644891766887255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advanced Virtual Reality (VR) headsets, such as the Apple Vision Pro, employ bottom-facing cameras to detect hand gestures and inputs, which offers users significant convenience in VR interactions. However, these bottom-facing cameras can sometimes be inconvenient and pose a risk of unintentionally exposing sensitive information, such as private body parts or personal surroundings. To mitigate these issues, we introduce EgoHand. This system provides an alternative solution by integrating millimeter-wave radar and IMUs for hand gesture recognition, thereby offering users an additional option for gesture interaction that enhances privacy protection. To accurately recognize hand gestures, we devise a two-stage skeleton-based gesture recognition scheme. In the first stage, a novel end-to-end Transformer architecture is employed to estimate the coordinates of hand joints. Subsequently, these estimated joint coordinates are utilized for gesture recognition. Extensive experiments involving 10 subjects show that EgoHand can detect hand gestures with 90.8% accuracy. Furthermore, EgoHand demonstrates robust performance across a variety of cross-domain tests, including different users, dominant hands, body postures, and scenes.
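The abstract sketches a two-stage pipeline: a Transformer first regresses hand-joint coordinates from the fused radar and IMU signals, and a second model then classifies gestures from those joint trajectories. Below is a minimal PyTorch sketch of such a pipeline; the layer counts, feature dimensions, 21-joint hand layout, gesture vocabulary, and the GRU-based second stage are all illustrative assumptions, not the authors' architecture.

```python
# Illustrative two-stage scheme: radar/IMU feature sequence -> Transformer
# joint regressor -> skeleton-based gesture classifier. All names and
# dimensions are assumptions; the paper's actual model may differ.
import torch
import torch.nn as nn

NUM_JOINTS = 21    # common hand-skeleton convention (assumed)
NUM_GESTURES = 8   # assumed gesture vocabulary size


class JointRegressor(nn.Module):
    """Stage 1: Transformer that maps fused radar+IMU features per frame
    to 3D hand-joint coordinates."""
    def __init__(self, in_dim=64, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, NUM_JOINTS * 3)

    def forward(self, x):                    # x: (B, T, in_dim)
        h = self.encoder(self.embed(x))      # (B, T, d_model)
        return self.head(h).view(x.size(0), x.size(1), NUM_JOINTS, 3)


class GestureClassifier(nn.Module):
    """Stage 2: classify a gesture from the estimated joint trajectory."""
    def __init__(self, hidden=128):
        super().__init__()
        self.gru = nn.GRU(NUM_JOINTS * 3, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, NUM_GESTURES)

    def forward(self, joints):               # joints: (B, T, J, 3)
        B, T = joints.shape[:2]
        _, h = self.gru(joints.view(B, T, -1))
        return self.fc(h[-1])                # (B, NUM_GESTURES) logits


# Usage with a dummy 2 s window at 30 Hz (shapes are assumptions):
feats = torch.randn(1, 60, 64)
logits = GestureClassifier()(JointRegressor()(feats))
```

One appeal of a skeleton-based second stage is that the classifier sees joint coordinates rather than raw radar returns, which is a plausible contributor to the cross-user and cross-scene robustness the abstract reports.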
Related papers
- DiG-Net: Enhancing Quality of Life through Hyper-Range Dynamic Gesture Recognition in Assistive Robotics [2.625826951636656]
We introduce a novel approach designed specifically for assistive robotics, enabling dynamic gesture recognition at extended distances of up to 30 meters. Our proposed Distance-aware Gesture Network (DiG-Net) effectively combines Depth-Conditioned Deformable Alignment (DADA) blocks with Spatio-Temporal Graph modules. By effectively interpreting gestures from considerable distances, DiG-Net significantly enhances the usability of assistive robots in home healthcare, industrial safety, and remote assistance scenarios.
arXiv Detail & Related papers (2025-05-30T16:47:44Z) - Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors [25.67875816218477]
Full-body pose estimation from sparse tracking signals is not limited by environmental conditions or recording range. Previous works either require additional sensors worn on the pelvis and lower body or rely on external visual sensors to obtain the global positions of key joints. To improve the practicality of the technology for virtual reality applications, we estimate full-body poses using only inertial data obtained from three Inertial Measurement Unit (IMU) sensors worn on the head and wrists.
arXiv Detail & Related papers (2025-05-08T15:28:09Z) - emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation [12.566524562446467]
Reliable and always-available hand pose inference could yield new and intuitive control schemes for human-computer interactions.
Wearable wrist-based surface electromyography (sEMG) presents a promising alternative.
emg2pose is the largest publicly available dataset of high-quality hand pose labels and wrist sEMG recordings.
arXiv Detail & Related papers (2024-12-02T23:39:37Z) - EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision [69.1005706608681]
EgoPressure is a novel egocentric dataset that captures detailed touch contact and pressure interactions. Our dataset comprises 5 hours of recorded interactions from 21 participants captured simultaneously by one head-mounted and seven stationary Kinect cameras.
arXiv Detail & Related papers (2024-09-03T18:53:32Z) - In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires computationally intensive depth estimation networks or wearing an uncomfortable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z) - Egocentric RGB+Depth Action Recognition in Industry-Like Settings [50.38638300332429]
Our work focuses on recognizing actions from egocentric RGB and Depth modalities in an industry-like environment.
Our framework is based on the 3D Video SWIN Transformer to encode both RGB and Depth modalities effectively.
Our method also secured first place at the multimodal action recognition challenge at ICIAP 2023.
arXiv Detail & Related papers (2023-09-25T08:56:22Z) - Agile gesture recognition for capacitive sensing devices: adapting on-the-job [55.40855017016652]
We demonstrate a hand gesture recognition system that uses signals from capacitive sensors embedded into the etee hand controller.
The controller generates real-time signals from each of the wearer's five fingers.
We use a machine learning technique to analyse the time-series signals and identify three features that can represent the five fingers within 500 ms.
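As a rough illustration of extracting a handful of per-finger statistics inside a 500 ms window, here is a NumPy sketch; the sampling rate and the choice of mean, standard deviation, and peak-to-peak are assumptions, not the paper's actual features.

```python
# Hypothetical per-window feature extraction for 5-channel capacitive
# finger signals; all parameters below are assumed for illustration.
import numpy as np

def window_features(signals, fs=100, win_s=0.5):
    """signals: (n_samples, 5) capacitive channels, one per finger.
    Returns (n_windows, 5, 3): mean, std, peak-to-peak per finger."""
    win = int(fs * win_s)                      # 500 ms -> 50 samples here
    n = signals.shape[0] // win
    feats = []
    for i in range(n):
        w = signals[i * win:(i + 1) * win]     # one 500 ms window
        feats.append(np.stack([w.mean(0), w.std(0), np.ptp(w, axis=0)],
                              axis=-1))
    return np.stack(feats)

# Usage with synthetic data: 2 s of 5-finger signals sampled at 100 Hz.
x = np.random.randn(200, 5)
print(window_features(x).shape)   # (4, 5, 3)
```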
arXiv Detail & Related papers (2023-05-12T17:24:02Z) - HOOV: Hand Out-Of-View Tracking for Proprioceptive Interaction using Inertial Sensing [25.34222794274071]
We present HOOV, a wrist-worn sensing method that allows VR users to interact with objects outside their field of view.
Based on the signals of a single wrist-worn inertial sensor, HOOV continuously estimates the user's hand position in 3D space.
Our novel data-driven method predicts hand positions and trajectories from just the continuous estimation of hand orientation.
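A minimal sketch of that core idea, regressing hand position from a stream of orientation estimates, might look as follows in PyTorch; the recurrent architecture, sizes, and quaternion input format are assumptions, not HOOV's actual model.

```python
import torch
import torch.nn as nn

class OrientationToPosition(nn.Module):
    """Maps a wrist-orientation sequence (unit quaternions) to per-step
    3D hand positions. A stand-in for HOOV's actual data-driven model."""
    def __init__(self, hidden=64):
        super().__init__()
        self.gru = nn.GRU(4, hidden, batch_first=True)  # 4 = quaternion dims
        self.fc = nn.Linear(hidden, 3)                  # x, y, z position

    def forward(self, quats):        # quats: (B, T, 4)
        out, _ = self.gru(quats)
        return self.fc(out)          # (B, T, 3)

# Usage: a 1 s orientation stream at 30 Hz (rate is an assumption).
positions = OrientationToPosition()(torch.randn(2, 30, 4))
print(positions.shape)  # torch.Size([2, 30, 3])
```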
arXiv Detail & Related papers (2023-03-13T11:25:32Z) - Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
arXiv Detail & Related papers (2023-03-09T02:24:30Z) - HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology which can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians.
Our method efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
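The notion of pixel-aligned multi-modal features can be illustrated by projecting each LiDAR point into the image and sampling an image feature map at that pixel; the sketch below, with its pinhole projection and nearest-pixel sampling, is an assumption-laden simplification rather than HUM3DIL's actual fusion.

```python
import torch

def pixel_aligned_features(points, feat_map, K):
    """points: (N, 3) LiDAR points in the camera frame; feat_map: (C, H, W)
    image feature map; K: (3, 3) pinhole intrinsics. Returns (N, C + 3)."""
    uvw = (K @ points.T).T                        # homogeneous pixel coords
    uv = (uvw[:, :2] / uvw[:, 2:3]).long()        # nearest-pixel projection
    C, H, W = feat_map.shape
    u = uv[:, 0].clamp(0, W - 1)
    v = uv[:, 1].clamp(0, H - 1)
    img_feats = feat_map[:, v, u].T               # (N, C) sampled features
    return torch.cat([img_feats, points], dim=1)  # append raw XYZ

# Usage: 100 points in front of a toy camera and a 32-channel feature map.
K = torch.tensor([[500., 0., 64.], [0., 500., 64.], [0., 0., 1.]])
pts = torch.rand(100, 3) + torch.tensor([0., 0., 2.])   # keep depth > 0
out = pixel_aligned_features(pts, torch.randn(32, 128, 128), K)
print(out.shape)   # torch.Size([100, 35])
```

Features of this form could then be fed to Transformer refinement stages, as the summary describes.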
arXiv Detail & Related papers (2022-12-15T11:15:14Z) - mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors [6.955796938573367]
We present mRI, a multi-modal 3D human pose estimation dataset with mmWave, RGB-D, and Inertial Sensors.
Our dataset consists of over 160k synchronized frames from 20 subjects performing rehabilitation exercises.
arXiv Detail & Related papers (2022-10-15T23:08:44Z) - AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing [24.053096294334694]
We present AvatarPoser, the first learning-based method that predicts full-body poses in world coordinates using only motion input from the user's head and hands.
Our method builds on a Transformer encoder to extract deep features from the input signals and decouples global motion from the learned local joint orientations.
In our evaluation, AvatarPoser achieved new state-of-the-art results on large motion capture datasets.
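A minimal sketch of this sparse-input design, a Transformer encoder over head and hand signals whose outputs split into global root motion and local joint rotations, is given below; the 22-joint body layout, the 6D rotation parameterization, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class SparseBodyPoser(nn.Module):
    """Illustrative AvatarPoser-style model: 3 trackers (head + both
    hands) x (6D rotation + 3D position) = 27 input features per frame."""
    def __init__(self, in_dim=27, d_model=128, n_joints=22):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.root_head = nn.Linear(d_model, 3)               # global motion
        self.joint_head = nn.Linear(d_model, n_joints * 6)   # local 6D rotations

    def forward(self, x):                 # x: (B, T, 27)
        h = self.encoder(self.embed(x))   # (B, T, d_model)
        return self.root_head(h), self.joint_head(h)

root, joints = SparseBodyPoser()(torch.randn(1, 40, 27))
```

Splitting the output heads this way mirrors the summary's point about decoupling global motion from learned local joint orientations.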
arXiv Detail & Related papers (2022-07-27T20:52:39Z) - Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real-world applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z) - The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object-specific grab gestures, allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom-tailored hand gestures.
The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
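Template matching for gestures can be as simple as comparing a captured joint configuration against stored per-object templates and accepting the nearest one under a distance threshold; the sketch below is a toy illustration with assumed names and thresholds, not the tool's actual matcher.

```python
import numpy as np

def match_gesture(pose, templates, threshold=0.05):
    """pose: (J, 3) joint positions; templates: dict name -> (J, 3).
    Returns the best-matching template name, or None if all are too far."""
    best_name, best_dist = None, np.inf
    for name, tpl in templates.items():
        dist = np.linalg.norm(pose - tpl, axis=1).mean()  # mean joint error
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Usage: one stored "mug grab" template, queried with a noisy pose.
tpl = np.random.rand(21, 3)
print(match_gesture(tpl + 0.01 * np.random.randn(21, 3), {"mug": tpl}))
```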
arXiv Detail & Related papers (2022-07-03T18:33:33Z) - A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z) - HMD-EgoPose: Head-Mounted Display-Based Egocentric Marker-Less Tool and Hand Pose Estimation for Augmented Surgical Guidance [0.0]
We present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation.
We demonstrate state-of-the-art performance on a benchmark dataset for marker-less hand and surgical instrument pose tracking.
arXiv Detail & Related papers (2022-02-24T04:07:34Z) - Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal [11.76969975145963]
DI-Gesture is a domain-independent and real-time mmWave gesture recognition system.
In real-time scenarios, the accuracy of DI-Gesture reaches over 97% with an average inference time of 2.87 ms.
arXiv Detail & Related papers (2021-11-11T13:28:28Z) - Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity and the resulting ambiguities in assigning pixel observations to the respective hands are a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z) - SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z) - SensiX: A Platform for Collaborative Machine Learning on the Edge [69.1412199244903]
We present SensiX, a personal edge platform that sits between sensor data and sensing models.
We demonstrate its efficacy in developing motion and audio-based multi-device sensing systems.
Our evaluation shows that SensiX offers a 7-13% increase in overall accuracy and up to a 30% increase across different environment dynamics at the expense of a 3 mW power overhead.
arXiv Detail & Related papers (2020-12-04T23:06:56Z) - Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning [52.37106940303246]
We learn a model that maps noisy input hand poses to target virtual poses.
The agent is trained in a residual setting by using a model-free hybrid RL+IL approach.
We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild.
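The residual setting can be sketched as a learned policy that outputs a small correction on top of a base action derived from the noisy estimated hand pose; the PyTorch snippet below omits the RL+IL training loop entirely, and its state sizes and residual scaling are assumptions.

```python
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Adds a small learned correction to a base action.
    obs_dim/act_dim = 21 joints x 3 coordinates = 63 (assumed)."""
    def __init__(self, obs_dim=63, act_dim=63, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs, base_action, scale=0.1):
        residual = self.net(torch.cat([obs, base_action], dim=-1))
        return base_action + scale * residual   # small learned correction

obs = torch.randn(1, 63)     # noisy estimated hand pose
base = obs.clone()           # naive base action: track the estimate
action = ResidualPolicy()(obs, base)
```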
arXiv Detail & Related papers (2020-08-07T17:34:28Z) - A Deep Learning Framework for Recognizing both Static and Dynamic Gestures [0.8602553195689513]
We propose a unified framework that recognizes both static and dynamic gestures using simple RGB vision (without depth sensing).
We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network (StaDNet).
In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset.
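One common realization of pose-driven spatial attention is to crop a fixed-size hand region around an estimated wrist keypoint before feeding it to a gesture branch; the NumPy sketch below is a hypothetical illustration of that idea, with an assumed crop size, and is not taken from StaDNet.

```python
import numpy as np

def crop_hand(frame, wrist_xy, size=128):
    """frame: (H, W, 3) image; wrist_xy: (x, y) keypoint in pixels.
    Returns a size x size patch centered on the wrist, clipped to bounds."""
    H, W = frame.shape[:2]
    x = int(np.clip(wrist_xy[0] - size // 2, 0, W - size))
    y = int(np.clip(wrist_xy[1] - size // 2, 0, H - size))
    return frame[y:y + size, x:x + size]

patch = crop_hand(np.zeros((480, 640, 3), np.uint8), (320, 240))
print(patch.shape)  # (128, 128, 3)
```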
arXiv Detail & Related papers (2020-06-11T10:39:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.