Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures
- URL: http://arxiv.org/abs/2503.11352v1
- Date: Fri, 14 Mar 2025 12:40:43 GMT
- Authors: Arno Verduyn, Maxim Vochten, Joris De Schutter
- Abstract summary: We introduce a novel dataset of recorded Hand Palm Motion (HPM) gestures. The motion gestures in this dataset were specifically designed to be distinguishable without dependence on specific reference frames or directional cues. Multiple invariant trajectory descriptors are benchmarked to assess how their performances generalize to this novel HPM dataset. The best scoring approach is validated for online recognition by developing a real-time Proof of Concept.
- Score: 0.9421843976231371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability of robots to recognize human gestures facilitates natural and accessible human-robot collaboration. However, most work in gesture recognition remains rooted in reference frame-dependent representations. This poses a challenge when reference frames vary due to different work cell layouts, imprecise frame calibrations, or other environmental changes. This paper investigates the use of invariant trajectory descriptors for robust hand palm motion gesture recognition under reference frame changes. First, a novel dataset of recorded Hand Palm Motion (HPM) gestures is introduced. The motion gestures in this dataset were specifically designed to be distinguishable without dependence on specific reference frames or directional cues. Next, multiple invariant trajectory descriptor approaches are benchmarked to assess how well their performance generalizes to this novel HPM dataset. After this offline benchmarking, the best-scoring approach is validated for online recognition by developing a real-time Proof of Concept (PoC). In this PoC, hand palm motion gestures are used to control the real-time movement of a manipulator arm. The PoC demonstrates high recognition reliability in real-time operation, achieving an $F_1$-score of 92.3%. This work demonstrates the effectiveness of the invariant descriptor approach as a standalone solution. Moreover, we believe that the invariant descriptor approach can also be utilized within other state-of-the-art pattern recognition and learning systems to improve their robustness against reference frame variations.
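To make the idea of reference frame invariance concrete, the following is a minimal sketch and not the descriptors benchmarked in the paper. It assumes a deliberately simple stand-in descriptor (segment lengths and turning angles of a sampled 3D trajectory) and a hypothetical helix-shaped test motion. All function names are illustrative. The sketch shows that raw coordinates change when the same motion is expressed in a rotated and translated frame, while the descriptor does not.

```python
import math

def rotation_matrix(yaw, roll):
    """Rotation R = Rx(roll) * Rz(yaw), built without external libraries."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cr, sr = math.cos(roll), math.sin(roll)
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]
    return [[sum(rx[i][k] * rz[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def change_frame(points, rotation, translation):
    """Express every point in another reference frame: p' = R p + t."""
    return [tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                  for i in range(3))
            for p in points]

def simple_invariant_descriptor(points):
    """Illustrative descriptor: segment lengths, then turning angles.

    Both quantities depend only on the shape of the path, so they are
    unchanged by any rigid transformation of the reference frame.
    """
    segments = [tuple(b[i] - a[i] for i in range(3))
                for a, b in zip(points, points[1:])]
    lengths = [math.sqrt(sum(c * c for c in s)) for s in segments]
    angles = []
    for k in range(len(segments) - 1):
        dot = sum(a * b for a, b in zip(segments[k], segments[k + 1]))
        cos_angle = dot / (lengths[k] * lengths[k + 1])
        angles.append(math.acos(max(-1.0, min(1.0, cos_angle))))
    return lengths + angles

def euclidean_gap(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical test motion: 50 samples along a helix.
trajectory = [(math.cos(0.2 * t), math.sin(0.2 * t), 0.05 * t) for t in range(50)]

# The same motion seen from a differently calibrated frame (assumed values).
moved = change_frame(trajectory, rotation_matrix(0.7, -0.4), (0.5, -1.0, 2.0))

# Compare raw coordinates versus the frame-invariant descriptor.
raw_gap = euclidean_gap([c for p in trajectory for c in p],
                        [c for p in moved for c in p])
descriptor_gap = euclidean_gap(simple_invariant_descriptor(trajectory),
                               simple_invariant_descriptor(moved))

print(f"raw coordinate gap: {raw_gap:.3f}")
print(f"descriptor gap:     {descriptor_gap:.2e}")
```

A classifier that compares raw coordinates would see two different motions here, whereas one that compares the descriptor values would see the same gesture, up to floating-point error.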
Related papers
- HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation [15.606904161622017]
This paper proposes the Denoising Adaptive Graph Transformer, HandDAGT, for hand pose estimation.
It incorporates a novel attention mechanism to adaptively weigh the contribution of kinematic correspondence and local geometric features for the estimation of specific keypoints.
Experimental results show that the proposed model significantly outperforms the existing methods on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-07-30T04:53:35Z) - Local Spherical Harmonics Improve Skeleton-Based Hand Action Recognition [17.62840662799232]
We propose a method specifically designed for hand action recognition which uses relative angular embeddings and local Spherical Harmonics to create novel hand representations.
The use of Spherical Harmonics creates rotation-invariant representations which make hand action recognition even more robust against inter-subject differences and viewpoint changes.
arXiv Detail & Related papers (2023-08-21T08:17:42Z) - Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
arXiv Detail & Related papers (2023-03-09T02:24:30Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action
Recognition from Egocentric RGB Videos [50.74218823358754]
We develop a transformer-based framework to exploit temporal information for robust estimation.
We build a network hierarchy with two cascaded transformer encoders, where the first one exploits the short-term temporal cue for hand pose estimation.
Our approach achieves competitive results on two first-person hand action benchmarks, namely FPHA and H2O.
arXiv Detail & Related papers (2022-09-20T05:52:54Z) - SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations across modalities to recognize gestures.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1 score, without using any annotations from the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores by considering the discrepancy between an image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.