Adversarial Motion Modelling helps Semi-supervised Hand Pose Estimation
- URL: http://arxiv.org/abs/2106.05954v1
- Date: Thu, 10 Jun 2021 17:50:19 GMT
- Title: Adversarial Motion Modelling helps Semi-supervised Hand Pose Estimation
- Authors: Adrian Spurr, Pavlo Molchanov, Umar Iqbal, Jan Kautz, Otmar Hilliges
- Abstract summary: We propose to combine ideas from adversarial training and motion modelling to tap into unlabeled videos.
We show that an adversarial formulation leads to better generalization properties of the hand pose estimator via semi-supervised training on unlabeled video sequences.
The main advantage of our approach is that we can make use of unpaired videos and joint sequence data, both of which are much easier to attain than paired training data.
- Score: 116.07661813869196
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Hand pose estimation is difficult due to different environmental conditions,
object- and self-occlusion as well as diversity in hand shape and appearance.
Exhaustively covering this wide range of factors in fully annotated datasets
has remained impractical, posing significant challenges for generalization of
supervised methods. Embracing this challenge, we propose to combine ideas from
adversarial training and motion modelling to tap into unlabeled videos. To this
end we propose what to the best of our knowledge is the first motion model for
hands and show that an adversarial formulation leads to better generalization
properties of the hand pose estimator via semi-supervised training on unlabeled
video sequences. In this setting, the pose predictor must produce a valid
sequence of hand poses, as determined by a discriminative adversary. This
adversary reasons both on the structural as well as temporal domain,
effectively exploiting the spatio-temporal structure in the task. The main
advantage of our approach is that we can make use of unpaired videos and joint
sequence data both of which are much easier to attain than paired training
data. We perform extensive evaluation, investigating essential components
needed for the proposed framework and empirically demonstrate in two
challenging settings that the proposed approach leads to significant
improvements in pose estimation accuracy. In the lowest label setting, we
attain an improvement of $40\%$ in absolute mean joint error.
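The semi-supervised objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear critic, the feature design, the 21-joint skeleton, the name `adv_weight`, and its value are all assumptions chosen for illustration. The sketch only shows the general shape of the idea: a supervised joint-error term on labeled frames plus an adversarial term that rewards predicted sequences the discriminator judges plausible in both structure (the poses themselves) and time (frame-to-frame motion).

```python
import numpy as np

rng = np.random.default_rng(0)

def sequence_features(poses):
    """Map a (T, J, 3) joint sequence to structural and temporal features."""
    # Structural part: the average pose over the sequence, flattened to (J*3,).
    structural = poses.reshape(poses.shape[0], -1).mean(axis=0)
    # Temporal part: mean absolute frame-to-frame displacement, flattened to (J*3,).
    velocity = np.diff(poses, axis=0)
    temporal = np.abs(velocity).reshape(velocity.shape[0], -1).mean(axis=0)
    return np.concatenate([structural, temporal])

def discriminator(poses, w, b):
    """Toy linear critic (an assumption): probability that a sequence is real motion."""
    logit = sequence_features(poses) @ w + b
    return 1.0 / (1.0 + np.exp(-logit))

def semi_supervised_loss(pred_labeled, gt_labeled, pred_unlabeled_seq, w, b,
                         adv_weight=0.1):
    """Supervised mean joint error plus a weighted adversarial term."""
    # Mean Euclidean distance per joint on frames that have annotations.
    supervised = np.mean(np.linalg.norm(pred_labeled - gt_labeled, axis=-1))
    # The pose predictor is rewarded when the critic rates its sequence as real.
    p_real = discriminator(pred_unlabeled_seq, w, b)
    adversarial = -np.log(p_real + 1e-8)
    return supervised + adv_weight * adversarial, supervised, adversarial

# Hypothetical sizes: 8-frame clips, 21 hand joints, 4 labeled frames.
T, J = 8, 21
w = 0.01 * rng.normal(size=2 * J * 3)
b = 0.0
gt = rng.normal(size=(4, J, 3))
pred = gt + 0.05 * rng.normal(size=gt.shape)
unlabeled_seq = rng.normal(size=(T, J, 3))

total, sup, adv = semi_supervised_loss(pred, gt, unlabeled_seq, w, b)
print(f"supervised={sup:.4f} adversarial={adv:.4f} total={total:.4f}")
```

In an actual training loop the critic would itself be trained on real joint sequences against predicted ones, and both networks would be updated by gradient descent. The sketch evaluates the pose predictor's loss once with fixed critic parameters.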
Related papers
- Exploring the Impact of Hand Pose and Shadow on Hand-washing Action Recognition [0.0]
In this paper, we investigate how pose and shadow impact a classifier's performance.
We show these are heavily impacted by pose and shadow conditions.
It is intriguing to observe model accuracy drop to almost zero with bigger changes in pose.
(2024-06-19T21:49:12Z)
- A comprehensive framework for occluded human pose estimation [10.92234109536279]
Occlusion presents a significant challenge in human pose estimation.
We propose DAG (Data, Attention, Graph) to address the performance degradation caused by occluded human pose estimation.
We also present the Feature-Guided Multi-Hop GCN (FGMP-GCN) to fully explore the prior knowledge of body structure and improve pose estimation results.
(2023-12-30T06:55:30Z)
- STRIDE: Single-video based Temporally Continuous Occlusion Robust 3D Pose Estimation [27.854074900345314]
We propose STRIDE, a novel Test-Time Training (TTT) approach to fit a human motion prior to each video.
Our framework demonstrates flexibility by being model-agnostic, allowing us to use any off-the-shelf 3D pose estimation method for improving robustness and temporal consistency.
We validate STRIDE's efficacy through comprehensive experiments on challenging datasets like Occluded Human3.6M, Human3.6M, and OCMotion.
(2023-12-24T11:05:10Z)
- Generative Hierarchical Temporal Transformer for Hand Pose and Action Modeling [67.94143911629143]
We propose a generative Transformer VAE architecture to model hand pose and action.
To faithfully model the semantic dependency and different temporal granularity of hand pose and action, we decompose the framework into two cascaded VAE blocks.
Results show that our joint modeling of recognition and prediction improves over isolated solutions.
(2023-11-29T05:28:39Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
(2022-07-22T13:04:06Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
(2022-03-29T07:14:58Z)
- Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos [8.571131862820833]
State-of-the-art pose estimators struggle in obtaining high-quality 2D or 3D pose data due to truncation and low-resolution in real-world un-annotated videos.
We propose a Selective Spatio-Temporal Aggregation mechanism, named SST-A, that refines and smooths the keypoint locations extracted by multiple expert pose estimators.
We demonstrate that the skeleton data refined by our Pose-Refinement system (SSTA-PRS) is effective at boosting various existing action recognition models.
(2020-11-10T19:19:51Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
(2020-09-30T05:29:01Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
(2020-06-24T23:56:33Z)