Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation
- URL: http://arxiv.org/abs/2203.16202v1
- Date: Wed, 30 Mar 2022 10:51:41 GMT
- Title: Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation
- Authors: Shuying Liu, Wenbin Wu, Jiaxian Wu, Yue Lin
- Abstract summary: We propose an approach to estimate arm and hand dynamics from monocular video by utilizing the relationship between arm and hand.
By integrating a 2D hand pose estimation model and a 3D human pose estimation model, the proposed method can produce plausible arm and hand dynamics from monocular video.
- Score: 7.043124227237034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an approach to estimate arm and hand dynamics from monocular video
by utilizing the relationship between arm and hand. Although monocular full
human motion capture technologies have made great progress in recent years,
recovering accurate and plausible arm twists and hand gestures from in-the-wild
videos still remains a challenge. To solve this problem, our solution is
proposed based on the fact that arm poses and hand gestures are highly
correlated in most real situations. To fully exploit arm-hand correlation as
well as inter-frame information, we carefully design a Spatial-Temporal
Parallel Arm-Hand Motion Transformer (PAHMT) to predict the arm and hand
dynamics simultaneously. We also introduce new losses to encourage the
estimations to be smooth and accurate. Besides, we collect a motion capture
dataset including 200K frames of hand gestures and use this data to train our
model. By integrating a 2D hand pose estimation model and a 3D human pose
estimation model, the proposed method can produce plausible arm and hand
dynamics from monocular video. Extensive evaluations demonstrate that the
proposed method has advantages over previous state-of-the-art approaches and
shows robustness under various challenging scenarios.
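The abstract names two concrete ingredients: a transformer with parallel spatial (across joints) and temporal (across frames) attention branches, and losses that keep the predicted motion smooth. As a rough illustration only, here is a minimal PyTorch sketch of such a parallel block plus a temporal-smoothness term; the module layout, concatenation-based fusion, and all names are our assumptions, not the authors' released PAHMT.

```python
import torch
import torch.nn as nn

class ParallelSTBlock(nn.Module):
    """Parallel spatial-temporal block (hypothetical): one attention branch
    mixes joints within each frame, the other mixes frames per joint."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # fuse the two branches (assumed)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, dim)
        b, t, j, d = x.shape
        xs = x.reshape(b * t, j, d)                       # joints within a frame
        s, _ = self.spatial_attn(xs, xs, xs)
        xt = x.permute(0, 2, 1, 3).reshape(b * j, t, d)   # frames per joint
        tp, _ = self.temporal_attn(xt, xt, xt)
        s = s.reshape(b, t, j, d)
        tp = tp.reshape(b, j, t, d).permute(0, 2, 1, 3)
        return self.norm(x + self.fuse(torch.cat([s, tp], dim=-1)))

def smoothness_loss(pred: torch.Tensor) -> torch.Tensor:
    """Penalize frame-to-frame jitter; pred: (batch, frames, joints, 3)."""
    vel = pred[:, 1:] - pred[:, :-1]
    return vel.pow(2).mean()
```

Stacking a few such blocks over concatenated arm and hand tokens, and adding the smoothness term to a standard joint-position loss, would approximate the overall recipe the abstract describes.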
Related papers
- Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation [6.912016522494431]
We present PiMForce, a novel framework that enhances hand pressure estimation.
Our approach utilizes detailed spatial information from 3D hand poses in conjunction with dynamic muscle activity from sEMG.
Our framework enables precise hand pressure estimation in complex and natural interaction scenarios.
arXiv Detail & Related papers (2024-10-31T04:42:43Z)
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
arXiv Detail & Related papers (2023-03-09T02:24:30Z)
- 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal [85.30756038989057]
Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions.
We propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches.
arXiv Detail & Related papers (2022-07-22T13:04:06Z)
- Adversarial Motion Modelling helps Semi-supervised Hand Pose Estimation [116.07661813869196]
We propose to combine ideas from adversarial training and motion modelling to tap into unlabeled videos.
We show that an adversarial approach leads to better properties of the hand pose estimator via semi-supervised training on unlabeled video sequences.
The main advantage of our approach is that we can make use of unpaired videos and joint sequence data, both of which are much easier to obtain than paired training data.
arXiv Detail & Related papers (2021-06-10T17:50:19Z)
- Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [87.17505994436308]
We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
arXiv Detail & Related papers (2020-07-23T22:58:15Z)
- SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation [48.456638103309544]
3D hand pose estimation based on RGB images has been studied for a long time.
We propose a novel method that generates a synthetic dataset that mimics natural human hand movements.
We show that utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimations.
arXiv Detail & Related papers (2020-07-10T05:11:14Z)