Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics
- URL: http://arxiv.org/abs/2007.12287v3
- Date: Wed, 7 Apr 2021 15:13:12 GMT
- Title: Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics
- Authors: Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo
- Abstract summary: We build upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings.
We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone.
Our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input.
- Score: 87.17505994436308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel learned deep prior of body motion for 3D hand shape
synthesis and estimation in the domain of conversational gestures. Our model
builds upon the insight that body motion and hand gestures are strongly
correlated in non-verbal communication settings. We formulate the learning of
this prior as a prediction task of 3D hand shape over time given body motion
input alone. Trained with 3D pose estimations obtained from a large-scale
dataset of internet videos, our hand prediction model produces convincing 3D
hand gestures given only the 3D motion of the speaker's arms as input. We
demonstrate the efficacy of our method on hand gesture synthesis from body
motion input, and as a strong body prior for single-view image-based 3D hand
pose estimation. We demonstrate that our method outperforms previous
state-of-the-art approaches and can generalize beyond the monologue-based
training data to multi-person conversations. Video results are available at
http://people.eecs.berkeley.edu/~evonne_ng/projects/body2hands/.
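The abstract frames the learned prior as a sequence prediction task: given only the speaker's arm/body motion over time, regress the 3D hand pose at every frame. Below is a minimal sketch of that formulation, assuming a simple GRU regressor in PyTorch; the module name, joint counts, and output parameterization are illustrative assumptions and are not the architecture used in the Body2Hands paper.

```python
# Minimal sketch of a body-to-hands predictor: per-frame 3D hand pose from a
# window of arm/body motion alone. Feature sizes and the GRU backbone are
# assumptions for illustration, not the paper's actual model.
import torch
import torch.nn as nn

class BodyToHands(nn.Module):
    def __init__(self, num_body_joints=12, hand_pose_dim=2 * 15 * 3, hidden=256):
        super().__init__()
        # Input: 3D positions of upper-body/arm joints per frame, flattened.
        in_dim = num_body_joints * 3
        self.encoder = nn.GRU(in_dim, hidden, num_layers=2, batch_first=True)
        # Output: per-frame hand pose, e.g. axis-angle for 15 joints per hand.
        self.decoder = nn.Linear(hidden, hand_pose_dim)

    def forward(self, body_motion):
        # body_motion: (batch, frames, num_body_joints * 3)
        feats, _ = self.encoder(body_motion)
        return self.decoder(feats)  # (batch, frames, hand_pose_dim)

# Usage: regress hand pose from 64 frames of arm motion; in the paper's setup
# the supervision comes from 3D poses estimated on internet videos.
model = BodyToHands()
arms = torch.randn(8, 64, 12 * 3)   # hypothetical batch of arm motion
pred_hands = model(arms)            # (8, 64, 90)
```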
Related papers
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer [42.87095473590205]
We propose a novel framework for automatic 3D body gesture synthesis from speech.
Our system is trained with either the Trinity speech-gesture dataset or the Talking With Hands 16.2M dataset.
The results show that our system can produce more realistic, appropriate, and diverse body gestures compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2023-09-07T01:11:11Z)
- GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment.
Modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality.
GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z)
- Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement [42.98335775548796]
We introduce a novel two-stage 3D hand generation method based on bilateral hand disentanglement.
In the first stage, we generate natural hand gestures via two hand-disentanglement branches.
The second stage is built upon the insight that 3D hand predictions should be non-deterministic.
arXiv Detail & Related papers (2023-03-03T08:08:04Z)
- Generating Holistic 3D Human Motion from Speech [97.11392166257791]
We build a high-quality dataset of 3D holistic body meshes with synchronous speech.
We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately.
arXiv Detail & Related papers (2022-12-08T17:25:19Z)
- Learning Speech-driven 3D Conversational Gestures from Video [106.15628979352738]
We propose the first approach to automatically and jointly synthesize synchronous 3D conversational body and hand gestures.
Our algorithm uses a CNN architecture that leverages the inherent correlation between facial expression and hand gestures.
We also contribute a new way to create a large corpus of more than 33 hours of annotated body, hand, and face data from in-the-wild videos of talking people.
arXiv Detail & Related papers (2021-02-13T01:05:39Z)
- Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation [70.23652933572647]
Whole-body 3D human mesh estimation aims to reconstruct the 3D human body, hands, and face simultaneously.
We present Hand4Whole, which has two strong points over previous works.
Our Hand4Whole is trained in an end-to-end manner and produces much better 3D hand results than previous whole-body 3D human mesh estimation methods.
arXiv Detail & Related papers (2020-11-23T16:48:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.