GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
- URL: http://arxiv.org/abs/2112.11454v1
- Date: Tue, 21 Dec 2021 18:59:34 GMT
- Title: GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
- Authors: Omid Taheri, Vasileios Choutas, Michael J. Black, and Dimitrios
Tzionas
- Abstract summary: Existing methods focus on the major limbs of the body, ignoring the hands and head. Hands have been separately studied but the focus has been on generating realistic static grasps of objects.
We need to generate full-body motions and realistic hand grasps simultaneously.
For the first time, we address the problem of generating full-body, hand and head motions of an avatar grasping an unknown object.
- Score: 47.49549115570664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating digital humans that move realistically has many applications and
is widely studied, but existing methods focus on the major limbs of the body,
ignoring the hands and head. Hands have been separately studied but the focus
has been on generating realistic static grasps of objects. To synthesize
virtual characters that interact with the world, we need to generate full-body
motions and realistic hand grasps simultaneously. Both sub-problems are
challenging on their own and, together, the state-space of poses is
significantly larger, the scales of hand and body motions differ, and the
whole-body posture and the hand grasp must agree, satisfy physical constraints,
and be plausible. Additionally, the head is involved because the avatar must
look at the object to interact with it. For the first time, we address the
problem of generating full-body, hand and head motions of an avatar grasping an
unknown object. As input, our method, called GOAL, takes a 3D object, its
position, and a starting 3D body pose and shape. GOAL outputs a sequence of
whole-body poses using two novel networks. First, GNet generates a goal
whole-body grasp with a realistic body, head, arm, and hand pose, as well as
hand-object contact. Second, MNet generates the motion between the starting and
goal pose. This is challenging, as it requires the avatar to walk towards the
object with foot-ground contact, orient the head towards it, reach out, and
grasp it with a realistic hand pose and hand-object contact. To achieve this
the networks exploit a representation that combines SMPL-X body parameters and
3D vertex offsets. We train and evaluate GOAL, both qualitatively and
quantitatively, on the GRAB dataset. Results show that GOAL generalizes well to
unseen objects, outperforming baselines. GOAL takes a step towards synthesizing
realistic full-body object grasping.
Related papers
Err
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.