Grasping Field: Learning Implicit Representations for Human Grasps
- URL: http://arxiv.org/abs/2008.04451v3
- Date: Thu, 26 Nov 2020 16:07:13 GMT
- Title: Grasping Field: Learning Implicit Representations for Human Grasps
- Authors: Korrawe Karunratanakul, Jinlong Yang, Yan Zhang, Michael Black,
Krikamol Muandet, Siyu Tang
- Abstract summary: We propose an expressive representation for human grasp modelling that is efficient and easy to integrate with deep neural networks.
We name this 3D-to-2D mapping the Grasping Field, parameterize it with a deep neural network, and learn it from data.
Our generative model is able to synthesize high-quality human grasps, given only a 3D object point cloud.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic grasping of household objects has made remarkable progress in recent
years. Yet, human grasps are still difficult to synthesize realistically. There
are several key reasons: (1) the human hand has many degrees of freedom (more
than robotic manipulators); (2) the synthesized hand should conform to the
surface of the object; and (3) it should interact with the object in a
semantically and physically plausible manner. To make progress in this
direction, we draw inspiration from the recent progress on learning-based
implicit representations for 3D object reconstruction. Specifically, we propose
an expressive representation for human grasp modelling that is efficient and
easy to integrate with deep neural networks. Our insight is that every point in
a three-dimensional space can be characterized by the signed distances to the
surface of the hand and the object, respectively. Consequently, the hand, the
object, and the contact area can be represented by implicit surfaces in a
common space, in which the proximity between the hand and the object can be
modelled explicitly. We name this 3D-to-2D mapping the Grasping Field,
parameterize it with a deep neural network, and learn it from data. We
demonstrate that the proposed grasping field is an effective and expressive
representation for human grasp generation. Specifically, our generative model
is able to synthesize high-quality human grasps, given only a 3D object
point cloud. The extensive experiments demonstrate that our generative model
compares favorably with a strong baseline and approaches the level of natural
human grasps. Our method improves the physical plausibility of the hand-object
contact reconstruction and achieves comparable performance for 3D hand
reconstruction compared to state-of-the-art methods.
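To make the representation concrete, the sketch below shows a DeepSDF-style conditioned MLP that maps a 3D query point, together with a latent shape code, to a pair of signed distances (to the hand surface and to the object surface). This is a minimal illustration of the idea described in the abstract, not the authors' implementation; the latent dimension, layer widths, and contact threshold are assumptions chosen for the example.
```python
import torch
import torch.nn as nn

class GraspingFieldMLP(nn.Module):
    """Minimal sketch of a grasping-field style network (illustrative, not the paper's code).

    Maps a 3D query point, conditioned on a latent shape code, to two signed
    distances: one to the hand surface and one to the object surface.
    """

    def __init__(self, latent_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # (signed distance to hand, signed distance to object)
        )

    def forward(self, shape_code: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # shape_code: (B, latent_dim) latent code; points: (B, N, 3) query points
        B, N, _ = points.shape
        code = shape_code.unsqueeze(1).expand(B, N, -1)
        return self.net(torch.cat([code, points], dim=-1))  # (B, N, 2)


if __name__ == "__main__":
    # Usage sketch: the hand surface is the zero level set of the first output,
    # the object surface the zero level set of the second, and contact regions
    # are points where both signed distances are near zero (threshold assumed).
    model = GraspingFieldMLP()
    code = torch.randn(1, 256)
    pts = torch.rand(1, 1024, 3) * 2 - 1   # query points in a normalized cube
    sdf = model(code, pts)                  # (1, 1024, 2)
    hand_d, obj_d = sdf[..., 0], sdf[..., 1]
    contact = (hand_d.abs() < 0.01) & (obj_d.abs() < 0.01)
```
Because both distances live in a common space, proximity between hand and object can be modelled explicitly, which is the key property the paper exploits for grasp generation.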
Related papers
- G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis [57.07638884476174]
G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clips and human grasp synthesis.
arXiv Detail & Related papers (2024-04-18T17:59:28Z)
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models [8.933560282929726]
We introduce a novel affordance representation, named Comprehensive Affordance (ComA).
Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes.
We demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance.
arXiv Detail & Related papers (2024-01-23T18:59:59Z)
- Primitive-based 3D Human-Object Interaction Modelling and Programming [59.47308081630886]
We propose a novel 3D geometric primitive-based language to encode both humans and objects.
We build a new benchmark on 3D HAOI consisting of primitives together with their images.
We believe this primitive-based 3D HAOI representation would pave the way for 3D HAOI studies.
arXiv Detail & Related papers (2023-12-17T13:16:49Z)
- Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z)
- CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images [10.4286198282079]
We present a method for teaching machines to understand and model the underlying spatial common sense of diverse human-object interactions in 3D.
We synthesize multiple 2D images captured from different viewpoints when humans interact with the same type of objects.
Despite their imperfect image quality compared to real images, we demonstrate that the synthesized images are sufficient to learn the 3D human-object spatial relations.
arXiv Detail & Related papers (2023-08-23T17:59:11Z)
- Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- CHORE: Contact, Human and Object REconstruction from a single RGB image [40.817960406002506]
CHORE is a novel method that learns to jointly reconstruct the human and the object from a single RGB image.
We compute a neural reconstruction of human and object represented implicitly with two unsigned distance fields.
Experiments show that our joint reconstruction learned with the proposed strategy significantly outperforms the SOTA.
arXiv Detail & Related papers (2022-04-05T18:38:06Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it.
Our perceptual study shows that PLACE significantly improves over the state-of-the-art method, approaching the realism of real human-scene interaction.
arXiv Detail & Related papers (2020-08-12T21:00:10Z)