G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
- URL: http://arxiv.org/abs/2404.12383v1
- Date: Thu, 18 Apr 2024 17:59:28 GMT
- Title: G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
- Authors: Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani
- Abstract summary: G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis.
- Score: 57.07638884476174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose G-HOP, a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field for the object. We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis. We believe that our model, trained by aggregating seven diverse real-world interaction datasets spanning across 155 categories, represents a first approach that allows jointly generating both hand and object. Our empirical evaluations demonstrate the benefit of this joint prior in video-based reconstruction and human grasp synthesis, outperforming current task-specific baselines. Project website: https://judyye.github.io/ghop-www
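The abstract's idea of a hand representation "aligned with" the object's signed distance field can be illustrated with a small sketch. It assumes one plausible reading: the skeletal distance field stores, for every voxel, the Euclidean distance to each hand joint, giving one channel per joint on the same grid as the object SDF. The function names, grid bounds, joint positions, and the use of a raw sphere SDF (rather than the latent SDF the abstract mentions) are all hypothetical choices for illustration, not the authors' implementation.

```python
import math


def voxel_centres(resolution, lo=-1.0, hi=1.0):
    """Centres of `resolution` equally sized cells spanning [lo, hi] on one axis."""
    step = (hi - lo) / resolution
    return [lo + (i + 0.5) * step for i in range(resolution)]


def skeletal_distance_field(joints, resolution=4, lo=-1.0, hi=1.0):
    """Assumed form: field[j][x][y][z] = distance from voxel centre to joint j."""
    c = voxel_centres(resolution, lo, hi)
    return [
        [[[math.dist((x, y, z), joint) for z in c] for y in c] for x in c]
        for joint in joints
    ]


def sphere_sdf_grid(centre, radius, resolution=4, lo=-1.0, hi=1.0):
    """Stand-in object SDF: negative inside the sphere, positive outside."""
    c = voxel_centres(resolution, lo, hi)
    return [
        [[math.dist((x, y, z), centre) - radius for z in c] for y in c]
        for x in c
    ]


def stack_channels(object_sdf, hand_field):
    """Stack the object channel and per-joint hand channels on one shared grid."""
    return [object_sdf] + hand_field


# Hypothetical two-joint "hand" next to a spherical "object".
joints = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
hand = skeletal_distance_field(joints, resolution=2)
obj = sphere_sdf_grid((0.0, 0.0, 0.0), 0.5, resolution=2)
volume = stack_channels(obj, hand)
print(len(volume), "channels on a 2x2x2 grid")
```

Because both fields live on the same voxel grid, a single 3D spatial model can consume them as channels of one volume, which is the kind of alignment the abstract describes.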
Related papers
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data [42.49031063635004]
We propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.
Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis.
We adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems.
arXiv Detail & Related papers (2024-03-18T17:48:31Z)
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models [8.933560282929726]
We introduce a novel affordance representation, named Comprehensive Affordance (ComA).
Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes.
We demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance.
arXiv Detail & Related papers (2024-01-23T18:59:59Z)
- HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video [70.11702620562889]
HOLD is the first category-agnostic method that reconstructs an articulated hand and object jointly from a monocular interaction video.
We develop a compositional articulated implicit model that can disentangle the 3D hand and object from 2D images.
Our method does not rely on 3D hand-object annotations while outperforming fully-supervised baselines in both in-the-lab and challenging in-the-wild settings.
arXiv Detail & Related papers (2023-11-30T10:50:35Z)
- ContactArt: Learning 3D Interaction Priors for Category-level Articulated Object and Hand Poses Estimation [46.815231896011284]
We propose a new dataset and a novel approach to learning hand-object interaction priors for hand and articulated object pose estimation.
We first collect a dataset using visual teleoperation, where the human operator can directly play within a physical simulator to manipulate the articulated objects.
Our system requires only an iPhone to record human hand motion, which can be easily scaled up and largely lowers the costs of data and annotation collection.
arXiv Detail & Related papers (2023-05-02T17:24:08Z)
- Affordance Diffusion: Synthesizing Hand-Object Interactions [81.98499943996394]
Given an RGB image of an object, we aim to hallucinate plausible images of a human hand interacting with it.
We propose a two-step generative approach: a LayoutNet that samples an articulation-agnostic hand-object-interaction layout, and a ContentNet that synthesizes images of a hand grasping the object.
arXiv Detail & Related papers (2023-03-21T17:59:10Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Grasping Field: Learning Implicit Representations for Human Grasps [16.841780141055505]
We propose an expressive representation for human grasp modelling that is efficient and easy to integrate with deep neural networks.
We name this 3D-to-2D mapping the Grasping Field, parameterize it with a deep neural network, and learn it from data.
Our generative model is able to synthesize high-quality human grasps, given only a 3D object point cloud.
arXiv Detail & Related papers (2020-08-10T23:08:26Z)
- Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.