NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
- URL: http://arxiv.org/abs/2407.12727v1
- Date: Wed, 17 Jul 2024 16:46:40 GMT
- Title: NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
- Authors: Zhongqun Zhang, Hengfei Wang, Ziwei Yu, Yihua Cheng, Angela Yao, Hyung Jin Chang
- Abstract summary: NL2Contact is a model that generates controllable contacts by leveraging staged diffusion models.
Given a language description of the hand and contact, NL2Contact generates realistic and faithful 3D hand-object contacts.
We show applications of our model to grasp pose optimization and novel human grasp generation.
- Score: 45.00669505173757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling the physical contacts between the hand and object is standard for refining inaccurate hand poses and generating novel human grasps in 3D hand-object reconstruction. However, existing methods rely on geometric constraints that cannot be specified or controlled. This paper introduces a novel task of controllable 3D hand-object contact modeling with natural language descriptions. Challenges include i) the complexity of cross-modal modeling from language to contact, and ii) a lack of descriptive text for contact patterns. To address these issues, we propose NL2Contact, a model that generates controllable contacts by leveraging staged diffusion models. Given a language description of the hand and contact, NL2Contact generates realistic and faithful 3D hand-object contacts. To train the model, we build ContactDescribe, the first dataset with hand-centered contact descriptions. It contains multi-level and diverse descriptions generated by large language models based on carefully designed prompts (e.g., grasp action, grasp type, contact location, free finger status). We show applications of our model to grasp pose optimization and novel human grasp generation, both based on a textual contact description.
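Since the abstract describes the pipeline only at a high level, the following is a minimal, illustrative Python sketch (not the authors' code) of the two ingredients it names: assembling a ContactDescribe-style prompt from grasp attributes (grasp action, grasp type, contact location, free finger status) and running a single text-conditioned denoising stage over a per-point contact map. All names here (build_prompt, toy_text_embedding, toy_denoiser, n_points) are hypothetical stand-ins for trained components in the actual method.

```python
# Minimal sketch under stated assumptions: a toy text encoder and a toy
# denoiser replace the trained networks; only the conditioning-and-denoising
# pattern is demonstrated, not NL2Contact's actual staged architecture.
import numpy as np

def build_prompt(grasp_action, grasp_type, contact_location, free_fingers):
    """Assemble a hand-centered contact description (hypothetical template)."""
    return (f"The hand performs a {grasp_action} with a {grasp_type} grasp, "
            f"touching the object at the {contact_location}; "
            f"the {', '.join(free_fingers)} remain free.")

def toy_text_embedding(text, dim=16):
    """Stand-in for a learned text encoder: hash the text into a fixed vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def toy_denoiser(x, t, cond):
    """Placeholder for a trained noise-prediction network eps_theta(x, t, cond)."""
    # Nudge values using the conditioning signal, purely so the loop runs.
    return x - np.tanh(cond.mean()) * 0.1

def sample_contact_map(text, n_points=64, steps=50, seed=1):
    """DDPM-style ancestral sampling of a per-point contact map, conditioned on text."""
    rng = np.random.default_rng(seed)
    cond = toy_text_embedding(text)
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(n_points)          # start from Gaussian noise
    for t in reversed(range(steps)):
        eps_hat = toy_denoiser(x, t, cond)     # predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(n_points) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    # Squash to [0, 1] so the result reads as per-point contact probabilities.
    return 1.0 / (1.0 + np.exp(-x))

prompt = build_prompt("pour water", "power", "handle", ["index finger", "thumb"])
contact = sample_contact_map(prompt)
print(prompt)
print("contact map shape:", contact.shape, "mean contact prob:", float(contact.mean()))
```

The actual method presumably chains several such conditioned diffusion stages with trained denoisers and a learned text encoder; this sketch only isolates the text-conditioned denoising pattern.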
Related papers
- Pose Priors from Language Models [74.61186408764559]
We present a zero-shot pose optimization method that enforces accurate physical contact constraints.
Our method produces surprisingly compelling pose reconstructions of people in close contact.
Unlike previous approaches, our method provides a unified framework for resolving self-contact and person-to-person contact.
arXiv Detail & Related papers (2024-05-06T17:59:36Z) - Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction [8.253265795150401]
This paper introduces the first text-guided method for generating 3D hand-object interaction sequences.
For contact generation, a VAE-based network takes a text and an object mesh as input and predicts the probability of contact between the hand and object surfaces.
For motion generation, a Transformer-based diffusion model utilizes this 3D contact map as a strong prior for generating physically plausible hand-object motion.
arXiv Detail & Related papers (2024-03-31T04:56:30Z) - Contact-aware Human Motion Generation from Textual Descriptions [57.871692507044344]
This paper addresses the problem of generating 3D interactive human motion from text.
We create a novel dataset named RICH-CAT, representing "Contact-Aware Texts".
We propose a novel approach named CATMO for text-driven interactive human motion synthesis.
arXiv Detail & Related papers (2024-03-23T04:08:39Z) - GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z) - ContactGen: Generative Contact Modeling for Grasp Generation [37.56729700157981]
This paper presents a novel object-centric contact representation ContactGen for hand-object interaction.
We propose a conditional generative model to predict ContactGen and adopt model-based optimization to predict diverse and geometrically feasible grasps.
arXiv Detail & Related papers (2023-10-05T17:59:45Z) - DECO: Dense Estimation of 3D Human-Scene Contact In The Wild [54.44345845842109]
We train a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate contact on the SMPL body.
We significantly outperform existing SOTA methods across all benchmarks.
We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images.
arXiv Detail & Related papers (2023-09-26T21:21:07Z) - Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z) - ContactPose: A Dataset of Grasps with Object Contact and Hand Pose [27.24450178180785]
We introduce ContactPose, the first dataset of hand-object contact paired with hand pose, object pose, and RGB-D images.
ContactPose has 2306 unique grasps of 25 household objects grasped with 2 functional intents by 50 participants, and more than 2.9 M RGB-D grasp images.
arXiv Detail & Related papers (2020-07-19T01:01:14Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.