ContactArt: Learning 3D Interaction Priors for Category-level
Articulated Object and Hand Poses Estimation
- URL: http://arxiv.org/abs/2305.01618v1
- Date: Tue, 2 May 2023 17:24:08 GMT
- Title: ContactArt: Learning 3D Interaction Priors for Category-level
Articulated Object and Hand Poses Estimation
- Authors: Zehao Zhu, Jiashun Wang, Yuzhe Qin, Deqing Sun, Varun Jampani,
Xiaolong Wang
- Abstract summary: We propose a new dataset and a novel approach to learning hand-object interaction priors for hand and articulated object pose estimation.
We first collect a dataset using visual teleoperation, where the human operator can directly play within a physical simulator to manipulate the articulated objects.
Our system only requires an iPhone to record human hand motion, so it can be easily scaled up and greatly lowers the cost of data collection and annotation.
- Score: 34.7068170774934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new dataset and a novel approach to learning hand-object
interaction priors for hand and articulated object pose estimation. We first
collect a dataset using visual teleoperation, where the human operator can
directly play within a physical simulator to manipulate the articulated
objects. We record the data and obtain free, accurate annotations of object
poses and contact information from the simulator. Our system only requires an
iPhone to record human hand motion, so it can be easily scaled up and greatly
lowers the cost of data collection and annotation. With this data, we learn 3D
interaction priors including a discriminator (in a GAN) capturing the
distribution of how object parts are arranged, and a diffusion model which
generates the contact regions on articulated objects, guiding the hand pose
estimation. Such structural and contact priors can easily transfer to
real-world data with barely any domain gap. Using our data and learned
priors, our method significantly improves joint hand and articulated object
pose estimation over existing state-of-the-art methods.
The project is available at https://zehaozhu.github.io/ContactArt/ .
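To make the two learned priors concrete, below is a minimal, hypothetical sketch (not the authors' released code) of how a part-arrangement discriminator and a diffusion-generated contact map could regularize pose estimation. All class and function names are illustrative assumptions, and the contact map is treated here as a precomputed tensor rather than being sampled from an actual diffusion model.

```python
# Minimal sketch, assuming hypothetical names; not the ContactArt implementation.
import torch
import torch.nn as nn

class PartArrangementDiscriminator(nn.Module):
    """GAN-style discriminator scoring how plausible an arrangement of
    articulated object parts is (e.g. flattened per-part pose vectors)."""
    def __init__(self, num_parts: int, pose_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_parts * pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1),  # higher score = more realistic arrangement
        )

    def forward(self, part_poses: torch.Tensor) -> torch.Tensor:
        return self.net(part_poses.flatten(start_dim=1))

def prior_regularized_loss(pred_part_poses, data_loss, discriminator,
                           contact_map_pred=None, contact_map_prior=None,
                           w_adv=0.1, w_contact=0.5):
    """Combine a data term with (a) an adversarial realism term from the
    learned discriminator and (b) agreement with a diffusion-generated
    contact map (here a simple MSE placeholder)."""
    adv_term = -discriminator(pred_part_poses).mean()  # encourage realistic part layouts
    loss = data_loss + w_adv * adv_term
    if contact_map_pred is not None and contact_map_prior is not None:
        loss = loss + w_contact * nn.functional.mse_loss(contact_map_pred,
                                                         contact_map_prior)
    return loss

# Toy usage: 2 samples, a 3-part object, 7-D pose per part.
disc = PartArrangementDiscriminator(num_parts=3)
poses = torch.randn(2, 3, 7, requires_grad=True)
loss = prior_regularized_loss(poses, data_loss=torch.tensor(1.0), discriminator=disc)
loss.backward()
```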
Related papers
- G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis [57.07638884476174]
G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks, such as reconstruction from an interaction clip and human grasp synthesis.
arXiv Detail & Related papers (2024-04-18T17:59:28Z) - Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction [8.253265795150401]
This paper introduces the first text-guided method for generating 3D hand-object interaction sequences.
For contact generation, a VAE-based network takes a text prompt and an object mesh as input and predicts the probability of contact between the hand surfaces and the object.
For motion generation, a Transformer-based diffusion model utilizes this 3D contact map as a strong prior for generating physically plausible hand-object motion.
arXiv Detail & Related papers (2024-03-31T04:56:30Z) - Novel-view Synthesis and Pose Estimation for Hand-Object Interaction
from Sparse Views [41.50710846018882]
We propose a neural rendering and pose estimation system for hand-object interaction from sparse views.
We first learn shape and appearance priors for hands and objects separately with neural representations.
During the online stage, we design a rendering-based joint model fitting framework to understand the dynamic hand-object interaction.
arXiv Detail & Related papers (2023-08-22T05:17:41Z) - Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z) - S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation
with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show the benefits of using a contact map that constrains hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z) - Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z) - H2O: Two Hands Manipulating Objects for First Person Interaction
Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)