CHORE: Contact, Human and Object REconstruction from a single RGB image
- URL: http://arxiv.org/abs/2204.02445v3
- Date: Tue, 31 Oct 2023 16:39:13 GMT
- Title: CHORE: Contact, Human and Object REconstruction from a single RGB image
- Authors: Xianghui Xie, Bharat Lal Bhatnagar, Gerard Pons-Moll
- Abstract summary: CHORE is a novel method that learns to jointly reconstruct the human and the object from a single RGB image.
We compute a neural reconstruction of human and object represented implicitly with two unsigned distance fields.
Experiments show that our joint reconstruction learned with the proposed strategy significantly outperforms the SOTA.
- Score: 40.817960406002506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most prior works in perceiving 3D humans from images reason about humans in isolation, without their surroundings. However, humans are constantly interacting with the surrounding objects, calling for models that can reason not only about the human but also about the object and their interaction. The problem is extremely challenging due to heavy occlusions between humans and objects, diverse interaction types and depth ambiguity. In this paper, we introduce CHORE, a novel method that learns to jointly reconstruct the human and the object from a single RGB image. CHORE takes inspiration from recent advances in implicit surface learning and classical model-based fitting. We compute a neural reconstruction of human and object represented implicitly with two unsigned distance fields, a correspondence field to a parametric body and an object pose field. This allows us to robustly fit a parametric body model and a 3D object template while reasoning about interactions. Furthermore, prior pixel-aligned implicit learning methods use synthetic data and make assumptions that are not met in real data. We propose an elegant depth-aware scaling that allows more efficient shape learning on real data. Experiments show that our joint reconstruction learned with the proposed strategy significantly outperforms the SOTA. Our code and models are available at https://virtualhumans.mpi-inf.mpg.de/chore
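As a concrete illustration of the neural fields described above, here is a minimal PyTorch-style sketch (an assumption-laden illustration, not the released CHORE code): a per-point decoder takes pixel-aligned image features and a 3D query point and predicts the two unsigned distances, part-correspondence logits for the parametric body, and an object pose field, plus a toy fitting term that pulls body vertices onto the predicted human surface. Layer sizes, the part count, and all names (`CHORELikeDecoder`, `body_fit_loss`) are illustrative assumptions.

```python
# Minimal sketch of CHORE-style per-point field prediction (illustrative, not the authors' code).
import torch
import torch.nn as nn

class CHORELikeDecoder(nn.Module):
    """Predicts per-point fields from pixel-aligned features; layer sizes are hypothetical."""
    def __init__(self, feat_dim=256, num_parts=14):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.udf_head = nn.Linear(256, 2)           # unsigned distances to human / object surfaces
        self.part_head = nn.Linear(256, num_parts)  # correspondence to parametric-body parts (logits)
        self.pose_head = nn.Linear(256, 3)          # per-point object pose field (e.g. vector to object center)

    def forward(self, pixel_feat, points):
        # pixel_feat: (B, N, feat_dim) image features sampled at each point's 2D projection
        # points:     (B, N, 3) 3D query points in camera space
        h = self.mlp(torch.cat([pixel_feat, points], dim=-1))
        udf = torch.relu(self.udf_head(h))          # distances are non-negative
        return udf, self.part_head(h), self.pose_head(h)

def body_fit_loss(decoder, body_vertices, pixel_feat_at_verts):
    """Toy fitting term: pull body-model vertices onto the predicted human surface
    by minimizing the human unsigned distance evaluated at each vertex."""
    udf, _, _ = decoder(pixel_feat_at_verts, body_vertices)
    return udf[..., 0].mean()

if __name__ == "__main__":
    dec = CHORELikeDecoder()
    feats, pts = torch.randn(1, 1024, 256), torch.randn(1, 1024, 3)
    udf, parts, pose = dec(feats, pts)
    print(udf.shape, parts.shape, pose.shape)   # (1, 1024, 2) (1, 1024, 14) (1, 1024, 3)
    print(body_fit_loss(dec, pts, feats))
```

The actual method additionally uses the predicted correspondences and contact reasoning during fitting; the snippet only shows the general predict-fields-then-fit idea.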
Related papers
- StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset [56.71580976007712]
We propose to use the Human-Object Offset between anchors densely sampled from the surfaces of the human and object meshes to represent the human-object spatial relation (a rough sketch of this offset idea appears after the related-papers list).
Based on this representation, we propose Stacked Normalizing Flow (StackFLOW) to infer the posterior distribution of human-object spatial relations from the image.
During the optimization stage, we finetune the human body pose and object 6D pose by maximizing the likelihood of samples.
arXiv Detail & Related papers (2024-07-30T04:57:21Z)
- Primitive-based 3D Human-Object Interaction Modelling and Programming [59.47308081630886]
We propose a novel 3D geometric primitive-based language to encode both humans and objects.
We build a new benchmark on 3D HAOI consisting of primitives together with their images.
We believe this primitive-based 3D HAOI representation would pave the way for 3D HAOI studies.
arXiv Detail & Related papers (2023-12-17T13:16:49Z)
- Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation [38.08445005326031]
We propose ProciGen to procedurally generate datasets with both plausible interaction and diverse object variation.
We generate 1M+ human-object interaction pairs in 3D and leverage this large-scale data to train our HDM (Hierarchical Diffusion Model).
Our HDM is an image-conditioned diffusion model that learns both realistic interaction and highly accurate human and object shapes.
arXiv Detail & Related papers (2023-12-12T08:32:55Z)
- NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects [28.59349134574698]
We present a framework for learning a compositional generative model of humans and objects from real-world 3D scans.
Our approach learns to decompose objects and naturally compose them back into a generative human model in an unsupervised manner.
arXiv Detail & Related papers (2023-05-23T17:59:52Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- DemoGrasp: Few-Shot Learning for Robotic Grasping with Human Demonstration [42.19014385637538]
We propose to teach a robot how to grasp an object with a simple and short human demonstration.
We first present a small sequence of RGB-D images displaying a human-object interaction.
This sequence is then leveraged to build associated hand and object meshes that represent the interaction.
arXiv Detail & Related papers (2021-12-06T08:17:12Z)
- S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling [103.65625425020129]
We represent the pedestrian's shape, pose and skinning weights as neural implicit functions that are directly learned from data.
We demonstrate the effectiveness of our approach on various datasets and show that our reconstructions outperform existing state-of-the-art methods.
arXiv Detail & Related papers (2021-01-17T02:16:56Z)
- Grasping Field: Learning Implicit Representations for Human Grasps [16.841780141055505]
We propose an expressive representation for human grasp modelling that is efficient and easy to integrate with deep neural networks.
We name this 3D-to-2D mapping the Grasping Field, parameterize it with a deep neural network, and learn it from data.
Our generative model is able to synthesize high-quality human grasps, given only a 3D object point cloud.
arXiv Detail & Related papers (2020-08-10T23:08:26Z)
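For the StackFLOW entry above, the following is a rough Python sketch (my own illustration under assumptions, not the authors' code) of the human-object offset representation: sample anchor points on the human and object surfaces and stack the anchor-to-anchor offset vectors into a single relation vector. The sampling routine, anchor count, and pairwise pairing scheme are assumptions.

```python
# Rough sketch of a human-object offset relation vector (illustrative, not StackFLOW's code).
import numpy as np

def sample_surface(vertices, faces, n):
    """Area-weighted uniform sampling of n points on a triangle mesh (faces: int array)."""
    tris = vertices[faces]                        # (F, 3, 3)
    a, b, c = tris[:, 0], tris[:, 1], tris[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    idx = np.random.choice(len(faces), size=n, p=areas / areas.sum())
    u, v = np.random.rand(n, 1), np.random.rand(n, 1)
    flip = (u + v) > 1.0                          # reflect samples back into the triangle
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    return a[idx] + u * (b[idx] - a[idx]) + v * (c[idx] - a[idx])

def human_object_offsets(human_v, human_f, obj_v, obj_f, n_anchors=64):
    """Offset vectors from human anchors to object anchors, flattened into one relation vector."""
    h = sample_surface(human_v, human_f, n_anchors)       # (n, 3)
    o = sample_surface(obj_v, obj_f, n_anchors)           # (n, 3)
    return (o[None, :, :] - h[:, None, :]).reshape(-1)    # (n*n*3,) pairwise offsets

if __name__ == "__main__":
    tri_v = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
    tri_f = np.array([[0, 1, 2]])
    rel = human_object_offsets(tri_v, tri_f, tri_v + [0., 0., 1.], tri_f, n_anchors=8)
    print(rel.shape)  # (192,)
```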