NCHO: Unsupervised Learning for Neural 3D Composition of Humans and
Objects
- URL: http://arxiv.org/abs/2305.14345v2
- Date: Mon, 29 May 2023 13:51:25 GMT
- Title: NCHO: Unsupervised Learning for Neural 3D Composition of Humans and
Objects
- Authors: Taeksoo Kim, Shunsuke Saito, Hanbyul Joo
- Abstract summary: We present a framework for learning a compositional generative model of humans and objects from real-world 3D scans.
Our approach learns to decompose objects and naturally compose them back into a generative human model in an unsupervised manner.
- Score: 28.59349134574698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep generative models have been recently extended to synthesizing 3D digital
humans. However, previous approaches treat clothed humans as a single chunk of
geometry without considering the compositionality of clothing and accessories.
As a result, individual items cannot be naturally composed into novel
identities, leading to limited expressiveness and controllability of generative
3D avatars. While several methods attempt to address this by leveraging
synthetic data, the interaction between humans and objects is not authentic due
to the domain gap, and manual asset creation is difficult to scale for a wide
variety of objects. In this work, we present a novel framework for learning a
compositional generative model of humans and objects (backpacks, coats,
scarves, and more) from real-world 3D scans. Our compositional model is
interaction-aware: both the spatial relationship between humans and objects
and the mutual shape changes caused by physical contact are fully
incorporated. The key
challenge is that, since humans and objects are in contact, their 3D scans are
merged into a single piece. To decompose them without manual annotations, we
propose to leverage two sets of 3D scans of a single person with and without
objects. Our approach learns to decompose objects and naturally compose them
back into a generative human model in an unsupervised manner. Despite our
simple setup, which requires only the capture of a single subject with and
without objects, our experiments demonstrate the strong generalization of our
model: it naturally composes objects onto diverse identities in various poses
and even combines multiple objects, a configuration unseen in the training
data.
https://taeksuu.github.io/ncho/
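The decomposition idea can be made concrete: the scan of the subject without objects supervises a human branch alone, while the merged scan supervises the composition of a human branch and an object branch, so the object branch is pushed to explain only the residual geometry. Below is a minimal PyTorch sketch of that training signal. It assumes an occupancy-style implicit representation, a pointwise-max composition, and hypothetical names (`ImplicitDecoder`, `paired_scan_loss`); it is not the authors' implementation.

```python
# Minimal sketch of the paired-scan training signal described in the abstract.
# NOT the authors' code: the occupancy decoders, the max-composition, and all
# names here are illustrative assumptions.
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Maps a latent code plus 3D query points to per-point occupancy logits."""
    def __init__(self, latent_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z: torch.Tensor, pts: torch.Tensor) -> torch.Tensor:
        # z: (B, latent_dim), pts: (B, N, 3) -> logits: (B, N)
        z = z.unsqueeze(1).expand(-1, pts.shape[1], -1)
        return self.net(torch.cat([z, pts], dim=-1)).squeeze(-1)

human_dec = ImplicitDecoder()   # generative human model
object_dec = ImplicitDecoder()  # decomposed object branch (e.g. a backpack)

def compose(h_logits: torch.Tensor, o_logits: torch.Tensor) -> torch.Tensor:
    # Pointwise max of occupancy logits approximates the union of the two
    # shapes and is differentiable almost everywhere.
    return torch.maximum(h_logits, o_logits)

def paired_scan_loss(z_h, z_o, pts, occ_without, occ_with):
    """Unsupervised decomposition signal from two scans of the same subject:
    one without the object and one merged with it (occupancy targets)."""
    bce = nn.functional.binary_cross_entropy_with_logits
    h = human_dec(z_h, pts)
    o = object_dec(z_o, pts)
    # The object-free scan supervises the human branch alone ...
    loss_without = bce(h, occ_without)
    # ... while the merged scan supervises the composition, so the object
    # branch can only account for geometry the human branch does not explain.
    loss_with = bce(compose(h, o), occ_with)
    return loss_without + loss_with

# Toy usage with random tensors standing in for real scan supervision.
B, N = 2, 1024
z_h, z_o = torch.randn(B, 64), torch.randn(B, 64)
pts = torch.rand(B, N, 3) * 2 - 1                  # query points in [-1, 1]^3
occ_without = torch.randint(0, 2, (B, N)).float()  # human-only occupancy
occ_with = torch.randint(0, 2, (B, N)).float()     # human+object occupancy
print(paired_scan_loss(z_h, z_o, pts, occ_without, occ_with))
```

Under this reading, composing a novel identity amounts to decoding a new human latent and reusing a learned object latent through the same composition operator.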
Related papers
- Synthesizing Moving People with 3D Control [88.68284137105654]
We present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence.
First, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image.
Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses.
arXiv Detail & Related papers (2024-01-19T18:59:11Z)
- Primitive-based 3D Human-Object Interaction Modelling and Programming [59.47308081630886]
We propose a novel 3D geometric primitive-based language to encode both humans and objects.
We build a new benchmark on 3D HAOI consisting of primitives together with their images.
We believe this primitive-based 3D HAOI representation would pave the way for 3D HAOI studies.
arXiv Detail & Related papers (2023-12-17T13:16:49Z)
- CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images [10.4286198282079]
We present a method for teaching machines to understand and model the underlying spatial common sense of diverse human-object interactions in 3D.
We synthesize multiple 2D images captured from different viewpoints of humans interacting with the same type of object.
Despite the imperfect image quality compared to real images, we demonstrate that the synthesized images are sufficient to learn 3D human-object spatial relations.
arXiv Detail & Related papers (2023-08-23T17:59:11Z)
- Compositional 3D Human-Object Neural Animation [93.38239238988719]
Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics.
In this paper, we address HOI animation from a compositional perspective.
We adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations.
arXiv Detail & Related papers (2023-04-27T10:04:56Z)
- Full-Body Articulated Human-Object Interaction [61.01135739641217]
CHAIRS is a large-scale motion-captured f-AHOI (full-body articulated human-object interaction) dataset consisting of 16.2 hours of versatile interactions.
CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process.
By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation.
arXiv Detail & Related papers (2022-12-20T19:50:54Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- CHORE: Contact, Human and Object REconstruction from a single RGB image [40.817960406002506]
CHORE is a novel method that learns to jointly reconstruct the human and the object from a single RGB image.
We compute a neural reconstruction of human and object represented implicitly with two unsigned distance fields.
Experiments show that our joint reconstruction learned with the proposed strategy significantly outperforms the SOTA.
arXiv Detail & Related papers (2022-04-05T18:38:06Z)
- SAGA: Stochastic Whole-Body Grasping with Contact [60.43627793243098]
Human grasping synthesis has numerous applications including AR/VR, video games, and robotics.
In this work, our goal is to synthesize whole-body grasping motion. Given a 3D object, we aim to generate diverse and natural whole-body human motions that approach and grasp the object.
arXiv Detail & Related papers (2021-12-19T10:15:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.