Populating 3D Scenes by Learning Human-Scene Interaction
- URL: http://arxiv.org/abs/2012.11581v2
- Date: Mon, 5 Apr 2021 15:26:07 GMT
- Title: Populating 3D Scenes by Learning Human-Scene Interaction
- Authors: Mohamed Hassan, Partha Ghosh, Joachim Tesch, Dimitrios Tzionas,
Michael J. Black
- Abstract summary: We learn how humans interact with scenes and leverage this to enable virtual characters to do the same.
The representation of interaction is body-centric, which enables it to generalize to new scenes.
We show that POSA's learned representation of body-scene interaction supports monocular human pose estimation.
- Score: 47.42049393299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans live within a 3D space and constantly interact with it to perform
tasks. Such interactions involve physical contact between surfaces that is
semantically meaningful. Our goal is to learn how humans interact with scenes
and leverage this to enable virtual characters to do the same. To that end, we
introduce a novel Human-Scene Interaction (HSI) model that encodes proximal
relationships, called POSA for "Pose with prOximitieS and contActs". The
representation of interaction is body-centric, which enables it to generalize
to new scenes. Specifically, POSA augments the SMPL-X parametric human body
model such that, for every mesh vertex, it encodes (a) the contact probability
with the scene surface and (b) the corresponding semantic scene label. We learn
POSA with a VAE conditioned on the SMPL-X vertices, and train on the PROX
dataset, which contains SMPL-X meshes of people interacting with 3D scenes, and
the corresponding scene semantics from the PROX-E dataset. We demonstrate the
value of POSA with two applications. First, we automatically place 3D scans of
people in scenes. We use a SMPL-X model fit to the scan as a proxy and then
find its most likely placement in 3D. POSA provides an effective representation
to search for "affordances" in the scene that match the likely contact
relationships for that pose. We perform a perceptual study that shows
significant improvement over the state of the art on this task. Second, we show
that POSA's learned representation of body-scene interaction supports monocular
human pose estimation that is consistent with a 3D scene, improving on the
state of the art. Our model and code are available for research purposes at
https://posa.is.tue.mpg.de.
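To make the per-vertex representation concrete, below is a minimal sketch of a POSA-style conditional VAE together with a naive affordance score for ranking candidate body placements. The flat MLP architecture, layer sizes, semantic class count, and the nearest-point scoring rule are illustrative assumptions, not the released implementation (available at https://posa.is.tue.mpg.de); only the per-vertex feature layout, (a) contact probability and (b) semantic label, follows the abstract.

```python
# Sketch of a POSA-style conditional VAE over per-vertex features.
# Hypothetical architecture: layer sizes, latent size, and the flat MLPs
# are assumptions for exposition, not the released POSA model.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_VERTS = 10475             # SMPL-X mesh vertex count
N_SEMANTICS = 42            # scene semantic classes (set to the dataset's label count)
FEAT_DIM = 1 + N_SEMANTICS  # per vertex: contact probability + semantic logits
LATENT = 256

class PosaLikeCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: per-vertex features + vertex positions -> latent Gaussian.
        self.enc = nn.Sequential(
            nn.Linear(N_VERTS * (FEAT_DIM + 3), 512), nn.ReLU(),
            nn.Linear(512, 2 * LATENT))
        # Decoder: latent code, conditioned on vertex positions -> features.
        self.dec = nn.Sequential(
            nn.Linear(LATENT + N_VERTS * 3, 512), nn.ReLU(),
            nn.Linear(512, N_VERTS * FEAT_DIM))

    def forward(self, feats, verts):
        # feats: (B, N_VERTS, FEAT_DIM); verts: (B, N_VERTS, 3)
        b = feats.shape[0]
        h = self.enc(torch.cat([feats, verts], dim=-1).reshape(b, -1))
        mu, logvar = h.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        out = self.dec(torch.cat([z, verts.reshape(b, -1)], dim=-1))
        out = out.reshape(b, N_VERTS, FEAT_DIM)
        contact = torch.sigmoid(out[..., :1])  # (a) contact probability per vertex
        semantics = out[..., 1:]               # (b) semantic scene-label logits
        return contact, semantics, mu, logvar

def vae_loss(contact, semantics, gt_contact, gt_labels, mu, logvar):
    # Contact BCE + per-vertex semantic cross-entropy + KL regularizer.
    rec = F.binary_cross_entropy(contact.squeeze(-1), gt_contact)
    sem = F.cross_entropy(semantics.reshape(-1, N_SEMANTICS), gt_labels.reshape(-1))
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + sem + kl

def placement_score(contact, semantics, verts, scene_pts, scene_labels):
    # Naive affordance score for one candidate placement (unbatched):
    # vertices the model expects to touch the scene should lie near scene
    # points carrying the predicted semantic label. Higher is better.
    d = torch.cdist(verts, scene_pts)               # (N_VERTS, M)
    pred = semantics.argmax(dim=-1)                 # (N_VERTS,)
    match = scene_labels[None, :] == pred[:, None]  # (N_VERTS, M)
    d_match = d.masked_fill(~match, float("inf")).min(dim=1).values
    p, ok = contact.squeeze(-1), torch.isfinite(d_match)
    return -(p[ok] * d_match[ok]).sum()
```

A search over candidate placements would evaluate placement_score at sampled positions and orientations and keep the best one; the actual method optimizes placement against the scene geometry and semantics directly, so the nearest-point heuristic here is only meant to show how the two per-vertex channels combine into an affordance match.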
Related papers
- GenZI: Zero-Shot 3D Human-Scene Interaction Generation [39.9039943099911]
We propose GenZI, the first zero-shot approach to generating 3D human-scene interactions.
Key to GenZI is our distillation of interaction priors from large vision-language models (VLMs), which have learned a rich semantic space of 2D human-scene compositions.
In contrast to existing learning-based approaches, GenZI circumvents the conventional need for captured 3D interaction data.
arXiv Detail & Related papers (2023-11-29T15:40:11Z)
- DECO: Dense Estimation of 3D Human-Scene Contact In The Wild [54.44345845842109]
We train a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate contact on the SMPL body.
We significantly outperform existing state-of-the-art methods across all benchmarks.
We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images.
arXiv Detail & Related papers (2023-09-26T21:21:07Z)
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes [54.61610144668777]
We present a novel scene-and-language conditioned generative model that can produce 3D human motions in 3D scenes.
Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
arXiv Detail & Related papers (2022-10-18T10:14:11Z)
- Embodied Scene-aware Human Pose Estimation [25.094152307452]
We propose embodied scene-aware human pose estimation.
Our method is single-stage, causal, and recovers global 3D human poses in a simulated environment.
arXiv Detail & Related papers (2022-06-18T03:50:19Z)
- Human-Aware Object Placement for Visual Environment Reconstruction [63.14733166375534]
We show that human-scene interactions can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video.
Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images.
We show that our scene reconstruction can be used to refine the initial 3D human pose and shape estimation.
arXiv Detail & Related papers (2022-03-07T18:59:02Z)
- Recognizing Scenes from Novel Viewpoints [99.90914180489456]
Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects.
We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
arXiv Detail & Related papers (2021-12-02T18:59:40Z)
- PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it.
Our perceptual study shows that PLACE significantly improves over the state-of-the-art method, approaching the realism of real human-scene interaction.
arXiv Detail & Related papers (2020-08-12T21:00:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.