MIME: Human-Aware 3D Scene Generation
- URL: http://arxiv.org/abs/2212.04360v1
- Date: Thu, 8 Dec 2022 15:56:17 GMT
- Title: MIME: Human-Aware 3D Scene Generation
- Authors: Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus
Thies, Michael J. Black
- Abstract summary: We generate 3D indoor scenes given 3D human motion.
Human movement indicates the free space in a room.
Human contact indicates surfaces or objects that support activities such as sitting, lying, or touching.
- Score: 55.30202416702207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating realistic 3D worlds occupied by moving humans has many
applications in games, architecture, and synthetic data creation. But
generating such scenes is expensive and labor-intensive. Recent work generates
human poses and motions given a 3D scene. Here, we take the opposite approach
and generate 3D indoor scenes given 3D human motion. Such motions can come from
archival motion capture or from IMU sensors worn on the body, effectively
turning human movement into a "scanner" of the 3D world. Intuitively, human
movement indicates the free space in a room, and human contact indicates
surfaces or objects that support activities such as sitting, lying, or touching.
We propose MIME (Mining Interaction and Movement to infer 3D Environments),
which is a generative model of indoor scenes that produces furniture layouts
that are consistent with the human movement. MIME uses an auto-regressive
transformer architecture that takes the already generated objects in the scene
as well as the human motion as input, and outputs the next plausible object. To
train MIME, we build a dataset by populating the 3D-FRONT scene dataset with 3D
humans. Our experiments show that MIME produces more diverse and plausible 3D
scenes than a recent generative scene method that does not know about human
movement. Code and data will be available for research at
https://mime.is.tue.mpg.de.
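The auto-regressive generation described in the abstract can be pictured as a next-object prediction loop: encode the human motion together with the objects placed so far, predict the next object's class and placement, and stop when the model emits an end token. Below is a minimal PyTorch sketch of that loop. The module names, feature dimensions, and the object parameterization (class id, 3D position, size, yaw) are illustrative assumptions, not MIME's actual architecture or API.

```python
# Hedged sketch of an autoregressive next-object loop (illustrative only;
# all dimensions, heads, and encodings are assumptions, not MIME's design).
import torch
import torch.nn as nn

class NextObjectModel(nn.Module):
    """Transformer that attends over human-motion features and the objects
    placed so far, then predicts the next object's class and placement."""

    def __init__(self, d_model=256, n_categories=20, n_heads=4, n_layers=4):
        super().__init__()
        self.motion_proj = nn.Linear(64, d_model)  # assumed 64-D motion features
        self.object_proj = nn.Linear(8, d_model)   # class id + position(3) + size(3) + yaw
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.category_head = nn.Linear(d_model, n_categories + 1)  # +1 = stop token
        self.pose_head = nn.Linear(d_model, 7)     # position(3) + size(3) + yaw(1)

    def forward(self, motion_feats, object_feats):
        # Concatenate motion tokens and object tokens into one sequence.
        tokens = torch.cat([self.motion_proj(motion_feats),
                            self.object_proj(object_feats)], dim=1)
        h = self.encoder(tokens)[:, -1]            # summary of the current context
        return self.category_head(h), self.pose_head(h)

@torch.no_grad()
def generate_scene(model, motion_feats, max_objects=10, stop_idx=20):
    """Place objects one at a time until the model emits the stop token."""
    objects = torch.zeros(1, 1, 8)                 # start token: an "empty" object
    scene = []
    for _ in range(max_objects):
        cat_logits, pose = model(motion_feats, objects)
        cat = cat_logits.argmax(dim=-1).item()
        if cat == stop_idx:                        # no further plausible object
            break
        scene.append((cat, pose.squeeze(0)))
        new_obj = torch.cat([torch.tensor([[float(cat)]]), pose], dim=-1)
        objects = torch.cat([objects, new_obj.unsqueeze(1)], dim=1)
    return scene

# Example: one sequence of 120 frames with the assumed 64-D motion features.
model = NextObjectModel()
scene = generate_scene(model, torch.randn(1, 120, 64))
```

In the real model, the motion input would carry the contact and free-space cues the abstract describes; here it is just an assumed per-frame embedding, and the loop is bounded by max_objects since an untrained model may never emit the stop token.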
Related papers
- TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D
Environments [106.80978555346958]
Current methods cannot reliably estimate moving humans in global coordinates.
TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras.
It achieves state-of-the-art performance on tracking and human pose and shape (HPS) benchmarks.
arXiv Detail & Related papers (2023-06-05T13:00:44Z)
- CIRCLE: Capture In Rich Contextual Environments [69.97976304918149]
We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world.
We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from five subjects across nine scenes.
We use this dataset to train a model that generates human motion conditioned on scene information.
arXiv Detail & Related papers (2023-03-31T09:18:12Z)
- 3D Segmentation of Humans in Point Clouds with Synthetic Data [21.518379214837278]
We propose the task of joint 3D human semantic segmentation, instance segmentation and multi-human body-part segmentation.
We propose a framework for generating training data of synthetic humans interacting with real 3D scenes.
We also propose a novel transformer-based model, Human3D, which is the first end-to-end model for segmenting multiple human instances and their body parts.
arXiv Detail & Related papers (2022-12-01T18:59:21Z)
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes [54.61610144668777]
We present a novel scene-and-language conditioned generative model that can produce 3D human motions in 3D scenes.
Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
arXiv Detail & Related papers (2022-10-18T10:14:11Z)
- Human-Aware Object Placement for Visual Environment Reconstruction [63.14733166375534]
We show that human-scene interactions (HSIs) can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video.
Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images.
We show that our scene reconstruction can be used to refine the initial 3D human pose and shape estimation.
arXiv Detail & Related papers (2022-03-07T18:59:02Z)
- Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes [27.443701512923177]
We propose to bridge human motion synthesis and scene affordance reasoning.
We present a hierarchical generative framework to synthesize long-term 3D human motion conditioned on the 3D scene structure.
Our experiments show significant improvements over previous approaches in generating natural and physically plausible human motion in a scene.
arXiv Detail & Related papers (2020-12-10T09:09:38Z)
- PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it.
Our perceptual study shows that PLACE significantly improves on the state-of-the-art method, approaching the realism of real human-scene interaction.
arXiv Detail & Related papers (2020-08-12T21:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.