Pose2Room: Understanding 3D Scenes from Human Activities
- URL: http://arxiv.org/abs/2112.03030v1
- Date: Wed, 1 Dec 2021 20:54:36 GMT
- Title: Pose2Room: Understanding 3D Scenes from Human Activities
- Authors: Yinyu Nie, Angela Dai, Xiaoguang Han, Matthias Nießner
- Abstract summary: With wearable IMU sensors, one can estimate human poses without requiring visual input.
We show that P2R-Net can effectively learn multi-modal distributions of likely objects for human motions.
- Score: 35.702234343672565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With wearable IMU sensors, one can estimate human poses without
requiring visual input [von2017sparse]. In this work, we pose the
question: Can we reason about object structure in real-world environments
solely from human trajectory information? Crucially, we observe that human
motion and interactions tend to give strong information about the objects in a
scene -- for instance, a person sitting indicates the likely presence of a chair
or sofa. To this end, we propose P2R-Net to learn a probabilistic 3D model of
the objects in a scene characterized by their class categories and oriented 3D
bounding boxes, based on an input observed human trajectory in the environment.
P2R-Net models the probability distribution of object class as well as a deep
Gaussian mixture model for object boxes, enabling sampling of multiple,
diverse, likely modes of object configurations from an observed human
trajectory. In our experiments we demonstrate that P2R-Net can effectively
learn multi-modal distributions of likely objects for human motions, and
produce a variety of plausible object structures of the environment, even
without any visual information.
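To make the probabilistic output concrete, here is a minimal sketch (not the authors' code) of how such a prediction head could be structured: a categorical distribution over object classes alongside a Gaussian mixture over oriented-box parameters, so that repeated sampling from one trajectory feature yields diverse object hypotheses. The module name, feature dimension, number of mixture modes, and the 7-parameter box encoding (center, size, yaw) are illustrative assumptions, and PyTorch is used only for convenience.

```python
# Minimal sketch, not the authors' implementation: a probabilistic head in the
# spirit of P2R-Net, exposing a categorical class distribution and a Gaussian
# mixture over oriented 3D box parameters. All names, dimensions, and the box
# encoding (center 3 + size 3 + yaw 1) are assumptions for illustration.
import torch
import torch.nn as nn
import torch.distributions as D

class ProbabilisticBoxHead(nn.Module):
    def __init__(self, feat_dim=256, num_classes=8, num_modes=4, box_dim=7):
        super().__init__()
        self.num_modes, self.box_dim = num_modes, box_dim
        self.cls_head = nn.Linear(feat_dim, num_classes)           # class logits
        self.mix_logits = nn.Linear(feat_dim, num_modes)            # mixture weights
        self.mix_mean = nn.Linear(feat_dim, num_modes * box_dim)    # per-mode box means
        self.mix_logstd = nn.Linear(feat_dim, num_modes * box_dim)  # per-mode log std devs

    def forward(self, feat):
        # Categorical distribution over object classes.
        cls_dist = D.Categorical(logits=self.cls_head(feat))
        # Gaussian mixture over box parameters: each mode is a diagonal Gaussian.
        weights = D.Categorical(logits=self.mix_logits(feat))
        mean = self.mix_mean(feat).view(-1, self.num_modes, self.box_dim)
        std = self.mix_logstd(feat).view(-1, self.num_modes, self.box_dim).exp()
        components = D.Independent(D.Normal(mean, std), 1)
        box_gmm = D.MixtureSameFamily(weights, components)
        return cls_dist, box_gmm

# One encoded pose trajectory (stand-in feature); drawing several samples gives
# several distinct, likely object hypotheses for the same observed motion.
feat = torch.randn(1, 256)
cls_dist, box_gmm = ProbabilisticBoxHead()(feat)
for _ in range(3):
    print(cls_dist.sample().item(), box_gmm.sample()[0].tolist())
```

Sampling the mixture repeatedly is what the abstract describes as drawing multiple, diverse, likely modes of the object configuration from a single observed trajectory.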
Related papers
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models [8.933560282929726]
We introduce a novel affordance representation, named Comprehensive Affordance (ComA).
Given a 3D object mesh, ComA models the distribution of relative orientation and proximity of vertices in interacting human meshes.
We demonstrate that ComA outperforms competitors that rely on human annotations in modeling contact-based affordance.
arXiv Detail & Related papers (2024-01-23T18:59:59Z)
- Object pop-up: Can we infer 3D objects and their poses from human interactions alone? [36.68984504569907]
We show that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is just imitating a functionality.
We validate our method qualitatively and quantitatively, with synthetic data and sequences acquired for the task, showing applicability for XR/VR.
arXiv Detail & Related papers (2023-06-01T15:08:15Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z)
- Learning Continuous Environment Fields via Implicit Functions [144.4913852552954]
We propose a novel scene representation that encodes reaching distance -- the distance from any position in the scene to a goal along a feasible trajectory.
We demonstrate that this environment field representation can directly guide the dynamic behaviors of agents in 2D mazes or 3D indoor scenes.
arXiv Detail & Related papers (2021-11-27T22:36:58Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art methods specifically designed for the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild [96.08358373137438]
We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene.
Our method runs on datasets without any scene- or object-level 3D supervision.
arXiv Detail & Related papers (2020-07-30T17:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.