Holistic 3D Human and Scene Mesh Estimation from Single View Images
- URL: http://arxiv.org/abs/2012.01591v2
- Date: Fri, 16 Apr 2021 17:30:41 GMT
- Title: Holistic 3D Human and Scene Mesh Estimation from Single View Images
- Authors: Zhenzhen Weng, Serena Yeung
- Abstract summary: We propose an end-to-end trainable model that perceives the 3D scene from a single RGB image.
We show that our model outperforms existing human body mesh methods and indoor scene reconstruction methods.
- Score: 5.100152971410397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The 3D world limits the human body pose and the human body pose conveys
information about the surrounding objects. Indeed, from a single image of a
person placed in an indoor scene, we as humans are adept at resolving
ambiguities of the human pose and room layout through our knowledge of the
physical laws and prior perception of the plausible object and human poses.
However, few computer vision models fully leverage this fact. In this work, we
propose an end-to-end trainable model that perceives the 3D scene from a single
RGB image, estimates the camera pose and the room layout, and reconstructs both
human body and object meshes. By imposing a set of comprehensive and
sophisticated losses on all aspects of the estimations, we show that our model
outperforms existing human body mesh methods and indoor scene reconstruction
methods. To the best of our knowledge, this is the first model that outputs
both object and human predictions at the mesh level, and performs joint
optimization on the scene and human poses.
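The joint optimization described above can be pictured as one training objective that sums losses over every estimated quantity. Below is a minimal PyTorch sketch of such a combined objective; the term names, weights, and the floor-penetration penalty are illustrative assumptions, not the paper's actual loss definitions.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred, target, w_body=1.0, w_obj=1.0, w_layout=0.5,
               w_cam=0.5, w_floor=0.1):
    """Sum per-component losses over every estimated quantity."""
    l_body = F.mse_loss(pred["body_verts"], target["body_verts"])  # human mesh
    l_obj = F.mse_loss(pred["obj_verts"], target["obj_verts"])     # object meshes
    l_layout = F.l1_loss(pred["layout"], target["layout"])         # room layout
    l_cam = F.l1_loss(pred["cam_pose"], target["cam_pose"])        # camera pose
    # Illustrative physical-plausibility term: penalize body vertices
    # below an assumed floor plane at y = 0.
    l_floor = torch.relu(-pred["body_verts"][..., 1]).mean()
    return (w_body * l_body + w_obj * l_obj + w_layout * l_layout
            + w_cam * l_cam + w_floor * l_floor)
```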
Related papers
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
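As a rough illustration of how 2D joints can anchor a normalized, scale-ambiguous disparity map to absolute depth, here is a minimal NumPy sketch; the pinhole model, the assumed torso length, and the omission of the disparity shift term are simplifying assumptions, not the paper's method.

```python
import numpy as np

def person_depth_from_joints(joints_2d, focal, torso_len_m=0.5):
    """Absolute depth from the pixel length of the torso (pinhole model).

    joints_2d: (J, 2) pixel coordinates; rows 0 and 1 are assumed to be
    the neck and pelvis. torso_len_m is an assumed metric torso length.
    """
    torso_px = np.linalg.norm(joints_2d[0] - joints_2d[1])
    # pinhole: pixel_length = focal * metric_length / depth
    return focal * torso_len_m / torso_px

def disparity_scale(norm_disp, joints_2d, focal):
    """Scale that aligns normalized disparity with absolute depth
    (the shift term of the disparity model is ignored for brevity)."""
    z_abs = person_depth_from_joints(joints_2d, focal)
    u, v = joints_2d.mean(axis=0).astype(int)  # sample disparity on the body
    # depth ~ scale / disparity  =>  scale = depth * disparity
    return z_abs * norm_disp[v, u]
```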
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - Embodied Scene-aware Human Pose Estimation [25.094152307452]
We propose embodied scene-aware human pose estimation.
Our method is single-stage and causal, and recovers global 3D human poses in a simulated environment.
arXiv Detail & Related papers (2022-06-18T03:50:19Z) - Single-view 3D Body and Cloth Reconstruction under Complex Poses [37.86174829271747]
We extend existing implicit function-based models to deal with images of humans with arbitrary poses and self-occluded limbs.
We learn an implicit function that maps the input image to a 3D body shape with a low level of detail.
We then learn a displacement map, conditioned on the smoothed surface, which encodes the high-frequency details of the clothes and body.
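A minimal PyTorch sketch of this coarse-plus-displacement decomposition: one MLP predicts a smooth occupancy field, and a second predicts a per-point offset along the surface normal. The network sizes and conditioning scheme are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CoarseImplicit(nn.Module):
    """Maps image features + a 3D query point to a coarse occupancy value."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, feat, pts):
        return self.mlp(torch.cat([feat, pts], dim=-1))

class DisplacementNet(nn.Module):
    """Predicts a per-point offset along the smooth surface normal,
    adding high-frequency cloth and body detail."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 1))  # signed displacement magnitude

    def forward(self, feat, surf_pts, normals):
        d = self.mlp(torch.cat([feat, surf_pts], dim=-1))
        return surf_pts + d * normals  # detailed surface points
```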
arXiv Detail & Related papers (2022-05-09T07:34:06Z) - NeuMan: Neural Human Radiance Field from a Single Video [26.7471970027198]
We train two NeRF models: a human NeRF model and a scene NeRF model.
Our method is able to learn subject-specific details, including cloth wrinkles and accessories, from just a 10-second video clip.
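Conceptually, rendering then requires compositing the two radiance fields along each camera ray. The sketch below shows one plausible scheme, assuming each NeRF is a callable returning per-sample densities and colours; it illustrates the two-model setup, not NeuMan's exact renderer.

```python
import torch

def composite_ray(human_nerf, scene_nerf, pts, dirs, deltas):
    """pts: (N, 3) ray samples sorted by depth; deltas: (N,) segment lengths.
    Each NeRF callable is assumed to return (density (N,), rgb (N, 3))."""
    sig_h, rgb_h = human_nerf(pts, dirs)
    sig_s, rgb_s = scene_nerf(pts, dirs)
    sigma = sig_h + sig_s                              # densities add
    rgb = (sig_h[:, None] * rgb_h + sig_s[:, None] * rgb_s) \
        / (sigma[:, None] + 1e-8)                      # density-weighted colour
    alpha = 1.0 - torch.exp(-sigma * deltas)           # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-8])[:-1], dim=0)
    return ((trans * alpha)[:, None] * rgb).sum(dim=0)  # pixel colour
```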
arXiv Detail & Related papers (2022-03-23T17:35:50Z) - Hallucinating Pose-Compatible Scenes [55.064949607528405]
We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scenes with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating poses.
arXiv Detail & Related papers (2021-12-13T18:59:26Z) - Animatable Neural Radiance Fields from Monocular RGB Video [72.6101766407013]
We present animatable neural radiance fields for detailed human avatar creation from monocular videos.
Our approach extends neural radiance fields to dynamic scenes with human movement by introducing explicit pose-guided deformation.
In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.
arXiv Detail & Related papers (2021-06-25T13:32:23Z) - Collaborative Regression of Expressive Bodies using Moderation [54.730550151409474]
Methods that estimate 3D bodies, faces, or hands have progressed significantly, yet separately.
We introduce PIXIE, which produces animatable, whole-body 3D avatars from a single image.
We label training images as male, female, or non-binary, and train PIXIE to infer "gendered" 3D body shapes with a novel shape loss.
arXiv Detail & Related papers (2021-05-11T18:55:59Z) - Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild [96.08358373137438]
We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene.
Our method runs on datasets without any scene- or object-level 3D supervision.
arXiv Detail & Related papers (2020-07-30T17:59:50Z) - Long-term Human Motion Prediction with Scene Context [60.096118270451974]
We propose a novel three-stage framework for predicting human motion.
Our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path.
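The three stages compose naturally into a pipeline. A minimal sketch, assuming the three learned modules are given as callables (their names and signatures are illustrative, not the paper's API):

```python
def predict_motion(scene, start_pose, sample_goals, plan_path,
                   pose_along_path, n_goals=5):
    """Three-stage prediction: goals -> paths -> pose sequences."""
    predictions = []
    for goal in sample_goals(scene, start_pose, n_goals):  # stage 1: goals
        path = plan_path(scene, start_pose, goal)          # stage 2: 3D path
        poses = pose_along_path(start_pose, path)          # stage 3: poses
        predictions.append((goal, path, poses))
    return predictions
```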
arXiv Detail & Related papers (2020-07-07T17:59:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.