Learning High Fidelity Depths of Dressed Humans by Watching Social Media
Dance Videos
- URL: http://arxiv.org/abs/2103.03319v1
- Date: Thu, 4 Mar 2021 20:46:30 GMT
- Title: Learning High Fidelity Depths of Dressed Humans by Watching Social Media
Dance Videos
- Authors: Yasamin Jafarian, Hyun Soo Park
- Abstract summary: We present a new method to use the local transformation that warps the predicted local geometry of the person from an image to that of another image at a different time instant.
Our method is end-to-end trainable, resulting in high fidelity depth estimation that predicts fine geometry faithful to the input real image.
- Score: 21.11427729302936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge of learning the geometry of dressed humans lies in the
limited availability of the ground truth data (e.g., 3D scanned models), which
results in the performance degradation of 3D human reconstruction when applying
to real-world imagery. We address this challenge by leveraging a new data
resource: a number of social media dance videos that span diverse appearance,
clothing styles, performances, and identities. Each video depicts dynamic
movements of the body and clothes of a single person while lacking the 3D
ground truth geometry. To utilize these videos, we present a new method to use
the local transformation that warps the predicted local geometry of the person
from an image to that of another image at a different time instant. This allows
self-supervision by enforcing temporal coherence over the predictions. In
addition, we jointly learn the depth along with the surface normals that are
highly responsive to local texture, wrinkle, and shade by maximizing their
geometric consistency. Our method is end-to-end trainable, resulting in high
fidelity depth estimation that predicts fine geometry faithful to the input
real image. We demonstrate that our method outperforms the state-of-the-art
human depth estimation and human shape recovery approaches on both real and
rendered images.
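The depth/normal consistency idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an orthographic camera, a normal convention proportional to (-dz/dx, -dz/dy, 1), and a simple "1 - cosine similarity" penalty. The function names `normals_from_depth` and `normal_consistency_loss` are hypothetical and chosen only for illustration. Pure Python is used so the example has no dependencies.

```python
import math

def normals_from_depth(depth):
    """Estimate unit normals from a depth map (list of rows) by finite differences.

    Assumption (not from the paper): orthographic projection, with the
    normal taken proportional to (-dz/dx, -dz/dy, 1).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences inside the image, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals

def normal_consistency_loss(depth, predicted_normals):
    """Mean (1 - cosine similarity) between depth-derived and predicted unit normals."""
    derived = normals_from_depth(depth)
    total, count = 0.0, 0
    for row_derived, row_pred in zip(derived, predicted_normals):
        for n_derived, n_pred in zip(row_derived, row_pred):
            total += 1.0 - sum(a * b for a, b in zip(n_derived, n_pred))
            count += 1
    return total / count

# Usage: a flat depth map agrees with normals that point straight at the camera.
flat_depth = [[2.0] * 4 for _ in range(3)]
facing_normals = [[(0.0, 0.0, 1.0)] * 4 for _ in range(3)]
loss = normal_consistency_loss(flat_depth, facing_normals)
```

In a learning setting, a penalty of this kind would be computed on differentiable tensors so that gradients flow to both the depth and the normal predictions; the loop-based version here only shows the geometry.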
Related papers
- Single Image, Any Face: Generalisable 3D Face Generation [59.9369171926757]
We propose a novel model, Gen3D-Face, which generates 3D human faces with unconstrained single image input.
To the best of our knowledge, this is the first attempt and benchmark for creating photorealistic 3D human face avatars from single images.
arXiv Detail & Related papers (2024-09-25T14:56:37Z)
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited by individual human and background models.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
arXiv Detail & Related papers (2024-06-03T17:59:57Z) - Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single
Camera [8.308263758475938]
We introduce a method for high-quality modeling of clothed 3D human avatars using a video of a person with dynamic movements.
For explicit modeling, a neural network learns to generate point-wise shape residuals and appearance features of a 3D body model.
For implicit modeling, an implicit network combines the appearance and 3D motion features to decode high-fidelity clothed 3D human avatars.
arXiv Detail & Related papers (2023-12-28T06:04:39Z) - SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion [35.73448283467723]
SiTH is a novel pipeline that integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow.
We employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images.
For the latter, we leverage skinned body meshes as guidance to recover full-body texture meshes from the input and back-view images.
arXiv Detail & Related papers (2023-11-27T14:22:07Z) - Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via
Self-supervised Scene Decomposition [40.46674919612935]
We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos.
Our method does not require any groundtruth supervision or priors extracted from large datasets of clothed human scans.
It solves the tasks of scene decomposition and surface reconstruction directly in 3D by modeling both the human and the background in the scene jointly.
arXiv Detail & Related papers (2023-02-22T18:59:17Z) - Neural Novel Actor: Learning a Generalized Animatable Neural
Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z) - Detailed Avatar Recovery from Single Image [50.82102098057822]
This paper presents a novel framework to recover a detailed avatar from a single image.
We use the deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation framework.
Our method can restore detailed human body shapes with complete textures beyond skinned models.
arXiv Detail & Related papers (2021-08-06T03:51:26Z) - Animatable Neural Radiance Fields from Monocular RGB Video [72.6101766407013]
We present animatable neural radiance fields for detailed human avatar creation from monocular videos.
Our approach extends neural radiance fields to the dynamic scenes with human movements via introducing explicit pose-guided deformation.
In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.
arXiv Detail & Related papers (2021-06-25T13:32:23Z) - Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies
from Single RGB Images [5.775625085664381]
We introduce an approach that accurately reconstructs 3D human poses and detailed 3D full-body geometric models from single images in realtime.
The key idea of our approach is a novel end-to-end multi-task deep learning framework that uses single images to predict five outputs simultaneously.
We show the system advances the frontier of 3D human body and pose reconstruction from single images by quantitative evaluations and comparisons with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-22T04:26:11Z) - Neural Actor: Neural Free-view Synthesis of Human Actors with Pose
Control [80.79820002330457]
We propose a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses.
Our method achieves better quality than the state of the art on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses.
arXiv Detail & Related papers (2021-06-03T17:40:48Z) - Neural Re-Rendering of Humans from a Single Image [80.53438609047896]
We propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint.
Our algorithm represents body pose and shape as a parametric mesh which can be reconstructed from a single image.
arXiv Detail & Related papers (2021-01-11T18:53:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.