MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images
- URL: http://arxiv.org/abs/2106.11944v1
- Date: Tue, 22 Jun 2021 17:30:12 GMT
- Title: MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images
- Authors: Shaofei Wang, Marko Mihajlovic, Qianli Ma, Andreas Geiger, Siyu Tang
- Abstract summary: To generate realistic cloth deformations from novel input poses, watertight meshes or dense full-body scans are usually needed as inputs.
We propose an approach that can quickly generate realistic clothed human avatars, represented as controllable neural SDFs, given only monocular depth images.
- Score: 60.56518548286836
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we aim to create generalizable and controllable neural signed
distance fields (SDFs) that represent clothed humans from monocular depth
observations. Recent advances in deep learning, especially neural implicit
representations, have enabled human shape reconstruction and controllable
avatar generation from different sensor inputs. However, to generate realistic
cloth deformations from novel input poses, watertight meshes or dense full-body
scans are usually needed as inputs. Furthermore, due to the difficulty of
effectively modeling pose-dependent cloth deformations for diverse body shapes
and cloth types, existing approaches resort to per-subject/cloth-type
optimization from scratch, which is computationally expensive. In contrast, we
propose an approach that can quickly generate realistic clothed human avatars,
represented as controllable neural SDFs, given only monocular depth images. We
achieve this by using meta-learning to learn an initialization of a
hypernetwork that predicts the parameters of neural SDFs. The hypernetwork is
conditioned on human poses and represents a clothed neural avatar that deforms
non-rigidly according to the input poses. Meanwhile, it is meta-learned to
effectively incorporate priors of diverse body shapes and cloth types and thus
can be much faster to fine-tune, compared to models trained from scratch. We
qualitatively and quantitatively show that our approach outperforms
state-of-the-art approaches that require complete meshes as inputs while our
approach requires only depth frames as inputs and runs orders of magnitude
faster. Furthermore, we demonstrate that our meta-learned hypernetwork is very
robust, being the first to generate avatars with realistic dynamic cloth
deformations given as few as 8 monocular depth frames.
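
Below is a minimal PyTorch sketch, not the authors' implementation, of the two ideas the abstract describes: a pose-conditioned hypernetwork that predicts the weights of a small neural SDF, and a Reptile-style meta-learning loop that learns an initialization of that hypernetwork so it can be fine-tuned quickly from a few observations. The class and function names (SDFNet, PoseHyperNet, meta_train) and the few-shot data sampler are illustrative assumptions; the paper's actual architectures, pose conditioning, and training objective differ in detail.

```python
# Hedged sketch of a pose-conditioned hypernetwork for neural SDFs with a
# Reptile-style meta-learning outer loop. All names are hypothetical and the
# network sizes are placeholders, not the paper's configuration.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDFNet(nn.Module):
    """Tiny MLP mapping a 3D query point to a signed distance value.
    Its weights are not learned directly; they are predicted by the hypernetwork."""
    def __init__(self, hidden=64):
        super().__init__()
        self.shapes = [(hidden, 3), (hidden,), (1, hidden), (1,)]  # W1, b1, W2, b2

    def forward(self, points, params):
        w1, b1, w2, b2 = params
        h = torch.relu(F.linear(points, w1, b1))
        return F.linear(h, w2, b2)  # signed distance per query point

class PoseHyperNet(nn.Module):
    """Hypernetwork: maps a body-pose vector to the parameters of SDFNet,
    so the represented surface deforms non-rigidly with the input pose."""
    def __init__(self, pose_dim=72, hidden=128, sdf_hidden=64):
        super().__init__()
        self.sdf = SDFNet(sdf_hidden)
        n_params = sum(int(torch.tensor(s).prod()) for s in self.sdf.shapes)
        self.net = nn.Sequential(
            nn.Linear(pose_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_params)
        )

    def forward(self, pose, points):
        flat = self.net(pose)                      # all SDF weights, flattened
        params, i = [], 0
        for s in self.sdf.shapes:                  # unflatten into W / b tensors
            n = int(torch.tensor(s).prod())
            params.append(flat[i:i + n].view(*s))
            i += n
        return self.sdf(points, params)

def meta_train(model, subjects, outer_steps=1000, inner_steps=5,
               inner_lr=1e-4, outer_lr=1e-1):
    """Reptile-style meta-learning of a hypernetwork initialization.
    `subjects` is a hypothetical list of callables; each returns a few
    (pose, points, sdf_gt) batches for one subject / cloth type."""
    for _ in range(outer_steps):
        subject = subjects[torch.randint(len(subjects), (1,)).item()]
        fast = copy.deepcopy(model)                       # inner-loop copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for pose, points, sdf_gt in subject(inner_steps): # few-shot adaptation
            loss = (fast(pose, points) - sdf_gt).abs().mean()
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                             # Reptile outer update:
            for p, q in zip(model.parameters(), fast.parameters()):
                p += outer_lr * (q - p)                   # move init toward adapted weights
    return model
```

At test time, the same few-step inner loop would be run once on a handful of depth-derived SDF samples of a new subject, which is what makes fine-tuning much faster than training a per-subject model from scratch.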
Related papers
- HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos [52.23323966700072]
We present a framework for acquiring human avatars with high-resolution physically-based material textures and meshes from monocular video.
Our method introduces a novel information fusion strategy to combine the information from the monocular video and synthesize virtual multi-view images.
Experiments show that our approach outperforms previous representations in terms of fidelity, and the explicit triangular-mesh result supports deployment in common renderers.
arXiv Detail & Related papers (2024-05-18T11:49:09Z) - Deformable 3D Gaussian Splatting for Animatable Human Avatars [50.61374254699761]
We propose a fully explicit approach to construct a digital avatar from as little as a single monocular sequence.
ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images.
Our avatar learning requires no additional annotations such as Splat masks, and the model can be trained with variable backgrounds while efficiently inferring full-resolution images even on consumer hardware.
arXiv Detail & Related papers (2023-12-22T20:56:46Z) - Human Gaussian Splatting: Real-time Rendering of Animatable Avatars [8.719797382786464]
This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos.
We propose an animatable human model based on 3D Gaussian Splatting, that has recently emerged as a very efficient alternative to neural radiance fields.
Our method achieves a 1.5 dB PSNR improvement over the state of the art on the THuman4 dataset while rendering in real time (80 fps at 512x512 resolution).
arXiv Detail & Related papers (2023-11-28T12:05:41Z) - DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human
Avatars [7.777410338143783]
We present an approach for creating realistic rigged full-body avatars from single RGB images.
Our method uses neural textures combined with the SMPL-X body model to achieve photo-realistic quality of avatars.
In the experiments, our approach achieves state-of-the-art rendering quality and good generalization to new poses and viewpoints.
arXiv Detail & Related papers (2023-03-16T15:04:10Z) - Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z) - LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human
Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z) - Neural Actor: Neural Free-view Synthesis of Human Actors with Pose
Control [80.79820002330457]
We propose a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses.
Our method achieves better quality than the state of the art on playback as well as novel pose synthesis, and even generalizes well to new poses that differ starkly from the training poses.
arXiv Detail & Related papers (2021-06-03T17:40:48Z) - PVA: Pixel-aligned Volumetric Avatars [34.929560973779466]
We devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs.
Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss without requiring explicit 3D supervision.
arXiv Detail & Related papers (2021-01-07T18:58:46Z)