Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video
- URL: http://arxiv.org/abs/2312.04784v2
- Date: Sun, 24 Mar 2024 13:56:31 GMT
- Title: Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video
- Authors: Yuchen Rao, Eduardo Perez Pellitero, Benjamin Busam, Yiren Zhou, Jifei Song,
- Abstract summary: ReCaLaB is a pipeline that learns high-fidelity 3D human avatars from just a single RGB video.
A pose-conditioned NeRF is optimized to volumetrically represent a human subject in canonical T-pose.
An image-conditioned diffusion model thereby helps to animate appearance and pose of the 3D avatar to create video sequences with previously unseen human motion.
- Score: 14.140380599168628
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in 3D avatar generation excel with multi-view supervision for photorealistic models. However, monocular counterparts lag in quality despite broader applicability. We propose ReCaLaB to close this gap. ReCaLaB is a fully-differentiable pipeline that learns high-fidelity 3D human avatars from just a single RGB video. A pose-conditioned deformable NeRF is optimized to volumetrically represent a human subject in canonical T-pose. The canonical representation is then leveraged to efficiently associate neural textures using 2D-3D correspondences. This enables the separation of diffused color generation and lighting correction branches that jointly compose an RGB prediction. The design allows to control intermediate results for human pose, body shape, texture, and lighting with text prompts. An image-conditioned diffusion model thereby helps to animate appearance and pose of the 3D avatar to create video sequences with previously unseen human motion. Extensive experiments show that ReCaLaB outperforms previous monocular approaches in terms of image quality for image synthesis tasks. Moreover, natural language offers an intuitive user interface for creative manipulation of 3D human avatars.
Related papers
- Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models [29.73743772971411]
We propose Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion.
Our key insight is that 2D multi-view diffusion and 3D reconstruction models provide complementary information for each other.
Our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image.
arXiv Detail & Related papers (2024-06-12T17:57:25Z) - NECA: Neural Customizable Human Avatar [36.69012172745299]
We introduce NECA, an approach capable of learning versatile human representation from monocular or sparse-view videos.
The core of our approach is to represent humans in complementary dual spaces and predict disentangled neural fields of geometry, albedo, shadow, as well as an external lighting.
arXiv Detail & Related papers (2024-03-15T14:23:06Z) - UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures [80.047065473698]
We propose a novel 3D avatar generation approach termed UltrAvatar with enhanced fidelity of geometry, and superior quality of physically based rendering (PBR) textures without unwanted lighting.
We demonstrate the effectiveness and robustness of the proposed method, outperforming the state-of-the-art methods by a large margin in the experiments.
arXiv Detail & Related papers (2024-01-20T01:55:17Z) - Deformable 3D Gaussian Splatting for Animatable Human Avatars [50.61374254699761]
We propose a fully explicit approach to construct a digital avatar from as little as a single monocular sequence.
ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images.
Our avatars learning is free of additional annotations such as Splat masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware.
arXiv Detail & Related papers (2023-12-22T20:56:46Z) - InceptionHuman: Controllable Prompt-to-NeRF for Photorealistic 3D Human Generation [61.62346472443454]
InceptionHuman is a prompt-to-NeRF framework that allows easy control via a combination of prompts in different modalities to generate photorealistic 3D humans.
InceptionHuman achieves consistent 3D human generation within a progressively refined NeRF space.
arXiv Detail & Related papers (2023-11-27T15:49:41Z) - PointAvatar: Deformable Point-based Head Avatars from Videos [103.43941945044294]
PointAvatar is a deformable point-based representation that disentangles the source color into intrinsic albedo and normal-dependent shading.
We show that our method is able to generate animatable 3D avatars using monocular videos from multiple sources.
arXiv Detail & Related papers (2022-12-16T10:05:31Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance
Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z) - High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation [117.32310997522394]
3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR.
Existing person-specific 3D models are not robust to lighting, hence their results typically miss subtle facial behaviors and cause artifacts in the avatar.
This paper addresses previous limitations by learning a deep learning lighting model, that in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar.
arXiv Detail & Related papers (2021-03-29T18:33:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.