VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head
Reenactment
- URL: http://arxiv.org/abs/2312.04651v1
- Date: Thu, 7 Dec 2023 19:19:57 GMT
- Title: VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head
Reenactment
- Authors: Phong Tran, Egor Zakharov, Long-Nhat Ho, Anh Tuan Tran, Liwen Hu, Hao
Li
- Abstract summary: We present a 3D-aware one-shot head reenactment method based on a fully neural disentanglement framework for source appearance and driver expressions.
Our method is real-time and produces high-fidelity and view-consistent output, suitable for 3D teleconferencing systems based on holographic displays.
- Score: 17.372274738231443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a 3D-aware one-shot head reenactment method based on a fully
volumetric neural disentanglement framework for source appearance and driver
expressions. Our method is real-time and produces high-fidelity and
view-consistent output, suitable for 3D teleconferencing systems based on
holographic displays. Existing cutting-edge 3D-aware reenactment methods often
use neural radiance fields or 3D meshes to produce view-consistent appearance
encoding, but, at the same time, they rely on linear face models, such as 3DMM,
to achieve its disentanglement with facial expressions. As a result, their
reenactment results often exhibit identity leakage from the driver or have
unnatural expressions. To address these problems, we propose a neural
self-supervised disentanglement approach that lifts both the source image and
driver video frame into a shared 3D volumetric representation based on
tri-planes. This representation can then be freely manipulated with expression
tri-planes extracted from the driving images and rendered from an arbitrary
view using neural radiance fields. We achieve this disentanglement via
self-supervised learning on a large in-the-wild video dataset. We further
introduce a highly effective fine-tuning approach to improve the
generalizability of the 3D lifting using the same real-world data. We
demonstrate state-of-the-art performance on a wide range of datasets, and also
showcase high-quality 3D-aware head reenactment on highly challenging and
diverse subjects, including non-frontal head poses and complex expressions for
both source and driver.
Related papers
- VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence [14.010324388059866]
VOODOO XP is a 3D-aware one-shot head reenactment method that can generate highly expressive facial expressions from any input driver video and a single 2D portrait.
We show our solution on a monocular video setting and an end-to-end VR telepresence system for two-way communication.
arXiv Detail & Related papers (2024-05-25T12:33:40Z) - NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields [57.617972778377215]
We show how to generate effective 3D representations from posed RGB images.
We pretrain this representation at scale on our proposed curated posed-RGB data, totaling over 1.8 million images.
Our novel self-supervised pretraining for NeRFs, NeRF-MAE, scales remarkably well and improves performance on various challenging 3D tasks.
arXiv Detail & Related papers (2024-04-01T17:59:55Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior to 2D diffusion model and the global 3D information of the current scene.
Our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - PonderV2: Pave the Way for 3D Foundation Model with A Universal
Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z) - Learning Personalized High Quality Volumetric Head Avatars from
Monocular RGB Videos [47.94545609011594]
We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild.
Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism.
arXiv Detail & Related papers (2023-04-04T01:10:04Z) - Next3D: Generative Neural Texture Rasterization for 3D-Aware Head
Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z) - Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [54.079327030892244]
Free-HeadGAN is a person-generic neural talking head synthesis system.
We show that modeling faces with sparse 3D facial landmarks are sufficient for achieving state-of-the-art generative performance.
arXiv Detail & Related papers (2022-08-03T16:46:08Z) - Learning Ego 3D Representation as Ray Tracing [42.400505280851114]
We present a novel end-to-end architecture for ego 3D representation learning from unconstrained camera views.
Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation.
We show that our model outperforms all state-of-the-art alternatives significantly.
arXiv Detail & Related papers (2022-06-08T17:55:50Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance
Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.