Learning Dense Correspondence for NeRF-Based Face Reenactment
- URL: http://arxiv.org/abs/2312.10422v2
- Date: Tue, 19 Dec 2023 03:12:59 GMT
- Title: Learning Dense Correspondence for NeRF-Based Face Reenactment
- Authors: Songlin Yang, Wei Wang, Yushi Lan, Xiangyu Fan, Bo Peng, Lei Yang,
Jing Dong
- Abstract summary: We propose a novel framework that adopts tri-planes as the fundamental NeRF representation and decomposes face tri-planes into three components: canonical tri-planes, identity deformations, and motion.
Our framework is the first method that achieves one-shot multi-view face reenactment without a 3D parametric model prior.
- Score: 24.072019889495966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face reenactment is challenging due to the need to establish dense
correspondence between various face representations for motion transfer. Recent
studies have utilized the Neural Radiance Field (NeRF) as the fundamental representation, which has further improved the photo-realism and 3D consistency of multi-view face reenactment. However, establishing dense correspondence between different face NeRFs is non-trivial, because implicit representations lack the ground-truth correspondence annotations that mesh-based 3D parametric models (e.g., 3DMM) provide via index-aligned vertices. Although aligning the 3DMM space with NeRF-based face representations can realize motion control, it is sub-optimal due to the 3DMM's limited face-only modeling and low identity fidelity.
This motivates us to ask: Can we learn the dense correspondence
between different NeRF-based face representations without a 3D parametric model
prior? To address this challenge, we propose a novel framework that adopts tri-planes as the fundamental NeRF representation and decomposes face tri-planes into three components: canonical tri-planes, identity deformations, and motion.
For motion control, our key contribution is a Plane Dictionary (PlaneDict) module, which efficiently maps motion conditions to a linearly weighted combination of learnable orthogonal plane bases. To the best of
our knowledge, our framework is the first method that achieves one-shot
multi-view face reenactment without a 3D parametric model prior. Extensive experiments demonstrate that our method achieves finer-grained motion control and better identity preservation than previous methods.
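As a rough illustration of the decomposition and the PlaneDict module described in the abstract, the PyTorch-style sketch below shows how a motion condition vector could be mapped to weights over a bank of learnable plane bases, whose weighted sum is added to the canonical and identity-deformation tri-planes. All class names, shapes, hyperparameters, and the orthogonality penalty are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the PlaneDict idea: a motion
# condition vector is mapped to weights over a bank of learnable plane bases,
# and the weighted sum of bases is added on top of the canonical and
# identity-deformation tri-planes. All names and shapes are assumptions.
import torch
import torch.nn as nn


class PlaneDict(nn.Module):
    def __init__(self, motion_dim=64, num_bases=16, channels=32, resolution=64):
        super().__init__()
        # Learnable plane bases: (num_bases, 3 planes, C, H, W).
        self.bases = nn.Parameter(
            0.01 * torch.randn(num_bases, 3, channels, resolution, resolution)
        )
        # Small MLP mapping a motion condition to per-basis weights.
        self.to_weights = nn.Sequential(
            nn.Linear(motion_dim, 128), nn.ReLU(), nn.Linear(128, num_bases)
        )

    def orthogonality_penalty(self):
        # Encourage the flattened bases to remain approximately orthogonal.
        flat = self.bases.flatten(1)            # (K, 3*C*H*W)
        gram = flat @ flat.t()                  # (K, K)
        off_diag = gram - torch.diag(torch.diag(gram))
        return off_diag.pow(2).mean()

    def forward(self, motion_code):
        # motion_code: (B, motion_dim) -> motion tri-plane offset (B, 3, C, H, W)
        w = self.to_weights(motion_code)        # (B, K)
        return torch.einsum("bk,kpchw->bpchw", w, self.bases)


def compose_face_triplane(canonical, identity_deform, plane_dict, motion_code):
    """Additive composition of the three components named in the abstract."""
    return canonical + identity_deform + plane_dict(motion_code)
```

In this sketch, the canonical tri-plane would be shared, while the identity deformation and the motion code would come from separate encoders; the exact decomposition and training objectives are defined in the paper itself.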
Related papers
- A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images [15.40230841242637]
We present a novel hierarchical representation network (HRN) to achieve accurate and detailed face reconstruction from a single image.
Our framework can be extended to a multi-view setting by considering the detail consistency across different views.
Our method outperforms the existing methods in both reconstruction accuracy and visual effects.
arXiv Detail & Related papers (2023-02-28T09:24:36Z)
- CGOF++: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields [52.14985242487535]
We propose a new conditional 3D face synthesis framework, which enables 3D controllability over generated face images.
At its core is a conditional Generative Occupancy Field (cGOF++) that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh.
Experiments validate the effectiveness of the proposed method and show more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.
arXiv Detail & Related papers (2022-11-23T19:02:50Z)
- Semantic-aware One-shot Face Re-enactment with Dense Correspondence Estimation [100.60938767993088]
One-shot face re-enactment is a challenging task due to the identity mismatch between source and driving faces.
This paper proposes to use 3D Morphable Model (3DMM) for explicit facial semantic decomposition and identity disentanglement.
arXiv Detail & Related papers (2022-11-23T03:02:34Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z)
- Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape [77.95154911528365]
3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior.
Previously reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry.
This paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person.
arXiv Detail & Related papers (2022-04-09T03:46:18Z)
- FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset [36.688730105295015]
FaceVerse is built from hybrid East Asian face datasets containing 60K fused RGB-D images and 2K high-fidelity 3D head scan models.
In the coarse module, we generate a base parametric model from large-scale RGB-D images, which can predict accurate coarse 3D face models across different genders, ages, etc.
In the fine module, a conditional StyleGAN architecture trained with high-fidelity scan models is introduced to enrich the facial geometry and texture with elaborate details.
arXiv Detail & Related papers (2022-03-26T12:13:14Z)
- LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective.
We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
arXiv Detail & Related papers (2022-03-15T13:22:57Z)
- Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593]
We present a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods which are built upon 3D morphable models, our method leverages an implicit representation to encode rich geometric features.
Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-05T07:02:53Z)
- Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation [75.59415852802958]
Shape-My-Face (SMF) is a powerful encoder-decoder architecture based on an improved point cloud encoder, a novel visual attention mechanism, graph convolutional decoders with skip connections, and a specialized mouth model.
Our model provides topologically-sound meshes with minimal supervision, offers faster training time, has orders of magnitude fewer trainable parameters, is more robust to noise, and can generalize to previously unseen datasets.
arXiv Detail & Related papers (2020-12-16T20:02:36Z)