Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head
Synthesis
- URL: http://arxiv.org/abs/2207.11770v1
- Date: Sun, 24 Jul 2022 16:46:03 GMT
- Title: Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head
Synthesis
- Authors: Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, Jiwen Lu
- Abstract summary: We propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis.
DFRF conditions the face radiance field on 2D appearance images to learn the face prior.
Experiments show DFRF can synthesize natural and high-quality audio-driven talking head videos for novel identities with only 40k iterations.
- Score: 90.43371339871105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Talking head synthesis is an emerging technology with wide applications in
film dubbing, virtual avatars and online education. Recent NeRF-based methods
generate more natural talking videos, as they better capture the 3D structural
information of faces. However, a specific model needs to be trained for each
identity with a large dataset. In this paper, we propose Dynamic Facial
Radiance Fields (DFRF) for few-shot talking head synthesis, which can rapidly
generalize to an unseen identity with few training data. Different from the
existing NeRF-based methods which directly encode the 3D geometry and
appearance of a specific person into the network, our DFRF conditions the face
radiance field on 2D appearance images to learn the face prior. Thus, the facial
radiance field can be flexibly adjusted to the new identity with few reference
images. Additionally, for better modeling of the facial deformations, we
propose a differentiable face warping module conditioned on audio signals to
deform all reference images to the query space. Extensive experiments show that
with only a training clip of tens of seconds available, our proposed DFRF can
synthesize natural and high-quality audio-driven talking head videos for novel
identities within only 40k iterations. We highly recommend readers view our
supplementary video for intuitive comparisons. Code is available at
https://sstzal.github.io/DFRF/.
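
The abstract describes two mechanisms: conditioning the radiance field on features drawn from a few 2D reference images, and a differentiable, audio-conditioned warp that aligns references with the query space. The sketch below illustrates that structure in PyTorch. It is a minimal illustration, not the authors' implementation: the module names, feature dimensions, the mean pooling across references, and the simplification of warping the query coordinates (where the paper instead deforms the reference images into the query space) are all assumptions.

```python
# Minimal sketch of a reference-conditioned dynamic radiance field.
# All names, dimensions, and the pooling scheme are illustrative assumptions.
import torch
import torch.nn as nn

class AudioConditionedWarp(nn.Module):
    """Predicts a 3D offset for each query point from the audio feature.
    A simplified stand-in for the paper's differentiable face warping module,
    which deforms reference images rather than query coordinates."""
    def __init__(self, audio_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, audio):
        # x: (N, 3) query points; audio: (N, audio_dim) driving features
        return x + self.mlp(torch.cat([x, audio], dim=-1))

class ConditionedRadianceField(nn.Module):
    """A radiance-field MLP conditioned on pooled 2D appearance features
    from the few reference images, so identity comes from the references
    rather than being baked into the network weights."""
    def __init__(self, feat_dim=32, audio_dim=64, hidden=256):
        super().__init__()
        self.warp = AudioConditionedWarp(audio_dim)
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density
        )

    def forward(self, x, audio, ref_feats):
        # ref_feats: (R, N, feat_dim) appearance features gathered by
        # projecting query points onto R reference images; mean pooling
        # across references is an assumption for brevity.
        x_warped = self.warp(x, audio)
        feat = ref_feats.mean(dim=0)
        out = self.mlp(torch.cat([x_warped, feat, audio], dim=-1))
        rgb = torch.sigmoid(out[..., :3])
        sigma = torch.relu(out[..., 3:])
        return rgb, sigma

# Usage with random stand-in tensors:
field = ConditionedRadianceField()
x = torch.rand(1024, 3)                 # sampled query points
audio = torch.rand(1024, 64)            # per-point audio features
ref_feats = torch.rand(4, 1024, 32)     # features from 4 reference images
rgb, sigma = field(x, audio, ref_feats)
```

Under this design, adapting to a new identity only changes ref_feats, which is the property that allows few-shot adjustment from a handful of reference images rather than retraining a per-identity network.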
Related papers
- S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis [14.437741528053504]
We design a Single-Shot Speech-Driven Radiance Field (S3D-NeRF) method to tackle three key difficulties: learning a representative appearance feature for each identity, modeling the motion of different face regions with audio, and keeping the temporal consistency of the lip area.
Our S3D-NeRF surpasses previous arts on both video fidelity and audio-lip synchronization.
arXiv Detail & Related papers (2024-08-18T03:59:57Z)
- NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [5.819784482811377]
We propose a novel method, NeRFFaceSpeech, which enables the production of high-quality, 3D-aware talking heads.
Our method can craft a 3D-consistent facial feature space corresponding to a single image.
We also introduce LipaintNet, which can fill in the missing information in the inner-mouth area.
arXiv Detail & Related papers (2024-05-09T13:14:06Z)
- Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [29.120669908374424]
We introduce a novel audio-driven talking head synthesis framework, called Talk3D.
It can faithfully reconstruct plausible facial geometries by effectively adopting a pre-trained 3D-aware generative prior.
Compared to existing methods, our method excels in generating realistic facial geometries even under extreme head poses.
arXiv Detail & Related papers (2024-03-29T12:49:40Z)
- Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization [91.52882218901627]
We propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
Our method improves upon photo-realism, geometry, and expression accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T17:58:40Z)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [62.297513028116576]
GeneFace is a general and high-fidelity NeRF-based talking face generation method.
A head-aware torso-NeRF is proposed to eliminate the head-torso separation problem.
arXiv Detail & Related papers (2023-01-31T05:56:06Z)
- 3DMM-RF: Convolutional Radiance Fields for 3D Face Modeling [111.98096975078158]
We introduce a style-based generative network that synthesizes in one pass all and only the required rendering samples of a neural radiance field.
We show that this model can accurately be fit to "in-the-wild" facial images of arbitrary pose and illumination, extract the facial characteristics, and be used to re-render the face in controllable conditions.
arXiv Detail & Related papers (2022-09-15T15:28:45Z)
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [61.8546794105462]
We propose Semantic-aware Speaking Portrait NeRF (SSP-NeRF), which creates delicate audio-driven portraits using a single unified NeRF.
We first propose a Semantic-Aware Dynamic Ray Sampling module with an additional parsing branch that facilitates audio-driven volume rendering.
To enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.
arXiv Detail & Related papers (2022-01-19T18:54:41Z)
- MoFaNeRF: Morphable Facial Neural Radiance Field [12.443638713719357]
MoFaNeRF is a parametric model that maps free-view images into a vector space of coded facial shape, expression, and appearance.
By introducing identity-specific modulation and a texture encoder, our model synthesizes accurate photometric details.
Our model shows strong performance on multiple applications, including image-based fitting, random generation, face rigging, face editing, and novel view synthesis.
arXiv Detail & Related papers (2021-12-04T11:25:28Z)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.