NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis
- URL: http://arxiv.org/abs/2401.12568v1
- Date: Tue, 23 Jan 2024 08:54:10 GMT
- Title: NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis
- Authors: Chongke Bi, Xiaoxing Liu, Zhilei Liu
- Abstract summary: Talking face synthesis driven by audio is one of the current research hotspots in the fields of multidimensional signal processing and multimedia.
NeRF has recently been introduced to this field to enhance the realism and 3D effect of the generated faces.
This paper proposes a talking face synthesis method via NeRF with attention-based disentanglement (NeRF-AD).
- Score: 2.5387791616637587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Talking face synthesis driven by audio is one of the current research
hotspots in the fields of multidimensional signal processing and multimedia.
Neural Radiance Field (NeRF) has recently been introduced to this field to
enhance the realism and 3D effect of the generated faces. However, most
existing NeRF-based methods either burden NeRF with complex learning tasks
while lacking a mechanism for supervised multimodal feature fusion, or cannot
precisely map audio to the facial regions involved in speech movements. As a
result, existing methods generate inaccurate lip shapes.
This paper moves part of the learning tasks ahead of NeRF and proposes a talking
face synthesis method via NeRF with attention-based disentanglement (NeRF-AD).
In particular, an Attention-based Disentanglement module is introduced to
disentangle the face into an Audio-face and an Identity-face using
speech-related facial action unit (AU) information. To precisely regulate how
audio affects the talking face, only the Audio-face is fused with the audio
feature. In addition, AU information is used to supervise the fusion of these
two modalities. Extensive qualitative and quantitative experiments demonstrate
that NeRF-AD outperforms state-of-the-art methods in generating realistic
talking face videos, in terms of both image quality and lip synchronization. To
view video results, please refer to https://xiaoxingliu02.github.io/NeRF-AD.
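The abstract names three steps: AU-guided disentanglement of the face into an Audio-face and an Identity-face, fusion of only the Audio-face with the audio feature, and AU supervision of that fusion. The following is a minimal, hypothetical sketch in plain Python, not the authors' implementation. It assumes per-position face feature vectors, a sigmoid attention gate whose query is built from the AU vector and whose keys come from the face features, fusion by simple concatenation, and a binary cross-entropy loss from a linear AU-prediction head. Every function and weight name (`au_guided_disentangle`, `fuse_audio`, `au_supervision_loss`, `W_q`, `W_k`, `W_au`) is invented for illustration, and the random weights stand in for learned parameters.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in W]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def au_guided_disentangle(face_feats, au_vec, W_q, W_k):
    """Hypothetical split of per-position face features into Audio-face and Identity-face.

    The query comes from the AU vector and the keys from the face features.
    A sigmoid gate (not a softmax) gives each position a weight in (0, 1), so
    audio_face + identity_face reproduces face_feats at every position.
    """
    q = matvec(W_q, au_vec)
    scale = math.sqrt(len(q))
    mask = [sigmoid(dot(q, matvec(W_k, f)) / scale) for f in face_feats]
    audio_face = [[m * x for x in f] for m, f in zip(mask, face_feats)]
    identity_face = [[(1.0 - m) * x for x in f] for m, f in zip(mask, face_feats)]
    return audio_face, identity_face, mask


def fuse_audio(audio_face, audio_feat):
    """Fuse only the Audio-face with audio by concatenating audio at every position."""
    return [f + audio_feat for f in audio_face]


def au_supervision_loss(fused, W_au, au_target):
    """Hypothetical AU head: mean-pool, linear map, sigmoid, binary cross-entropy."""
    n = len(fused)
    pooled = [sum(col) / n for col in zip(*fused)]
    pred = [sigmoid(z) for z in matvec(W_au, pooled)]
    eps = 1e-7
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, au_target)) / len(au_target)


def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy example: 5 spatial positions, 4-dim face features, 3 AUs, 2-dim audio feature.
random.seed(0)
face = rand_matrix(5, 4)
au = [0.8, 0.1, 0.5]
W_q, W_k = rand_matrix(4, 3), rand_matrix(4, 4)
audio_face, identity_face, mask = au_guided_disentangle(face, au, W_q, W_k)
fused = fuse_audio(audio_face, [0.3, -0.2])
loss = au_supervision_loss(fused, rand_matrix(3, 6), [1, 0, 1])
```

A sigmoid gate is used here so that the two parts are complementary at every position; the abstract does not say whether NeRF-AD uses this exact split, how its attention is parameterized, or how its AU supervision loss is formulated.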
Related papers
- S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis [14.437741528053504]
We design a Single-Shot Speech-Driven Radiance Field (S3D-NeRF) method to tackle three difficulties: learning a representative appearance feature for each identity, modeling the motion of different face regions with audio, and keeping the temporal consistency of the lip area.
Our S3D-NeRF surpasses prior work in both video fidelity and audio-lip synchronization.
(arXiv, 2024-08-18T03:59:57Z)
- 3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands [51.305421495638434]
Neural radiance fields (NeRFs) are promising 3D representations for scenes, objects, and humans.
This paper proposes a generalizable visibility-aware NeRF framework for interacting hands.
Experiments on the Interhand2.6M dataset demonstrate that our proposed VA-NeRF outperforms conventional NeRFs significantly.
(arXiv, 2024-01-02T00:42:06Z)
- AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis [42.203900183584665]
We present Audio Enhanced Neural Radiance Field (AE-NeRF) to generate realistic portraits of a new speaker from a few-shot dataset.
AE-NeRF surpasses the state of the art in image fidelity, audio-lip synchronization, and generalization ability, even with a limited training set or few training iterations.
(arXiv, 2023-12-18T04:14:38Z)
- GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [71.73912454164834]
A modern talking face generation method is expected to achieve the goals of generalized audio-lip synchronization, good video quality, and high system efficiency.
NeRF has become a popular technique in this field since it can achieve high-fidelity and 3D-consistent talking face generation from a training video only a few minutes long.
We propose GeneFace++ to handle these challenges by utilizing the pitch contour as an auxiliary feature and introducing a temporal loss in the facial motion prediction process.
(arXiv, 2023-05-01T12:24:09Z)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [62.297513028116576]
GeneFace is a generalized and high-fidelity NeRF-based talking face generation method.
A head-aware torso-NeRF is proposed to eliminate the head-torso problem.
(arXiv, 2023-01-31T05:56:06Z)
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [90.43371339871105]
We propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis.
DFRF conditions the face radiance field on 2D appearance images to learn the face prior.
Experiments show DFRF can synthesize natural and high-quality audio-driven talking head videos for novel identities with only 40k iterations.
(arXiv, 2022-07-24T16:46:03Z)
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [61.8546794105462]
We propose Semantic-aware Speaking Portrait NeRF (SSP-NeRF), which creates delicate audio-driven portraits using a single unified NeRF.
We first propose a Semantic-Aware Dynamic Ray Sampling module with an additional parsing branch that facilitates audio-driven volume rendering.
To enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.
(arXiv, 2022-01-19T18:54:41Z)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [55.24336227884039]
We present a novel framework to generate high-fidelity talking head video.
We use neural scene representation networks to bridge the gap between audio input and video output.
Our framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.
(arXiv, 2021-03-20T02:58:13Z)
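The NeRF-based talking-head methods listed above (for example AD-NeRF and SSP-NeRF, whose summary mentions audio-driven volume rendering) all produce pixel colors by volume rendering along camera rays. As a hedged illustration, the following sketch implements the standard discrete NeRF quadrature for one ray, C ≈ Σ_i T_i (1 − exp(−σ_i δ_i)) c_i with T_i = exp(−Σ_{j<i} σ_j δ_j) and δ_i = t_{i+1} − t_i. It assumes the densities σ_i and colors c_i are already given; in an audio-driven model they would come from a network that also receives an audio feature, which is not modeled here. The function name `render_ray` is hypothetical.

```python
import math


def render_ray(sigmas, colors, t_vals):
    """Composite one ray with the standard discrete NeRF volume-rendering quadrature.

    sigmas: densities sigma_i at the N samples along the ray.
    colors: RGB tuples c_i at the same samples.
    t_vals: increasing sample depths t_i; the last interval is treated as very long.
    Returns (rgb, weights), where weights[i] = T_i * (1 - exp(-sigma_i * delta_i)).
    """
    n = len(t_vals)
    deltas = [t_vals[i + 1] - t_vals[i] for i in range(n - 1)] + [1e10]
    weights = []
    transmittance = 1.0  # T_i: fraction of light that reaches sample i unoccluded
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # equals exp(-sigma * delta)
    rgb = tuple(sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3))
    return rgb, weights


# A dense middle sample absorbs the ray, so its color dominates the pixel.
rgb, weights = render_ray([0.0, 50.0, 0.0],
                          [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
                          [0.0, 1.0, 2.0])
print(rgb)  # prints (0.0, 1.0, 0.0)
```

Computing T_i incrementally as a running product of (1 − α_j) is equivalent to the exponential of the cumulative sum and avoids recomputing the sum for every sample.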