AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

URL: http://arxiv.org/abs/2103.11078v1
Date: Sat, 20 Mar 2021 02:58:13 GMT
Title: AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
Authors: Yudong Guo, Keyu Chen, Sen Liang, Yongjin Liu, Hujun Bao, Juyong Zhang
Abstract summary: We present a novel framework to generate high-fidelity talking head video. We use neural scene representation networks to bridge the gap between audio input and video output. Our framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.
Score: 55.24336227884039
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating high-fidelity talking head video that fits an input audio sequence is a challenging problem that has received considerable attention recently. In this paper, we address this problem with the aid of neural scene representation networks. Our method is completely different from existing methods that rely on intermediate representations like 2D landmarks or 3D face models to bridge the gap between audio input and video output. Specifically, the features of the input audio signal are fed directly into a conditional implicit function to generate a dynamic neural radiance field, from which a high-fidelity talking-head video corresponding to the audio signal is synthesized using volume rendering. Another advantage of our framework is that it synthesizes not only the head (with hair) region, as previous methods did, but also the upper body, using two individual neural radiance fields. Experimental results demonstrate that our novel framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.
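
The pipeline described in the abstract (audio features conditioning an implicit function, followed by volume rendering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: `toy_conditional_field` is a hypothetical hand-written stand-in for a trained network, and `render_ray` uses the standard NeRF quadrature rule, with whatever light is not absorbed along the ray taken from a replaceable background color.

```python
import math


def toy_conditional_field(point, view_dir, audio_feat):
    """Hypothetical stand-in for a trained network F(x, d, a) -> (rgb, sigma).

    A real system would use an MLP; here a soft "head" is faked as a ball
    whose radius and color depend on the first audio feature value.
    """
    x, y, z = point
    radius = 0.5 + 0.1 * audio_feat[0]
    dist = math.sqrt(x * x + y * y + z * z)
    sigma = 5.0 if dist < radius else 0.0  # density is zero outside the ball
    r = min(1.0, max(0.0, 0.5 + 0.5 * view_dir[2]))     # view-dependent channel
    g = min(1.0, max(0.0, 0.3 + 0.2 * audio_feat[0]))   # audio-dependent channel
    return (r, g, 0.5), sigma


def render_ray(origin, direction, audio_feat, background,
               near=0.0, far=4.0, n_samples=64):
    """Composite colors along one ray with the usual NeRF quadrature."""
    delta = (far - near) / n_samples
    transmittance = 1.0  # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of the i-th interval
        point = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, sigma = toy_conditional_field(point, direction, audio_feat)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    # Light that passes through the whole volume comes from the background.
    return tuple(c + transmittance * b for c, b in zip(color, background))


if __name__ == "__main__":
    bg = (0.1, 0.2, 0.3)
    ray_origin, ray_dir = (0.0, 0.0, -2.0), (0.0, 0.0, 1.0)
    print(render_ray(ray_origin, ray_dir, [0.0], bg))  # one audio frame
    print(render_ray(ray_origin, ray_dir, [2.0], bg))  # a different audio frame
```

In this sketch, changing `audio_feat`, `direction`, or `background` changes the rendered pixel independently, which mirrors the three kinds of adjustment the abstract lists.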

Related papers

PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis [27.97031664678664]
Methods based on radiance fields have received increasing attention due to their ability to synthesize high-fidelity talking heads. We propose a novel 3D Gaussian-based method called PointTalk, which constructs a static 3D Gaussian field of the head and deforms it in sync with the audio. Our method achieves superior fidelity and audio-lip synchronization in talking head synthesis compared to previous methods.
arXiv Detail & Related papers (2024-12-11T16:15:14Z)
S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis [14.437741528053504]
We design a Single-Shot Speech-Driven Radiance Field (S3D-NeRF) method to tackle three difficulties: learning a representative appearance feature for each identity, modeling the motion of different face regions with audio, and keeping the temporal consistency of the lip area. Our S3D-NeRF surpasses prior art on both video fidelity and audio-lip synchronization.
arXiv Detail & Related papers (2024-08-18T03:59:57Z)
AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio. We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment. Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [5.819784482811377]
We propose a novel method, NeRFFaceSpeech, which can produce high-quality 3D-aware talking heads. Our method can craft a 3D-consistent facial feature space corresponding to a single image. We also introduce LipaintNet, which can fill in the missing information in the inner-mouth area.
arXiv Detail & Related papers (2024-05-09T13:14:06Z)
NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis [2.5387791616637587]
Talking face synthesis driven by audio is one of the current research hotspots in the fields of multidimensional signal processing and multimedia. NeRF has recently been brought to this research field in order to enhance the realism and 3D effect of the generated faces. This paper proposes a talking face synthesis method via NeRF with attention-based disentanglement (NeRF-AD).
arXiv Detail & Related papers (2024-01-23T08:54:10Z)
AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis [42.203900183584665]
We present Audio Enhanced Neural Radiance Field (AE-NeRF) to generate realistic portraits of a new speaker from a few-shot dataset. AE-NeRF surpasses the state of the art on image fidelity, audio-lip synchronization, and generalization ability, even with a limited training set or few training iterations.
arXiv Detail & Related papers (2023-12-18T04:14:38Z)
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [85.16273912625022]
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from audio signals. To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of human heads.
arXiv Detail & Related papers (2023-12-13T19:01:07Z)
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [62.297513028116576]
GeneFace is a general and high-fidelity NeRF-based talking face generation method. A head-aware torso-NeRF is proposed to eliminate the head-torso problem.
arXiv Detail & Related papers (2023-01-31T05:56:06Z)
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis [129.86743102915986]
We formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part shared by the left and right channels and a specific part that differs in each channel. We propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively. Experimental results show that BinauralGrad outperforms the existing baselines by a large margin in terms of both objective and subjective evaluation metrics.
arXiv Detail & Related papers (2022-05-30T02:09:26Z)
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [61.8546794105462]
We propose Semantic-aware Speaking Portrait NeRF (SSP-NeRF), which creates delicate audio-driven portraits using one unified set of NeRF. We first propose a Semantic-Aware Dynamic Ray Sampling module with an additional parsing branch that facilitates audio-driven volume rendering. To enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.
arXiv Detail & Related papers (2022-01-19T18:54:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.