FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
- URL: http://arxiv.org/abs/2312.08459v2
- Date: Sun, 17 Mar 2024 23:45:01 GMT
- Title: FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
- Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner
- Abstract summary: We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an audio signal.
To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of volumetric human heads.
- Score: 85.16273912625022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple the speech signal with the latent space of neural parametric head models (NPHMs) to create high-fidelity, temporally coherent motion sequences. We propose a new latent diffusion model for this task, operating in the expression space of neural parametric head models, to synthesize audio-driven, realistic head sequences. In the absence of a dataset of NPHM expressions corresponding to audio, we optimize for these correspondences to produce a dataset of temporally optimized NPHM expressions fitted to audio-video recordings of people talking. To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of volumetric human heads, representing a significant advancement in the field of audio-driven 3D animation. Notably, our approach stands out in its ability to generate plausible motion sequences that produce high-fidelity head animation coupled with the NPHM shape space. Our experimental results substantiate the effectiveness of FaceTalk, consistently achieving superior and visually natural motion, encompassing diverse facial expressions and styles, and outperforming existing methods by 75% in a perceptual user study.
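To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' released code) of an audio-conditioned denoiser for a latent diffusion model over NPHM expression codes, as described in the abstract. The Transformer backbone, feature dimensions, and module names are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of an audio-conditioned denoiser over NPHM expression codes.
# Dimensions, module names, and the Transformer backbone are assumptions for
# illustration only; they are not the FaceTalk implementation.
import torch
import torch.nn as nn


class AudioConditionedDenoiser(nn.Module):
    """Predicts the noise added to a window of NPHM expression codes,
    conditioned on per-frame audio features and the diffusion timestep."""

    def __init__(self, expr_dim=100, audio_dim=768, hidden_dim=256, num_layers=4):
        super().__init__()
        self.expr_in = nn.Linear(expr_dim, hidden_dim)
        self.audio_in = nn.Linear(audio_dim, hidden_dim)
        self.time_embed = nn.Sequential(
            nn.Linear(1, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, hidden_dim)
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=4, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.expr_out = nn.Linear(hidden_dim, expr_dim)

    def forward(self, noisy_expr, audio_feats, t):
        # noisy_expr:  (B, T, expr_dim)  noised NPHM expression codes
        # audio_feats: (B, T, audio_dim) per-frame features from a speech encoder
        # t:           (B,)              diffusion timesteps
        h = self.expr_in(noisy_expr) + self.audio_in(audio_feats)
        h = h + self.time_embed(t.float().unsqueeze(-1)).unsqueeze(1)
        h = self.backbone(h)
        return self.expr_out(h)  # predicted noise, same shape as noisy_expr


if __name__ == "__main__":
    model = AudioConditionedDenoiser()
    noisy = torch.randn(2, 30, 100)      # 30 frames of 100-D expression codes
    audio = torch.randn(2, 30, 768)      # matching per-frame audio features
    t = torch.randint(0, 1000, (2,))     # diffusion timesteps
    print(model(noisy, audio, t).shape)  # torch.Size([2, 30, 100])
```

At sampling time, such a denoiser would be applied iteratively to a noise sequence to produce expression codes that the NPHM decoder then turns into head geometry; that decoding step is outside the scope of this sketch.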
Related papers
- Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs [67.27840327499625]
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters.
Our approach learns from sparse face landmarks and upper-body joints, estimated directly from video data, to generate plausible emotive character motions.
arXiv Detail & Related papers (2024-06-26T04:53:11Z)
- NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [5.819784482811377]
We propose a novel method, NeRFFaceSpeech, which produces high-quality, 3D-aware talking heads.
Our method can craft a 3D-consistent facial feature space corresponding to a single image.
We also introduce LipaintNet, which fills in missing information in the inner-mouth area.
arXiv Detail & Related papers (2024-05-09T13:14:06Z)
- Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [29.120669908374424]
We introduce a novel audio-driven talking head synthesis framework, called Talk3D.
It faithfully reconstructs plausible facial geometries by adopting a pre-trained 3D-aware generative prior.
Compared to existing methods, our method excels in generating realistic facial geometries even under extreme head poses.
arXiv Detail & Related papers (2024-03-29T12:49:40Z)
- Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications [20.842799581850617]
We consider the task of animating 3D facial geometry from a speech signal.
Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signals to 3D face meshes on small datasets with limited speakers.
arXiv Detail & Related papers (2023-11-30T01:14:43Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [62.297513028116576]
GeneFace is a general and high-fidelity NeRF-based talking face generation method.
A head-aware torso-NeRF is proposed to eliminate the head-torso separation problem.
arXiv Detail & Related papers (2023-01-31T05:56:06Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [69.9557427451339]
We propose a framework based on neural radiance fields to pursue high-fidelity talking head generation.
Specifically, the neural radiance field takes lip-movement features and personalized attributes as two disentangled conditions.
We show that our method achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2022-01-03T18:23:38Z)
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [34.406907667904996]
We propose an audio-driven talking-head method to generate photo-realistic talking-head videos from a single reference image.
We first design a head pose predictor by modeling rigid 6D head movements with a motion-aware recurrent neural network (RNN).
Then, we develop a motion field generator to produce dense motion fields from the input audio, head poses, and a reference image (see the sketch after this list).
arXiv Detail & Related papers (2021-07-20T07:22:42Z)
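The Audio2Head summary above describes a two-stage pipeline: a recurrent head-pose predictor followed by a motion field generator. As a rough illustration of the first stage only, the hypothetical sketch below maps per-frame audio features to rigid 6D head poses with a GRU; the feature dimensions and module names are assumptions, not the paper's.

```python
# Hypothetical sketch of a recurrent head-pose predictor in the spirit of Audio2Head:
# a GRU maps per-frame audio features to rigid 6D head pose parameters
# (3 rotation + 3 translation). Sizes and names are illustrative assumptions.
import torch
import torch.nn as nn


class HeadPosePredictor(nn.Module):
    def __init__(self, audio_dim=80, hidden_dim=128, pose_dim=6):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, pose_dim)

    def forward(self, audio_feats):
        # audio_feats: (B, T, audio_dim), e.g. mel-spectrogram frames
        h, _ = self.rnn(audio_feats)
        return self.head(h)  # (B, T, 6) per-frame rigid head pose


if __name__ == "__main__":
    predictor = HeadPosePredictor()
    mels = torch.randn(1, 50, 80)    # 50 audio frames
    print(predictor(mels).shape)     # torch.Size([1, 50, 6])
```

The second stage, a generator that warps a reference image with dense motion fields, is not sketched here.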