Related papers: TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

URL: http://arxiv.org/abs/2404.15264v2
Date: Fri, 5 Jul 2024 04:09:46 GMT
Title: TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Authors: Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu,
Abstract summary: We introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. Our method renders high-quality lip-synchronized talking head videos, with better facial fidelity and higher efficiency compared with previous methods.
Score: 21.474938045227702
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Radiance fields have demonstrated impressive performance in synthesizing lifelike 3D talking heads. However, due to the difficulty in fitting steep appearance changes, the prevailing paradigm that presents facial motions by directly modifying point appearance may lead to distortions in dynamic regions. To tackle this challenge, we introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. Leveraging the point-based Gaussian Splatting, facial motions can be represented in our method by applying smooth and continuous deformations to persistent Gaussian primitives, without requiring to learn the difficult appearance change like previous methods. Due to this simplification, precise facial motions can be synthesized while keeping a highly intact facial feature. Under such a deformation paradigm, we further identify a face-mouth motion inconsistency that would affect the learning of detailed speaking motions. To address this conflict, we decompose the model into two branches separately for the face and inside mouth areas, therefore simplifying the learning tasks to help reconstruct more accurate motion and structure of the mouth region. Extensive experiments demonstrate that our method renders high-quality lip-synchronized talking head videos, with better facial fidelity and higher efficiency compared with previous methods.

Related papers

OT-Talk: Animating 3D Talking Head with Optimal Transportation [20.023346831300373]
OT-Talk is the first approach to leverage optimal transportation to optimize the learning model in talking head animation.<n>Building on existing learning frameworks, we utilize a pre-trained Hubert model to extract audio features and a transformer model to process temporal sequences.<n>Our experiments on two public audio-mesh datasets demonstrate that our method outperforms state-of-the-art techniques.
arXiv Detail & Related papers (2025-05-03T21:49:23Z)
GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time. At the core of our method is a hierarchical representation of head models that allows to capture the complex dynamics of facial expressions and head movements. We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model [89.29655924125461]
We propose a novel landmark-based diffusion model for talking face generation. We first establish the less ambiguous mapping from audio to landmark motion of lip and jaw. Then, we introduce an innovative conditioning module called TalkFormer to align the synthesized motion with the motion represented by landmarks.
arXiv Detail & Related papers (2024-08-10T02:58:28Z)
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [52.33618333954383]
We propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads. Specifically, our parametric implicit representation parameterizes the implicit representation with interpretable parameters of 3D face models. Our method can generate more realistic results than previous methods with greater fidelity to the identities and talking styles of speakers.
arXiv Detail & Related papers (2023-06-13T07:08:22Z)
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [81.07651217942679]
Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. We propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis.
arXiv Detail & Related papers (2023-04-11T09:47:35Z)
Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention. The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
Expression-preserving face frontalization improves visually assisted speech processing [35.647888055229956]
The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations. We show that the method, when incorporated into deep learning pipelines, improves word recognition and speech intelligibilty scores by a considerable margin.
arXiv Detail & Related papers (2022-04-06T13:22:24Z)
ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations [21.389170615787368]
This paper presents a novel 3D morphable face model, namely ImFace, to learn a nonlinear and continuous space with implicit neural representations. It builds two explicitly disentangled deformation fields to model complex shapes associated with identities and expressions, respectively, and designs an improved learning strategy to extend embeddings of expressions. In addition to ImFace, an effective preprocessing pipeline is proposed to address the issue of watertight input requirement in implicit representations.
arXiv Detail & Related papers (2022-03-28T05:37:59Z)
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [69.9557427451339]
We propose a framework based on neural radiance field to pursue high-fidelity talking head generation. Specifically, neural radiance field takes lip movements features and personalized attributes as two disentangled conditions. We show that our method achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2022-01-03T18:23:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.