GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2404.16012v2
- Date: Thu, 25 Apr 2024 10:25:11 GMT
- Title: GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
- Authors: Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn, Seungryong Kim,
- Abstract summary: We propose a novel framework for real-time generation of pose-controllable talking heads.
GaussianTalker builds a canonical 3DGS representation of the head and deforms it in sync with the audio.
It exploits the spatial-aware features and enforces interactions between neighboring points.
- Score: 25.78134656333095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose GaussianTalker, a novel framework for real-time generation of pose-controllable talking heads. It leverages the fast rendering capabilities of 3D Gaussian Splatting (3DGS) while addressing the challenges of directly controlling 3DGS with speech audio. GaussianTalker constructs a canonical 3DGS representation of the head and deforms it in sync with the audio. A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute. This design exploits the spatial-aware features and enforces interactions between neighboring points. The feature embeddings are then fed to a spatial-audio attention module, which predicts frame-wise offsets for the attributes of each Gaussian. It is more stable than previous concatenation or multiplication approaches for manipulating the numerous Gaussians and their intricate parameters. Experimental results showcase GaussianTalker's superiority in facial fidelity, lip synchronization accuracy, and rendering speed compared to previous methods. Specifically, GaussianTalker achieves a remarkable rendering speed up to 120 FPS, surpassing previous benchmarks. Our code is made available at https://github.com/KU-CVLAB/GaussianTalker/ .
Related papers
- Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos [58.22272760132996]
We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained.
We propose Dynamic Gaussian Marbles, which consist of three core modifications that target the difficulties of the monocular setting.
We evaluate on the Nvidia Dynamic Scenes dataset and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly outperforms other Gaussian baselines in quality.
arXiv Detail & Related papers (2024-06-26T19:37:07Z) - PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting [59.277480452459315]
We propose a principled spatial sensitivity pruning score that outperforms current approaches.
We also propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model.
Our pipeline increases the average rendering speed of 3D-GS by 2.65$times$ while retaining more salient foreground information.
arXiv Detail & Related papers (2024-06-14T17:53:55Z) - GaussianForest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling [40.743135560583816]
We introduce the Gaussian-Forest modeling framework, which hierarchically represents a scene as a forest of hybrid 3D Gaussians.
Experiments demonstrate that Gaussian-Forest not only maintains comparable speed and quality but also achieves a compression rate surpassing 10 times.
arXiv Detail & Related papers (2024-06-13T02:41:11Z) - Adversarial Generation of Hierarchical Gaussians for 3D Generative Model [20.833116566243408]
In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its efficient and explicit characteristics.
We introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians.
Experimental results demonstrate that ours achieves a significantly faster rendering speed (x100) compared to state-of-the-art 3D consistent GANs.
arXiv Detail & Related papers (2024-06-05T05:52:20Z) - RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting [51.51310922527121]
We present a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting.
We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors.
We show real-time reconstructions of a variety of large scenes and show superior performance in the realism of novel view synthesis and camera tracking accuracy.
arXiv Detail & Related papers (2024-04-30T16:54:59Z) - GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting [57.59261043916292]
GStalker is a 3D audio-driven talking face generation model with Gaussian Splatting.
It can generate high-fidelity and audio-lips synchronized results with fast training and real-time rendering speed.
arXiv Detail & Related papers (2024-04-29T18:28:36Z) - GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting [27.699313086744237]
GaussianTalker is a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting.
Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction.
Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose.
arXiv Detail & Related papers (2024-04-22T09:51:43Z) - GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering [112.16239342037714]
GES (Generalized Exponential Splatting) is a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes.
With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks.
arXiv Detail & Related papers (2024-02-15T17:32:50Z) - HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting [113.37908093915837]
Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time.
In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.
arXiv Detail & Related papers (2023-11-28T18:59:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.