PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control
- URL: http://arxiv.org/abs/2509.16922v1
- Date: Sun, 21 Sep 2025 05:01:54 GMT
- Title: PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control
- Authors: Tianheng Zhu, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng
- Abstract summary: We present PGSTalker, a real-time audio-driven talking head synthesis framework based on 3D Gaussian Splatting (3DGS). To improve rendering performance, we propose a pixel-aware density control strategy that adaptively allocates point density, enhancing detail in dynamic facial regions while reducing redundancy elsewhere.
- Score: 37.390794417927644
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Audio-driven talking head generation is crucial for applications in virtual reality, digital avatars, and film production. While NeRF-based methods enable high-fidelity reconstruction, they suffer from low rendering efficiency and suboptimal audio-visual synchronization. This work presents PGSTalker, a real-time audio-driven talking head synthesis framework based on 3D Gaussian Splatting (3DGS). To improve rendering performance, we propose a pixel-aware density control strategy that adaptively allocates point density, enhancing detail in dynamic facial regions while reducing redundancy elsewhere. Additionally, we introduce a lightweight Multimodal Gated Fusion Module to effectively fuse audio and spatial features, thereby improving the accuracy of Gaussian deformation prediction. Extensive experiments on public datasets demonstrate that PGSTalker outperforms existing NeRF- and 3DGS-based approaches in rendering quality, lip-sync precision, and inference speed. Our method exhibits strong generalization capabilities and practical potential for real-world deployment.
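The abstract describes a lightweight Multimodal Gated Fusion Module that fuses audio and spatial features before Gaussian deformation prediction, but gives no architectural details. As a minimal sketch of what a generic learned-gate fusion of two feature vectors looks like: a sigmoid gate computed from the concatenated features blends the modalities per dimension. All names, shapes, and the use of NumPy in place of a deep-learning framework are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(audio_feat, spatial_feat, W, b):
    """Blend audio and spatial features with a learned per-dimension gate.

    g = sigmoid(W @ [audio; spatial] + b)
    fused = g * audio + (1 - g) * spatial
    Each output element is a convex combination of the two inputs.
    """
    concat = np.concatenate([audio_feat, spatial_feat])
    g = sigmoid(W @ concat + b)  # gate values in (0, 1)
    return g * audio_feat + (1.0 - g) * spatial_feat

# Toy demo with fixed random weights (in a real model W and b are trained).
rng = np.random.default_rng(0)
d = 4
a = rng.standard_normal(d)            # audio feature
s = rng.standard_normal(d)            # spatial (Gaussian) feature
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)
fused = gated_fusion(a, s, W, b)
```

Because the gate is applied element-wise, each fused dimension stays between the corresponding audio and spatial values, which is the usual motivation for gating over plain concatenation: the network can learn, per dimension, how much the audio signal should override the static geometry.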
Related papers
- Toward Fine-Grained Facial Control in 3D Talking Head Generation [47.03887859473704]
Fine-Grained 3D Gaussian Splatting is a novel framework that enables temporally consistent and high-fidelity head generation. Our method outperforms recent state-of-the-art approaches in producing high-fidelity, lip-synced talking head videos.
arXiv Detail & Related papers (2026-02-10T12:49:50Z) - Lightweight High-Fidelity Low-Bitrate Talking Face Compression for 3D Video Conference [16.973019571440556]
Traditional 2D video compression techniques fail to preserve fine-grained and geometric appearance details. We propose a lightweight, high-fidelity, low-bitrate 3D talking face compression framework that integrates FLAME-based parametric modeling with 3DGS neural rendering.
arXiv Detail & Related papers (2026-01-29T05:03:29Z) - EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation [37.390794417927644]
EGSTalker is a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). It requires only 3-5 minutes of training video to synthesize high-quality facial animations. EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed.
arXiv Detail & Related papers (2025-10-03T14:31:20Z) - Perceive-Sample-Compress: Towards Real-Time 3D Gaussian Splatting [7.421996491601524]
We introduce a novel perceive-sample-compress framework for 3D Gaussian Splatting. We show that our method significantly improves memory efficiency and visual quality while maintaining real-time rendering speed.
arXiv Detail & Related papers (2025-08-07T01:34:38Z) - Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis [56.749927786910554]
We propose a novel framework that integrates Gaussian Splatting with a structured Audio Factorization Plane (Audio-Plane) to enable high-quality, audio-synchronized, and real-time talking head generation. Our method achieves state-of-the-art visual quality, precise audio-lip synchronization, and real-time performance, outperforming prior approaches across both 2D- and 3D-based paradigms.
arXiv Detail & Related papers (2025-03-28T16:50:27Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion [54.197343533492486]
Event3DGS can reconstruct high-fidelity 3D structure and appearance under high-speed egomotion.
Experiments on multiple synthetic and real-world datasets demonstrate the superiority of Event3DGS compared with existing event-based dense 3D scene reconstruction frameworks.
Our framework also allows one to incorporate a few motion-blurred frame-based measurements into the reconstruction process to further improve appearance fidelity without loss of structural accuracy.
arXiv Detail & Related papers (2024-06-05T06:06:03Z) - GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting [57.59261043916292]
GSTalker is an audio-driven 3D talking face generation model based on deformable Gaussian Splatting.
It can generate high-fidelity and audio-lips synchronized results with fast training and real-time rendering speed.
arXiv Detail & Related papers (2024-04-29T18:28:36Z) - GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting [27.699313086744237]
GaussianTalker is a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting.
Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction.
Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose.
arXiv Detail & Related papers (2024-04-22T09:51:43Z) - Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition [61.6677901687009]
We propose an efficient NeRF-based framework that enables real-time synthesizing of talking portraits.
Our method can generate realistic and audio-lips synchronized talking portrait videos.
arXiv Detail & Related papers (2022-11-22T16:03:11Z)