Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting
- URL: http://arxiv.org/abs/2601.18633v1
- Date: Mon, 26 Jan 2026 16:06:57 GMT
- Title: Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting
- Authors: Tong Shi, Melonie de Almeida, Daniela Ivanova, Nicolas Pugeault, Paul Henderson
- Abstract summary: Talking Head Generation aims at synthesizing natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specific heuristics such as warping-based facial motion representation priors to animate talking motions. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis.
- Score: 6.62155043692653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Talking Head Generation aims at synthesizing natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specific heuristics such as warping-based facial motion representation priors to animate talking motions, yet still produce inaccurate 3D avatar reconstructions, thus undermining the realism of generated animations. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis. Our approach automatically learns to disentangle a single portrait image into a static 3D reconstruction represented as static Gaussian Splatting, and a predicted whole-image 2D background. It then generates natural lip motion conditioned on input audio, without any motion-driven priors. Training is driven purely by 2D reconstruction and score-distillation losses, without 3D supervision or landmarks. Experimental results demonstrate that Splat-Portrait exhibits superior performance on talking head generation and novel view synthesis, achieving better visual quality compared to previous works. Our project code and supplementary documents are publicly available at https://github.com/stonewalking/Splat-portrait.
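The abstract describes separating the portrait into a rendered Gaussian-splat foreground and a predicted whole-image 2D background, supervised only by 2D reconstruction losses. A minimal sketch of that idea is alpha-compositing the splat render over the background and comparing against the target frame; the function name, tensor shapes, and the L1 loss choice below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def composite_and_loss(splat_rgb, splat_alpha, background, target):
    """Alpha-composite a rendered splat foreground over a predicted
    2D background, then score the result with an L1 reconstruction loss.

    splat_rgb:   (H, W, 3) rendered foreground colors in [0, 1]
    splat_alpha: (H, W, 1) accumulated splat opacity in [0, 1]
    background:  (H, W, 3) predicted whole-image background
    target:      (H, W, 3) ground-truth frame
    """
    # Standard over-compositing: opaque splats dominate, transparent
    # regions fall through to the predicted background.
    composite = splat_alpha * splat_rgb + (1.0 - splat_alpha) * background
    loss = np.abs(composite - target).mean()  # 2D reconstruction loss
    return composite, loss

# Toy usage: a fully opaque foreground should reproduce splat_rgb exactly,
# giving zero loss against an identical target.
H, W = 4, 4
fg = np.full((H, W, 3), 0.8)
alpha = np.ones((H, W, 1))
bg = np.zeros((H, W, 3))
comp, loss = composite_and_loss(fg, alpha, bg, target=fg)
```

In the paper this 2D loss is combined with score distillation; the sketch only shows the compositing step that makes background prediction and 3D reconstruction trainable without 3D supervision.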
Related papers
- GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction [49.70452913749897]
We propose a novel framework that leverages geometry-aware diffusion to learn strong geometry priors for high-fidelity head avatar reconstruction. Our approach jointly synthesizes portrait images and corresponding surface normals, while a pose-free module captures implicit expression representations. Our method substantially outperforms state-of-the-art approaches in visual quality, expression fidelity, and cross-identity generalization.
arXiv Detail & Related papers (2026-02-27T16:41:21Z) - From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors [49.37666175170832]
We introduce SuperHead, a framework for enhancing low-resolution, animatable 3D head avatars. SuperHead synthesizes high-quality geometry and textures, while ensuring both 3D and temporal consistency. Experiments demonstrate that SuperHead generates avatars with fine-grained facial details under dynamic motions.
arXiv Detail & Related papers (2026-02-05T19:00:50Z) - VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image [27.76629170122787]
VASA-3D is an audio-driven, single-shot 3D head avatar generator. This research tackles two major challenges: capturing the subtle expression details present in real human faces, and reconstructing an intricate 3D head avatar from a single portrait image.
arXiv Detail & Related papers (2025-12-16T18:44:00Z) - GaussianSpeech: Audio-Driven Gaussian Avatars [76.10163891172192]
We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle- and perceptually-based losses to synthesize facial details.
arXiv Detail & Related papers (2024-11-27T18:54:08Z) - EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [30.138347111341748]
We present a novel approach for synthesizing 3D talking heads with controllable emotion.
Our model enables controllable emotion in the generated talking heads and can be rendered in wide-range views.
Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads.
arXiv Detail & Related papers (2024-08-01T05:46:57Z) - GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z) - Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [88.17520303867099]
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio.
We present Real3D-Portrait, a framework that improves the one-shot 3D reconstruction power with a large image-to-plane model.
Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
arXiv Detail & Related papers (2024-01-16T17:04:30Z) - SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation [33.651156455111916]
We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio.
Precisely, we present ExpNet to learn the accurate facial expression from audio by distilling both coefficients and 3D-rendered faces.
arXiv Detail & Related papers (2022-11-22T11:35:07Z) - Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z) - Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.