GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2601.05511v1
- Date: Fri, 09 Jan 2026 03:39:29 GMT
- Title: GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting
- Authors: Xuan Cheng, Jiahao Rao, Chengyang Li, Wenhao Wang, Weilin Chen, Lvqing Yang,
- Abstract summary: We introduce a video face swapping framework that constructs a 3D Gaussian Splatting based face avatar from a target video. In conventional pixel-based frameworks, the resulting swapped faces exist merely as a set of unstructured pixels without any capacity for animation or interactive manipulation.
- Score: 15.546712348750425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce GaussianSwap, a novel video face swapping framework that constructs a 3D Gaussian Splatting based face avatar from a target video while transferring identity from a source image to the avatar. Conventional video swapping frameworks are limited to generating facial representations in pixel-based formats; the resulting swapped faces exist merely as a set of unstructured pixels without any capacity for animation or interactive manipulation. Our work introduces a paradigm shift from conventional pixel-based video generation to the creation of a high-fidelity avatar with a swapped face. The framework first preprocesses the target video to extract FLAME parameters, camera poses, and segmentation masks, and then rigs 3D Gaussian splats to the FLAME model across frames, enabling dynamic facial control. To ensure identity preservation, we propose a compound identity embedding constructed from three state-of-the-art face recognition models for avatar finetuning. Finally, we render the face-swapped avatar onto the background frames to obtain the face-swapped video. Experimental results demonstrate that GaussianSwap achieves superior identity preservation, visual clarity, and temporal consistency, while enabling previously unattainable interactive applications.
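The abstract names a compound identity embedding built from three face recognition models, but no code accompanies it, so the details below are assumptions: a minimal sketch in which each recognizer is a stand-in callable, the compound embedding is the L2-normalized concatenation of the three normalized per-model embeddings, and identity preservation is enforced with a cosine-distance loss between the rendered avatar's embedding and the source image's embedding. The actual paper may combine the models differently.

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    # Scale a vector to (approximately) unit length.
    return v / (np.linalg.norm(v) + eps)

def compound_identity_embedding(face_image, recognizers):
    """Concatenate L2-normalized embeddings from several face
    recognition models into a single compound identity vector.
    `recognizers` is a list of callables: image -> embedding."""
    parts = [l2_normalize(rec(face_image)) for rec in recognizers]
    return l2_normalize(np.concatenate(parts))

def identity_loss(rendered_face, source_embedding, recognizers):
    # Cosine-distance loss pulling the rendered avatar's identity
    # toward the source identity (0 when identical, up to 2).
    e = compound_identity_embedding(rendered_face, recognizers)
    return 1.0 - float(np.dot(e, source_embedding))

# Example with hypothetical linear "recognizers" standing in for the
# three pretrained face recognition networks.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((128, 64)) for _ in range(3)]
recognizers = [(lambda x, W=W: W.T @ x) for W in Ws]
img = rng.standard_normal(128)
src = compound_identity_embedding(img, recognizers)
print(src.shape, identity_loss(img, src, recognizers))
```

Concatenation (rather than averaging) keeps the complementary failure modes of the three recognizers distinguishable; each model's subspace contributes its own coordinates to the compound vector.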
Related papers
- LegacyAvatars: Volumetric Face Avatars For Traditional Graphics Pipelines [33.45496857979673]
We introduce a novel representation for efficient classical rendering of 3D face avatars. Our approach achieves controllable rendering of complex facial features, including hair, skin, and eyes.
arXiv Detail & Related papers (2026-01-18T06:46:05Z)
- ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions [49.34398022152462]
We propose to couple locally-defined facial expressions with 3D Gaussian splatting to enable creating ultra-high-fidelity, expressive, and photorealistic 3D head avatars. In particular, we leverage a patch-based geometric 3D face model to extract patch expressions and learn how to translate these into local dynamic skin appearance and motion. We employ color-based densification and progressive training to obtain high-quality results and faster convergence for high-resolution 3K training images.
arXiv Detail & Related papers (2025-07-14T17:59:03Z)
- ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars [0.916825397273032]
Toonify, a StyleGAN-based method, has become widely used for facial image stylization. We propose an efficient two-stage framework, ToonifyGB, to extend Toonify to diverse stylized 3D head avatars.
arXiv Detail & Related papers (2025-05-15T08:16:12Z)
- PERSE: Personalized 3D Generative Avatars from A Single Portrait [18.069177711777662]
PERSE is a method for building a personalized 3D generative avatar from a reference portrait. Our method begins by generating large-scale synthetic 2D video datasets.
arXiv Detail & Related papers (2024-12-30T18:59:58Z)
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that allows us to capture the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- G3FA: Geometry-guided GAN for Face Animation [14.488117084637631]
We introduce Geometry-guided GAN for Face Animation (G3FA) to tackle the lack of 3D geometric information in 2D face animation models.
Our novel approach empowers the face animation model to incorporate 3D information using only 2D images.
In our face reenactment model, we leverage 2D motion warping to capture motion dynamics.
arXiv Detail & Related papers (2024-08-23T13:13:24Z)
- GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance [83.43852715997596]
GSmoothFace is a novel two-stage generalized talking face generation model guided by a fine-grained 3D face model.
It can synthesize smooth lip dynamics while preserving the speaker's identity.
Both quantitative and qualitative experiments confirm the superiority of our method in terms of realism, lip synchronization, and visual quality.
arXiv Detail & Related papers (2023-12-12T16:00:55Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing [78.26925404508994]
We propose a unified temporally consistent facial video editing framework termed UniFaceGAN.
Our framework is designed to handle face swapping and face reenactment simultaneously.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2021-08-12T10:35:22Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.