Related papers: Learning Semantic Facial Descriptors for Accurate Face Animation

Learning Semantic Facial Descriptors for Accurate Face Animation

URL: http://arxiv.org/abs/2501.17718v1
Date: Wed, 29 Jan 2025 15:40:42 GMT
Title: Learning Semantic Facial Descriptors for Accurate Face Animation
Authors: Lei Zhu, Yuanqi Chen, Xiaohang Liu, Thomas H. Li, Ge Li,
Abstract summary: We introduce the semantic facial descriptors in learnable disentangled vector space to address the dilemma.<n>We obtain basis vector coefficients by employing an encoder on the source and driving faces, leading to effective facial descriptors in the identity and motion subspaces.<n>Our approach successfully addresses the issue of model-based methods' limitations in high-fidelity identity and the challenges faced by model-free methods in accurate motion transfer.
Score: 43.370084532812044
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Face animation is a challenging task. Existing model-based methods (utilizing 3DMMs or landmarks) often result in a model-like reconstruction effect, which doesn't effectively preserve identity. Conversely, model-free approaches face challenges in attaining a decoupled and semantically rich feature space, thereby making accurate motion transfer difficult to achieve. We introduce the semantic facial descriptors in learnable disentangled vector space to address the dilemma. The approach involves decoupling the facial space into identity and motion subspaces while endowing each of them with semantics by learning complete orthogonal basis vectors. We obtain basis vector coefficients by employing an encoder on the source and driving faces, leading to effective facial descriptors in the identity and motion subspaces. Ultimately, these descriptors can be recombined as latent codes to animate faces. Our approach successfully addresses the issue of model-based methods' limitations in high-fidelity identity and the challenges faced by model-free methods in accurate motion transfer. Extensive experiments are conducted on three challenging benchmarks (i.e. VoxCeleb, HDTF, CelebV). Comprehensive quantitative and qualitative results demonstrate that our model outperforms SOTA methods with superior identity preservation and motion transfer.

Related papers

IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation [58.297199313494]
Implicit methods capture motion semantics directly from driving video, but suffer from identity leakage and entanglement between motion and appearance.<n>We propose a novel implicit motion representation that compresses per-frame motion into compact 1D motion tokens.<n>Our methodology employs a three-stage training strategy to enhance the training efficiency and ensure high fidelity.
arXiv Detail & Related papers (2026-02-07T11:17:20Z)
DiffSwap++: 3D Latent-Controlled Diffusion for Identity-Preserving Face Swapping [16.846179110602737]
We propose DiffSwap++, a novel diffusion-based face-swapping pipeline that incorporates 3D facial latent features during training.<n>Our method enhances geometric consistency and improves the disentanglement of facial identity from appearance attributes.<n>Experiments on CelebA, FFHQ, and CelebV-Text demonstrate that DiffSwap++ outperforms prior methods in preserving source identity while maintaining target pose and expression.
arXiv Detail & Related papers (2025-11-04T18:56:49Z)
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention [52.94097577075215]
X-NeMo is a zero-shot diffusion-based portrait animation pipeline.<n>It animates a static portrait using facial movements from a driving video of a different individual.
arXiv Detail & Related papers (2025-07-30T22:46:52Z)
OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping. We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping. Ours is a relatively unified approach and so it is resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z)
GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer [26.567649613966974]
Speech-driven 3D facial animation model based on a Graph Latent Transformer.<n> GLDiTalker resolves misalignment by diffusing signals within a quantizedtemporal latent space.<n>It employs a two-stage training pipeline: the Graph-Enhanced Space Quantized Learning Stage ensures lip-sync accuracy, and the Space-Time Powered Latent Diffusion Stage enhances motion diversity.
arXiv Detail & Related papers (2024-08-03T17:18:26Z)
ScanTalk: 3D Talking Heads from Unregistered Scans [13.003073077799835]
We present textbfScanTalk, a novel framework capable of animating 3D faces in arbitrary topologies including scanned data. Our approach relies on the DiffusionNet architecture to overcome the fixed topology constraint, offering promising avenues for more flexible and realistic 3D animations.
arXiv Detail & Related papers (2024-03-16T14:58:58Z)
High-Fidelity Face Swapping with Style Blending [16.024260677867076]
We propose an innovative end-to-end framework for high-fidelity face swapping. First, we introduce a StyleGAN-based facial attributes encoder that extracts essential features from faces and inverts them into a latent style code. Second, we introduce an attention-based style blending module to effectively transfer Face IDs from source to target.
arXiv Detail & Related papers (2023-12-17T23:22:37Z)
ImFace++: A Sophisticated Nonlinear 3D Morphable Face Model with Implicit Neural Representations [25.016000421755162]
This paper presents a novel 3D morphable face model, named ImFace++, to learn a sophisticated and continuous space with implicit neural representations. ImFace++ first constructs two explicitly disentangled deformation fields to model complex shapes associated with identities and expressions. A refinement displacement field within the template space is further incorporated, enabling fine-grained learning of individual-specific facial details.
arXiv Detail & Related papers (2023-12-07T03:53:53Z)
Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos. We model hands as articulated objects inducing non-rigid face deformations during an active interaction. Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z)
GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decoupled expression. We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability of various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [27.989344587876964]
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness. We propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook. We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-01-06T05:04:32Z)
Neural Face Models for Example-Based Visual Speech Synthesis [2.2817442144155207]
We present a marker-less approach for facial motion capture based on multi-view video. We learn a neural representation of facial expressions, which is used to seamlessly facial performances during the animation procedure.
arXiv Detail & Related papers (2020-09-22T07:35:33Z)
InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models. We first find that GANs learn various semantics in some linear subspaces of the latent space. We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.