MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance
- URL: http://arxiv.org/abs/2504.21497v2
- Date: Sat, 10 May 2025 08:54:17 GMT
- Title: MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance
- Authors: Mengting Wei, Yante Li, Tuomas Varanka, Yan Jiang, Guoying Zhao
- Abstract summary: We propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework. Our approach employs the FLAME (Faces Learned with an Articulated Model and Expressions) model as the 3D face parametric representation. We show that our method excels at generating high-quality face animations with precise expression and head pose variation modeling.
- Score: 21.0593460047148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework, aiming to improve shape consistency and motion control in existing video-based face generation approaches. Our approach employs the FLAME (Faces Learned with an Articulated Model and Expressions) model as the 3D face parametric representation, providing a unified framework for modeling face expressions and head pose. This not only enables precise extraction of motion features from driving videos, but also contributes to the faithful preservation of face shape and geometry. Specifically, we enhance the latent diffusion model with rich 3D expression and detailed pose information by incorporating depth maps, normal maps, and rendering maps derived from FLAME sequences. These maps serve as motion guidance and are encoded into the denoising UNet through a specifically designed Geometric Guidance Encoder (GGE). A multi-layer feature fusion module with integrated self-attention mechanisms is used to combine facial appearance and motion latent features within the spatial domain. By utilizing the 3D face parametric model as motion guidance, our method enables parametric alignment of face identity between the reference image and the motion captured from the driving video. Experimental results on benchmark datasets show that our method excels at generating high-quality face animations with precise expression and head pose variation modeling. In addition, it demonstrates strong generalization performance on out-of-domain images. Code is publicly available at https://github.com/weimengting/MagicPortrait.
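The abstract describes two core ideas: parametric alignment of identity between the reference image and the driving motion via FLAME parameters, and fusion of appearance and motion latent features with self-attention. A minimal NumPy sketch of both ideas follows; this is not the authors' implementation, and all parameter names, dimensions, and shapes here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parametric_align(ref_params, drive_params):
    # Identity (shape) stays with the reference image; motion (head pose,
    # expression) is taken from the driving frame. FLAME-style parameter
    # names are assumed for illustration.
    return {
        "shape": ref_params["shape"],
        "pose": drive_params["pose"],
        "expression": drive_params["expression"],
    }

def fuse_with_self_attention(appearance, motion, wq, wk, wv):
    # Concatenate appearance and motion latents along the token axis, run
    # single-head scaled dot-product self-attention over the joint sequence,
    # and keep the refined appearance tokens as the fused output.
    tokens = np.concatenate([appearance, motion], axis=0)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    fused = attn @ v
    return fused[: appearance.shape[0]]

rng = np.random.default_rng(0)
ref = {"shape": rng.normal(size=100), "pose": rng.normal(size=6),
       "expression": rng.normal(size=50)}
drive = {"shape": rng.normal(size=100), "pose": rng.normal(size=6),
         "expression": rng.normal(size=50)}
aligned = parametric_align(ref, drive)

n_tokens, dim = 16, 8  # e.g. a 4x4 latent grid with 8 channels
appearance = rng.normal(size=(n_tokens, dim))
motion = rng.normal(size=(n_tokens, dim))  # stand-in for geometric guidance features
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
fused = fuse_with_self_attention(appearance, motion, wq, wk, wv)
print(fused.shape)  # (16, 8)
```

In the paper's actual pipeline the motion features would come from depth, normal, and rendering maps encoded by the Geometric Guidance Encoder rather than random tensors, and the fusion would happen inside the denoising UNet.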
Related papers
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling [29.723534231743038]
We propose Geometry Forcing to bridge the gap between video diffusion models and the underlying 3D nature of the physical world. Our key insight is to guide the model's intermediate representations toward geometry-aware structure by aligning them with features from a pretrained geometric foundation model. We evaluate Geometry Forcing on both camera view-conditioned and action-conditioned video generation tasks.
arXiv Detail & Related papers (2025-07-10T17:55:08Z)
- DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses [57.17501809717155]
We present DreamDance, a novel method for animating human images using only skeleton pose sequences as conditional inputs.
Our key insight is that human images naturally exhibit multiple levels of correlation.
We construct the TikTok-Dance5K dataset, comprising 5K high-quality dance videos with detailed frame annotations.
arXiv Detail & Related papers (2024-11-30T08:42:13Z)
- DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation [10.250715657201363]
We introduce DreamMesh4D, a novel framework combining mesh representation with geometric skinning technique to generate high-quality 4D object from a monocular video.
Our method is compatible with modern graphic pipelines, showcasing its potential in the 3D gaming and film industry.
arXiv Detail & Related papers (2024-10-09T10:41:08Z)
- G3FA: Geometry-guided GAN for Face Animation [14.488117084637631]
We introduce Geometry-guided GAN for Face Animation (G3FA) to tackle this limitation.
Our novel approach empowers the face animation model to incorporate 3D information using only 2D images.
In our face reenactment model, we leverage 2D motion warping to capture motion dynamics.
arXiv Detail & Related papers (2024-08-23T13:13:24Z)
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance [25.346255905155424]
We introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework.
By representing the 3D human parametric model as the motion guidance, we can perform parametric shape alignment of the human body between the reference image and the source video motion.
Our approach also exhibits superior generalization capabilities on the proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-03-21T18:52:58Z)
- Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z)
- Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization [91.52882218901627]
We propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
Our method improves upon photo-realism, geometry, and expression accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T17:58:40Z)
- Towards Realistic Generative 3D Face Models [41.574628821637944]
This paper proposes a 3D controllable generative face model to produce high-quality albedo and precise 3D shape.
By combining 2D face generative models with semantic face manipulation, this method enables editing of detailed 3D rendered faces.
arXiv Detail & Related papers (2023-04-24T22:47:52Z)
- HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details [66.74088288846491]
HiFace aims at high-fidelity 3D face reconstruction with dynamic and static details.
We exploit several loss functions to jointly learn the coarse shape and fine details with both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-03-20T16:07:02Z)
- CGOF++: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields [52.14985242487535]
We propose a new conditional 3D face synthesis framework, which enables 3D controllability over generated face images.
At its core is a conditional Generative Occupancy Field (cGOF++) that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh.
Experiments validate the effectiveness of the proposed method and show more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.
arXiv Detail & Related papers (2022-11-23T19:02:50Z)
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z)
- Geometry Driven Progressive Warping for One-Shot Face Animation [5.349852254138086]
Face animation aims at creating photo-realistic portrait videos with animated poses and expressions.
We present a geometry driven model and propose two geometric patterns as guidance: 3D face rendered displacement maps and posed neural codes.
We show that the proposed model can synthesize portrait videos with high fidelity and achieve the new state-of-the-art results on the VoxCeleb1 and VoxCeleb2 datasets.
arXiv Detail & Related papers (2022-10-05T17:07:06Z)
- FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset [36.688730105295015]
FaceVerse is built from hybrid East Asian face datasets containing 60K fused RGB-D images and 2K high-fidelity 3D head scan models.
In the coarse module, we generate a base parametric model from large-scale RGB-D images, which is able to predict accurate rough 3D face models across different genders, ages, and other attributes.
In the fine module, a conditional StyleGAN architecture trained with high-fidelity scan models is introduced to enrich elaborate facial geometric and texture details.
arXiv Detail & Related papers (2022-03-26T12:13:14Z)
- MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation [69.35523133292389]
We propose a framework that a priori models physical attributes of the face explicitly, thus providing disentanglement by design.
Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models.
It achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose variations up to full profile view.
arXiv Detail & Related papers (2021-11-01T15:53:36Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.