Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose
- URL: http://arxiv.org/abs/2003.12957v1
- Date: Sun, 29 Mar 2020 06:45:17 GMT
- Title: Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose
- Authors: Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, Yong Liu
- Abstract summary: We propose a self-supervised hybrid model (DAE-GAN) that learns how to reenact faces naturally given large amounts of unlabeled videos.
Our approach combines two deforming autoencoders with the latest advances in conditional generation.
Experimental results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.
- Score: 23.211318473026243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have shown how realistic talking face images can be obtained
under the supervision of geometry guidance, e.g., facial landmark or boundary.
To alleviate the demand for manual annotations, in this paper, we propose a
novel self-supervised hybrid model (DAE-GAN) that learns how to reenact faces
naturally given large amounts of unlabeled videos. Our approach combines two
deforming autoencoders with the latest advances in conditional generation.
On the one hand, we adopt the deforming autoencoder to disentangle identity and
pose representations. A strong prior in talking face videos is that each frame
can be encoded as two parts: one for video-specific identity and the other for
various poses. Inspired by that, we utilize a multi-frame deforming autoencoder
to learn a pose-invariant embedded face for each video. Meanwhile, a
multi-scale deforming autoencoder is proposed to extract pose-related
information for each frame. On the other hand, the conditional generator allows
for enhancing fine details and overall realism. It leverages the disentangled
features to generate photo-realistic, pose-consistent face images. We evaluate
our model on the VoxCeleb1 and RaFD datasets. Experimental results demonstrate the
superior quality of reenacted images and the flexibility of transferring facial
movements between identities.
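To make the two-branch design described above concrete, the sketch below shows one way the disentangling could be wired up in PyTorch: a multi-frame identity encoder whose per-frame codes are averaged into a pose-invariant embedding, a per-frame pose encoder for the driving frame, and a decoder standing in for the conditional generator. All module names, layer sizes, and the averaging step are illustrative assumptions, not the authors' released architecture; the actual DAE-GAN additionally relies on deformation-based autoencoders and adversarial training, which are omitted here.

```python
# Minimal sketch of the identity/pose disentangling idea from the abstract.
# Hypothetical module names and sizes; not the authors' implementation.
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """Small convolutional encoder shared by the identity and pose branches."""
    def __init__(self, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class DisentanglingReenactor(nn.Module):
    """Identity comes from several frames of the source video (multi-frame),
    pose comes from the single driving frame, and a decoder conditioned on
    both renders the reenacted face."""
    def __init__(self, id_dim: int = 128, pose_dim: int = 64):
        super().__init__()
        self.id_enc = ConvEncoder(id_dim)      # video-specific identity
        self.pose_enc = ConvEncoder(pose_dim)  # frame-specific pose
        self.decoder = nn.Sequential(
            nn.Linear(id_dim + pose_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, source_frames, driving_frame):
        # source_frames: (B, T, 3, H, W). Averaging per-frame codes is one
        # simple way to push the identity embedding toward pose invariance.
        b, t, c, h, w = source_frames.shape
        id_code = self.id_enc(source_frames.view(b * t, c, h, w))
        id_code = id_code.view(b, t, -1).mean(dim=1)
        pose_code = self.pose_enc(driving_frame)  # (B, 3, H, W) driving frame
        return self.decoder(torch.cat([id_code, pose_code], dim=1))


if __name__ == "__main__":
    model = DisentanglingReenactor()
    src = torch.randn(2, 4, 3, 64, 64)   # four source frames per video
    drv = torch.randn(2, 3, 64, 64)      # one driving frame
    print(model(src, drv).shape)          # torch.Size([2, 3, 64, 64])
```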
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
- Arc2Face: A Foundation Model for ID-Consistent Human Faces [95.00331107591859]
Arc2Face is an identity-conditioned face foundation model.
It can generate diverse photo-realistic images with an unparalleled degree of face similarity compared to existing models.
arXiv Detail & Related papers (2024-03-18T10:32:51Z)
- StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
- FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability [14.896554342627551]
We introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities.
This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models.
Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models.
arXiv Detail & Related papers (2023-12-06T02:55:35Z)
- FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content [36.48960592782015]
FaceChain is a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models.
We inject several SOTA face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions.
Based on FaceChain, we further develop several applications, including virtual try-on and 2D talking head, to build a broader playground that better demonstrates its value.
arXiv Detail & Related papers (2023-08-28T02:20:44Z)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [20.814063371439904]
We propose DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech.
We also introduce pose modelling in speech2latent for pose controllability.
Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness.
arXiv Detail & Related papers (2023-03-30T17:18:31Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z)
- VariTex: Variational Neural Face Textures [0.0]
VariTex is a method that learns a variational latent feature space of neural face textures.
To generate images of complete human heads, we propose an additive decoder that generates plausible additional details such as hair.
The resulting method can generate geometrically consistent images of novel identities allowing fine-grained control over head pose, face shape, and facial expressions.
arXiv Detail & Related papers (2021-04-13T07:47:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.