Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose
- URL: http://arxiv.org/abs/2003.12957v1
- Date: Sun, 29 Mar 2020 06:45:17 GMT
- Title: Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose
- Authors: Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, Yong Liu
- Abstract summary: We propose a self-supervised hybrid model (DAE-GAN) that learns how to reenact faces naturally given large amounts of unlabeled videos.
Our approach combines two deforming autoencoders with the latest advances in conditional generation.
Experimental results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.
- Score: 23.211318473026243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have shown how realistic talking face images can be obtained
under the supervision of geometry guidance, e.g., facial landmark or boundary.
To alleviate the demand for manual annotations, in this paper, we propose a
novel self-supervised hybrid model (DAE-GAN) that learns how to reenact faces
naturally given large amounts of unlabeled videos. Our approach combines two
deforming autoencoders with the latest advances in conditional generation.
On the one hand, we adopt the deforming autoencoder to disentangle identity and
pose representations. A strong prior in talking face videos is that each frame
can be encoded as two parts: one for video-specific identity and the other for
various poses. Inspired by that, we utilize a multi-frame deforming autoencoder
to learn a pose-invariant embedded face for each video. Meanwhile, a
multi-scale deforming autoencoder is proposed to extract pose-related
information for each frame. On the other hand, the conditional generator allows
for enhancing fine details and overall realism. It leverages the disentangled
features to generate photo-realistic, pose-consistent face images. We evaluate
our model on the VoxCeleb1 and RaFD datasets. Experimental results demonstrate the
superior quality of reenacted images and the flexibility of transferring facial
movements between identities.
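To make the two-branch design described above concrete, the sketch below shows one way the disentangling could be wired up in PyTorch: a multi-frame identity encoder whose per-frame codes are averaged into a pose-invariant embedding, a per-frame pose encoder for the driving frame, and a decoder standing in for the conditional generator. All module names, layer sizes, and the averaging step are illustrative assumptions, not the authors' released architecture; the actual DAE-GAN additionally relies on deformation-based autoencoders and adversarial training, which are omitted here.

```python
# Minimal sketch of the identity/pose disentangling idea from the abstract.
# Hypothetical module names and sizes; not the authors' implementation.
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """Small convolutional encoder shared by the identity and pose branches."""
    def __init__(self, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class DisentanglingReenactor(nn.Module):
    """Identity comes from several frames of the source video (multi-frame),
    pose comes from the single driving frame, and a decoder conditioned on
    both renders the reenacted face."""
    def __init__(self, id_dim: int = 128, pose_dim: int = 64):
        super().__init__()
        self.id_enc = ConvEncoder(id_dim)      # video-specific identity
        self.pose_enc = ConvEncoder(pose_dim)  # frame-specific pose
        self.decoder = nn.Sequential(
            nn.Linear(id_dim + pose_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, source_frames, driving_frame):
        # source_frames: (B, T, 3, H, W). Averaging per-frame codes is one
        # simple way to push the identity embedding toward pose invariance.
        b, t, c, h, w = source_frames.shape
        id_code = self.id_enc(source_frames.view(b * t, c, h, w))
        id_code = id_code.view(b, t, -1).mean(dim=1)
        pose_code = self.pose_enc(driving_frame)  # (B, 3, H, W) driving frame
        return self.decoder(torch.cat([id_code, pose_code], dim=1))


if __name__ == "__main__":
    model = DisentanglingReenactor()
    src = torch.randn(2, 4, 3, 64, 64)   # four source frames per video
    drv = torch.randn(2, 3, 64, 64)      # one driving frame
    print(model(src, drv).shape)          # torch.Size([2, 3, 64, 64])
```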
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
- Arc2Face: A Foundation Model for ID-Consistent Human Faces [95.00331107591859]
Arc2Face is an identity-conditioned face foundation model.
It can generate diverse photo-realistic images with an unparalleled degree of face similarity compared to existing models.
arXiv Detail & Related papers (2024-03-18T10:32:51Z)
- StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
- FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability [14.896554342627551]
We introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities.
This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models.
Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models.
arXiv Detail & Related papers (2023-12-06T02:55:35Z)
- FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content [36.48960592782015]
FaceChain is a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models.
We inject several SOTA face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions.
Based on FaceChain, we further develop several applications, including virtual try-on and 2D talking head, to build a broader playground that better demonstrates its value.
arXiv Detail & Related papers (2023-08-28T02:20:44Z)
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [20.814063371439904]
We propose DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech.
We also introduce pose modelling in speech2latent for pose controllability.
Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness.
arXiv Detail & Related papers (2023-03-30T17:18:31Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z)
- VariTex: Variational Neural Face Textures [0.0]
VariTex is a method that learns a variational latent feature space of neural face textures.
To generate images of complete human heads, we propose an additive decoder that generates plausible additional details such as hair.
The resulting method can generate geometrically consistent images of novel identities allowing fine-grained control over head pose, face shape, and facial expressions.
arXiv Detail & Related papers (2021-04-13T07:47:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.