One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2
- URL: http://arxiv.org/abs/2302.07848v1
- Date: Wed, 15 Feb 2023 18:34:15 GMT
- Title: One-Shot Face Video Re-enactment using Hybrid Latent Spaces of StyleGAN2
- Authors: Trevine Oorloff and Yaser Yacoob
- Abstract summary: We propose an end-to-end framework for simultaneously supporting face edits, facial motions and deformations, and facial identity control for video generation.
We employ the StyleGAN2 generator to achieve high-fidelity face video re-enactment at $1024^2$.
- Score: 0.7614628596146599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent research has progressively overcome the low-resolution
constraint of one-shot face video re-enactment with the help of StyleGAN's
high-fidelity portrait generation, these approaches rely on at least one of the
following: explicit 2D/3D priors, optical flow based warping as motion
descriptors, off-the-shelf encoders, etc., which constrain their performance
(e.g., inconsistent predictions, inability to capture fine facial details and
accessories, poor generalization, artifacts). We propose an end-to-end
framework for simultaneously supporting face attribute edits, facial motions
and deformations, and facial identity control for video generation. It employs
a hybrid latent-space that encodes a given frame into a pair of latents:
Identity latent, $\mathcal{W}_{ID}$, and Facial deformation latent,
$\mathcal{S}_F$, that respectively reside in the $W+$ and $SS$ spaces of
StyleGAN2. This design combines the favorable editability-distortion
trade-off of $W+$ with the strong disentanglement of $SS$. The hybrid
latents drive the StyleGAN2 generator to achieve high-fidelity face video
re-enactment at $1024^2$. Furthermore, the model supports the generation of
realistic re-enactment videos with other latent-based semantic edits (e.g.,
beard, age, make-up, etc.). Qualitative and quantitative analyses performed
against state-of-the-art methods demonstrate the superiority of the proposed
approach.
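The pipeline described above can be sketched in a few lines: a single source frame fixes the identity latent $\mathcal{W}_{ID}$ in $W+$, each driving frame is encoded into a deformation latent $\mathcal{S}_F$ in StyleSpace, and the pair is fed to the StyleGAN2 generator. The sketch below is a minimal illustration only; the encoder functions and the generator stand-in are hypothetical placeholders, not the authors' released code (the $W+$ shape of 18×512 and the ~9088-dimensional StyleSpace follow the standard StyleGAN2 configuration at $1024^2$).

```python
import numpy as np

# Dimensions follow StyleGAN2 at 1024^2: W+ stacks 18 style vectors of 512 dims;
# StyleSpace (SS) concatenates the per-layer channel-wise styles (~9088 dims).
W_PLUS_SHAPE = (18, 512)
SS_DIM = 9088

def encode_identity(frame):
    """Hypothetical identity encoder: maps a frame to W_ID in the W+ space."""
    rng = np.random.default_rng(int(frame.sum()) % 2**32)  # stand-in for a learned network
    return rng.standard_normal(W_PLUS_SHAPE).astype(np.float32)

def encode_deformation(frame):
    """Hypothetical deformation encoder: maps a frame to S_F in the SS space."""
    rng = np.random.default_rng((int(frame.sum()) + 1) % 2**32)  # stand-in
    return rng.standard_normal(SS_DIM).astype(np.float32)

def stylegan2_generate(w_id, s_f):
    """Mock stand-in for the pretrained StyleGAN2 generator at 1024^2."""
    assert w_id.shape == W_PLUS_SHAPE and s_f.shape == (SS_DIM,)
    return np.zeros((1024, 1024, 3), dtype=np.float32)  # placeholder RGB image

# One-shot re-enactment: identity comes from a single source frame,
# while motion/deformation is re-encoded from every driving frame.
source = np.full((256, 256, 3), 0.5, dtype=np.float32)
driving_clip = [np.full((256, 256, 3), t / 10, dtype=np.float32) for t in range(3)]

w_id = encode_identity(source)  # fixed across the whole output video
video = [stylegan2_generate(w_id, encode_deformation(f)) for f in driving_clip]
```

Keeping $\mathcal{W}_{ID}$ fixed while only $\mathcal{S}_F$ varies per frame is what lets latent-based semantic edits (beard, age, make-up) be applied once to the identity latent and persist across the whole re-enacted video.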
Related papers
- CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using
Score-Based Diffusion Models [57.9771859175664]
Recent generative-prior-based methods have shown promising blind face restoration performance.
Generating fine-grained facial details faithful to inputs remains a challenging problem.
We introduce a diffusion-based-prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings.
arXiv Detail & Related papers (2024-02-08T23:51:49Z) - High-Fidelity Face Swapping with Style Blending [16.024260677867076]
We propose an innovative end-to-end framework for high-fidelity face swapping.
First, we introduce a StyleGAN-based facial attributes encoder that extracts essential features from faces and inverts them into a latent style code.
Second, we introduce an attention-based style blending module to effectively transfer Face IDs from source to target.
arXiv Detail & Related papers (2023-12-17T23:22:37Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - Controllable One-Shot Face Video Synthesis With Semantic Aware Prior [10.968343822308812]
One-shot talking-head synthesis task aims to animate a source image to another pose and expression, which is dictated by a driving frame.
Recent methods rely on warping the appearance feature extracted from the source, using motion fields estimated from sparse keypoints that are learned in an unsupervised manner.
We propose a novel method that leverages the rich face prior information.
arXiv Detail & Related papers (2023-04-27T19:17:13Z) - DCFace: Synthetic Face Generation with Dual Condition Diffusion Model [18.662943303044315]
We propose a Dual Condition Face Generator (DCFace) based on a diffusion model.
Our novel Patch-wise style extractor and Time-step dependent ID loss enables DCFace to consistently produce face images of the same subject under different styles with precise control.
arXiv Detail & Related papers (2023-04-14T11:31:49Z) - Perceptual Quality Assessment of Face Video Compression: A Benchmark and
An Effective Method [69.868145936998]
Generative coding approaches have been identified as promising alternatives with reasonable perceptual rate-distortion trade-offs.
The great diversity of distortion types in the spatial and temporal domains, ranging from traditional hybrid coding frameworks to generative models, presents grand challenges in compressed face video quality assessment (VQA).
We introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, which is the first attempt to systematically understand the perceptual quality and diversified compression distortions in face videos.
arXiv Detail & Related papers (2023-04-14T11:26:09Z) - Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z) - Overparameterization Improves StyleGAN Inversion [66.8300251627992]
Existing inversion approaches obtain promising yet imperfect results.
We show that this allows us to obtain near-perfect image reconstruction without the need for encoders.
Our approach also retains editability, which we demonstrate by realistically interpolating between images.
arXiv Detail & Related papers (2022-05-12T18:42:43Z) - High-resolution Face Swapping via Latent Semantics Disentanglement [50.23624681222619]
We present a novel high-resolution hallucination face swapping method using the inherent prior knowledge of a pre-trained GAN model.
We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator.
We extend our method to video face swapping by enforcing two-temporal constraints on the latent space and the image space.
arXiv Detail & Related papers (2022-03-30T00:33:08Z) - FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment [28.83743270895698]
We present Face Swapping GAN (FSGAN) for face swapping and reenactment.
Unlike previous work, we offer a subject swapping scheme that can be applied to pairs of faces without requiring training on those faces.
We derive a novel iterative deep learning--based approach for face reenactment which adjusts significant pose and expression variations that can be applied to a single image or a video sequence.
For video sequences, we introduce a continuous interpolation of the face views based on reenactment, Delaunay Triangulation, and barycentric coordinates. Occluded face regions are handled by a face completion network.
arXiv Detail & Related papers (2022-02-25T21:04:39Z) - Realistic Face Reenactment via Self-Supervised Disentangling of Identity
and Pose [23.211318473026243]
We propose a self-supervised hybrid model (DAE-GAN) that learns to reenact faces naturally given large amounts of unlabeled videos.
Our approach combines two deforming autoencoders with the latest advances in the conditional generation.
Experiment results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.
arXiv Detail & Related papers (2020-03-29T06:45:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.