High-resolution Face Swapping via Latent Semantics Disentanglement
- URL: http://arxiv.org/abs/2203.15958v1
- Date: Wed, 30 Mar 2022 00:33:08 GMT
- Title: High-resolution Face Swapping via Latent Semantics Disentanglement
- Authors: Yangyang Xu and Bailin Deng and Junle Wang and Yanqing Jing and Jia
Pan and Shengfeng He
- Abstract summary: We present a novel high-resolution face swapping method using the inherent prior knowledge of a pre-trained GAN model.
We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator.
We extend our method to video face swapping by enforcing two spatio-temporal constraints on the latent space and the image space.
- Score: 50.23624681222619
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a novel high-resolution face swapping method using the inherent
prior knowledge of a pre-trained GAN model. Although previous research can
leverage generative priors to produce high-resolution results, their quality
can suffer from the entangled semantics of the latent space. We explicitly
disentangle the latent semantics by utilizing the progressive nature of the
generator, deriving structure attributes from the shallow layers and appearance
attributes from the deeper ones. Identity and pose information within the
structure attributes are further separated by introducing a landmark-driven
structure transfer latent direction. The disentangled latent code yields rich
generative features, which are combined via feature blending to produce a
plausible swapping result. We further extend our method to video face swapping by
enforcing two spatio-temporal constraints on the latent space and the image
space. Extensive experiments demonstrate that the proposed method outperforms
state-of-the-art image/video face swapping methods in terms of hallucination
quality and consistency. Code can be found at:
https://github.com/cnnlstm/FSLSD_HiRes.
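For orientation, here is a minimal sketch of the layer-wise idea in plain PyTorch; it is not taken from the FSLSD_HiRes repository. The W+ code of a StyleGAN2-style generator is split so that shallow layers carry structure and deep layers carry appearance, and a landmark-driven direction (a placeholder tensor here) moves the structure part toward the target pose. The split index, tensor shapes, and the `swap_latents` / `pose_direction` names are illustrative assumptions.

```python
# Illustrative sketch only: layer-wise mixing of StyleGAN W+ codes into
# structure (shallow layers) and appearance (deep layers), plus a hypothetical
# landmark-driven direction that transfers pose within the structure code.
import torch

NUM_LAYERS, DIM = 18, 512          # typical W+ shape for a 1024x1024 StyleGAN2
STRUCTURE_LAYERS = 7               # assumption: shallow layers carry structure

def swap_latents(w_source, w_target, pose_direction, alpha=1.0):
    """Build a swapped W+ code: identity/structure from the source, appearance
    from the target, with the source structure pushed toward the target pose
    along a landmark-driven latent direction."""
    w_swap = w_source.clone()
    # Appearance (color, illumination, texture) comes from the target's deep layers.
    w_swap[:, STRUCTURE_LAYERS:] = w_target[:, STRUCTURE_LAYERS:]
    # Structure transfer: shift the shallow layers along a direction predicted
    # from source/target landmarks, so pose follows the target while identity
    # is preserved.
    w_swap[:, :STRUCTURE_LAYERS] += alpha * pose_direction[:, :STRUCTURE_LAYERS]
    return w_swap

# Toy usage with random codes standing in for encoder outputs.
w_src = torch.randn(1, NUM_LAYERS, DIM)
w_tgt = torch.randn(1, NUM_LAYERS, DIM)
direction = torch.randn(1, NUM_LAYERS, DIM)   # would come from a landmark-driven module
print(swap_latents(w_src, w_tgt, direction).shape)  # torch.Size([1, 18, 512])
```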
Related papers
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
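A hedged sketch of the query-refinement idea in that entry, with hypothetical module names and sizes rather than the CFLD implementation: learnable queries cross-attend to person-image features through a standard transformer decoder, yielding a coarse-grained semantic prompt.

```python
# Illustrative only: learnable queries progressively refined against image
# features, in the spirit of a "perception-refined decoder".
import torch
import torch.nn as nn

class QueryRefiner(nn.Module):
    def __init__(self, num_queries=16, dim=256, num_layers=3):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, image_features):          # image_features: (B, N, dim)
        B = image_features.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        # Each decoder layer lets the queries attend to image features,
        # progressively refining them into a coarse-grained prompt.
        return self.decoder(q, image_features)

prompt = QueryRefiner()(torch.randn(2, 64, 256))
print(prompt.shape)  # torch.Size([2, 16, 256])
```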
- CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models [57.9771859175664]
Recent generative-prior-based methods have shown promising blind face restoration performance.
Generating fine-grained facial details faithful to inputs remains a challenging problem.
We introduce a diffusion-based prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings.
arXiv Detail & Related papers (2024-02-08T23:51:49Z)
- High-Fidelity Face Swapping with Style Blending [16.024260677867076]
We propose an innovative end-to-end framework for high-fidelity face swapping.
First, we introduce a StyleGAN-based facial attributes encoder that extracts essential features from faces and inverts them into a latent style code.
Second, we introduce an attention-based style blending module to effectively transfer Face IDs from source to target.
arXiv Detail & Related papers (2023-12-17T23:22:37Z)
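A simplified, hypothetical illustration of blending two W+ style codes as described in that entry; the paper uses an attention-based module, whereas this sketch uses a small learned per-layer gate purely to convey the idea. All names and sizes are assumptions.

```python
# Illustrative only: per-layer blending weights decide how much of the source
# identity style vs. the target style enters each layer of the blended W+ code.
import torch
import torch.nn as nn

class StyleBlender(nn.Module):
    def __init__(self, num_layers=18, dim=512):
        super().__init__()
        # Predict a per-layer blending weight from the concatenated codes.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, w_source, w_target):       # each (B, L, dim)
        g = self.gate(torch.cat([w_source, w_target], dim=-1))   # (B, L, 1)
        # g -> 1 keeps source identity; g -> 0 keeps target attributes.
        return g * w_source + (1 - g) * w_target

blended = StyleBlender()(torch.randn(2, 18, 512), torch.randn(2, 18, 512))
print(blended.shape)  # torch.Size([2, 18, 512])
```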
- ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swapping [11.626508630081362]
We present a novel face swapping method using the progressively growing structure of a pre-trained StyleGAN.
We disentangle semantics by deriving identity and attribute features separately.
arXiv Detail & Related papers (2023-10-19T13:33:55Z)
- Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation [36.20575570779196]
We exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models.
The hierarchical latent space of HDAE inherently encodes different abstract levels of semantics and provides more comprehensive semantic representations.
We demonstrate the effectiveness of our proposed approach with extensive experiments and applications on image reconstruction, style mixing, controllable, detail-preserving and disentangled image manipulation.
arXiv Detail & Related papers (2023-04-24T05:35:59Z)
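A toy, hypothetical sketch of a hierarchical latent space in that spirit: one latent vector per feature level, so low-level details and abstract semantics can be edited separately. The encoder layout and dimensions are assumptions, not the HDAE architecture.

```python
# Illustrative only: separate heads summarize feature maps at different levels
# into a list of latent vectors forming a hierarchical code.
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, channels=(64, 128, 256), latent_dim=128):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, latent_dim))
            for c in channels)

    def forward(self, feature_maps):              # list of (B, C_i, H_i, W_i)
        # One latent per level: early levels capture low-level detail,
        # later levels capture more abstract semantics.
        return [head(f) for head, f in zip(self.heads, feature_maps)]

feats = [torch.randn(2, 64, 64, 64), torch.randn(2, 128, 32, 32), torch.randn(2, 256, 16, 16)]
latents = HierarchicalEncoder()(feats)
print([z.shape for z in latents])   # three tensors of shape (2, 128)
```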
- StyleSwap: Style-Based Generator Empowers Robust Face Swapping [90.05775519962303]
We introduce a concise and effective framework named StyleSwap.
Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping.
We identify that with only minimal modifications, a StyleGAN2 architecture can successfully handle the desired information from both source and target.
arXiv Detail & Related papers (2022-09-27T16:35:16Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
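A hedged sketch of a single conditional affine coupling layer, the kind of invertible block such a flow stacks; the conditioning on a class-semantic embedding and all dimensions are illustrative assumptions rather than the GSMFlow design.

```python
# Illustrative only: one conditional affine coupling layer. Half the features
# pass through unchanged and parameterize an invertible affine map of the
# other half, conditioned on a semantic embedding.
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, dim=64, cond_dim=32, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([x1, cond], dim=-1)).chunk(2, dim=-1)
        y2 = x2 * torch.exp(log_s) + t           # invertible given x1 and cond
        log_det = log_s.sum(dim=-1)              # contribution to the flow log-likelihood
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(torch.cat([y1, cond], dim=-1)).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=-1)

layer = ConditionalAffineCoupling()
x, c = torch.randn(4, 64), torch.randn(4, 32)
y, _ = layer(x, c)
print(torch.allclose(layer.inverse(y, c), x, atol=1e-5))  # True
```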
- Latent Transformations via NeuralODEs for GAN-based Image Editing [25.272389610447856]
We show that nonlinear latent code manipulations realized as flows of a trainable Neural ODE are beneficial for many practical non-face image domains.
In particular, we investigate a large number of datasets with known attributes and demonstrate that certain attribute manipulations are challenging to obtain with linear shifts only.
arXiv Detail & Related papers (2021-11-29T18:59:54Z)
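A hedged sketch of that core idea: instead of a fixed linear shift, a small trainable vector field is integrated over time so the edit direction can change along the trajectory. Plain Euler integration is used here for self-containment (the paper uses a Neural ODE solver), and the network size and step count are illustrative.

```python
# Illustrative only: a nonlinear latent edit realized as a flow dz/dt = f(z, t).
import torch
import torch.nn as nn

class LatentField(nn.Module):
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, z, t):
        t_col = torch.full_like(z[:, :1], t)      # broadcast the scalar time
        return self.net(torch.cat([z, t_col], dim=-1))

def edit_latent(z, field, steps=20, t_end=1.0):
    """Integrate dz/dt = f(z, t) from 0 to t_end with Euler steps; unlike a
    linear shift z + alpha * n, the direction can bend along the trajectory."""
    dt = t_end / steps
    for i in range(steps):
        z = z + dt * field(z, i * dt)
    return z

z0 = torch.randn(2, 512)
print(edit_latent(z0, LatentField()).shape)  # torch.Size([2, 512])
```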
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
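The subspace-projection idea from that entry admits a compact sketch: a semantic is a hyperplane normal direction in latent space, and projecting one normal off another gives a conditional edit direction that disturbs the second attribute less. The attribute names below are illustrative assumptions.

```python
# Illustrative only: linear latent editing with conditional subspace projection,
# n_new = n1 - (n1 . n2) n2, followed by z' = z + alpha * n_new.
import torch

def project_out(n_primal, n_condition):
    """Remove the component of n_primal along n_condition (unit-normalized)."""
    n2 = n_condition / n_condition.norm()
    n_new = n_primal - (n_primal @ n2) * n2
    return n_new / n_new.norm()

def edit(z, direction, alpha):
    """Linear latent-space edit: move z along a unit semantic direction."""
    return z + alpha * direction

z = torch.randn(512)
n_age, n_glasses = torch.randn(512), torch.randn(512)   # stand-ins for learned normals
n_age_only = project_out(n_age, n_glasses)               # age edit, decorrelated from glasses
print(edit(z, n_age_only, alpha=3.0).shape)              # torch.Size([512])
```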
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.