Out-of-domain GAN inversion via Invertibility Decomposition for
Photo-Realistic Human Face Manipulation
- URL: http://arxiv.org/abs/2212.09262v2
- Date: Thu, 8 Jun 2023 15:20:36 GMT
- Authors: Xin Yang, Xiaogang Xu, Yingcong Chen
- Abstract summary: We propose a novel framework that enhances the fidelity of human face inversion by designing a new module.
Unlike previous works, our invertibility detector is simultaneously learned with a spatial alignment module.
Our method produces photo-realistic results for real-world human face image inversion and manipulation.
- Score: 22.71398343370642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fidelity of Generative Adversarial Networks (GAN) inversion is impeded by
Out-Of-Domain (OOD) areas (e.g., background, accessories) in the image.
Detecting the OOD areas beyond the generation ability of the pre-trained model
and blending these regions with the input image can enhance fidelity. The
"invertibility mask" figures out these OOD areas, and existing methods predict
the mask with the reconstruction error. However, the estimated mask is usually
inaccurate due to the influence of the reconstruction error in the In-Domain
(ID) area. In this paper, we propose a novel framework that enhances the
fidelity of human face inversion by designing a new module to decompose the
input images to ID and OOD partitions with invertibility masks. Unlike previous
works, our invertibility detector is simultaneously learned with a spatial
alignment module. We iteratively align the generated features to the input
geometry and reduce the reconstruction error in the ID regions. Thus, the OOD
areas are more distinguishable and can be precisely predicted. Then, we improve
the fidelity of our results by blending the OOD areas from the input image with
the ID GAN inversion results. Our method produces photo-realistic results for
real-world human face image inversion and manipulation. Extensive experiments
demonstrate our method's superiority over existing methods in the quality of
GAN inversion and attribute manipulation.
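The final compositing step described in the abstract (blending OOD areas from the input with the ID inversion result using the invertibility mask) amounts to a per-pixel convex combination. A minimal sketch, assuming float images in [0, 1] and a soft mask where 1 marks OOD pixels; the function name and array shapes are illustrative assumptions, not the authors' code:

```python
import numpy as np

def blend_with_invertibility_mask(input_img, inversion_img, ood_mask):
    """Keep OOD pixels from the input, ID pixels from the GAN inversion.

    input_img, inversion_img: float arrays of shape (H, W, 3) in [0, 1].
    ood_mask: float array of shape (H, W, 1); 1 marks out-of-domain pixels.
    """
    return ood_mask * input_img + (1.0 - ood_mask) * inversion_img
```

A soft (rather than hard binary) mask lets the blend transition smoothly at ID/OOD boundaries, which avoids visible seams.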
Related papers
- Dual form Complementary Masking for Domain-Adaptive Image Segmentation [44.81357028765057]
We propose MaskTwins, a framework that integrates masked reconstruction directly into the main training pipeline.
MaskTwins uncovers intrinsic structural patterns that persist across disparate domains by enforcing consistency between predictions of images masked in complementary ways.
These results demonstrate the significant advantages of MaskTwins in extracting domain-invariant features without the need for separate pre-training.
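The complementary-masking idea can be sketched in a few lines: two views masked with a random mask and its complement jointly cover the image, and a consistency term penalizes disagreement between predictions on the two views. A toy numpy sketch under assumed names, not the MaskTwins implementation:

```python
import numpy as np

def complementary_views(image, rng, p=0.5):
    """Return two views of `image` masked in complementary ways."""
    mask = (rng.random(image.shape) < p).astype(image.dtype)
    return image * mask, image * (1.0 - mask)  # views cover the full image

def consistency_loss(pred_a, pred_b):
    """Mean squared disagreement between the two views' predictions."""
    return float(np.mean((pred_a - pred_b) ** 2))
```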
arXiv Detail & Related papers (2025-07-16T08:05:22Z)
- High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion [15.202130790708747]
We propose a novel GAN inversion approach, dubbed MMInvertFill, for image inpainting.
MMInvertFill contains primarily a multimodal guided encoder with a pre-modulation and a GAN generator with F&W+ latent space.
We show that our MMInvertFill qualitatively and quantitatively outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2025-04-17T10:58:45Z)
- IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR [10.763588041592703]
Unsupervised anomaly detection and segmentation methods train a model to learn the training distribution as 'normal'.
Prevailing methods corrupt the images and train a model to reconstruct them.
We propose IterMask3D, an iterative spatial mask-refining strategy designed for 3D brain MRI.
arXiv Detail & Related papers (2025-04-07T10:41:23Z)
- LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space [0.0]
Masked image modeling is a self-supervised learning technique that predicts the feature representation of masked regions in an image.
We propose a novel approach that leverages the characteristics of MIM to detect logical anomalies effectively.
We evaluate the proposed method on the MVTecLOCO dataset, achieving an average AUC of 0.867.
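MIM-based anomaly detection typically scores a region by the error between predicted and actual features in the masked area: a predictor trained only on normal data reconstructs masked features poorly when they are anomalous. A minimal sketch with a hypothetical `predictor` callable, not the LADMIM architecture:

```python
import numpy as np

def mim_anomaly_score(features, predictor, mask):
    """Average prediction error over the masked region (mask == 1)."""
    pred = predictor(features * (1.0 - mask))      # predict from visible context
    err = (pred - features) ** 2
    return float((err * mask).sum() / mask.sum())  # high score => anomaly
```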
arXiv Detail & Related papers (2024-10-14T07:50:56Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimization to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- RGI: robust GAN-inversion for mask-free image inpainting and unsupervised pixel-wise anomaly detection [18.10039647382319]
We propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown gross corruptions.
We show that the restored image and the identified corrupted region mask converge asymptotically to the ground truth.
The proposed RGI/R-RGI method unifies two important applications with state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2023-02-24T05:43:03Z)
- PC-GANs: Progressive Compensation Generative Adversarial Networks for Pan-sharpening [50.943080184828524]
We propose a novel two-step model for pan-sharpening that sharpens the MS image through the progressive compensation of the spatial and spectral information.
The whole model is composed of triple GANs, and based on the specific architecture, a joint compensation loss function is designed to enable the triple GANs to be trained simultaneously.
arXiv Detail & Related papers (2022-07-29T03:09:21Z)
- Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited by the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
- REPLICA: Enhanced Feature Pyramid Network by Local Image Translation and Conjunct Attention for High-Resolution Breast Tumor Detection [6.112883009328882]
We call our method enhanced feature synthesis network by Local Image translation and Conjunct Attention, or REPLICA.
We use a convolutional autoencoder as a generator to create new images by injecting objects into images via local pyramid reconstruction of their features extracted in hidden layers.
Then due to the larger number of simulated images, we use a visual transformer to enhance outputs of each ResNet layer that serve as inputs to a feature pyramid network.
arXiv Detail & Related papers (2021-11-22T21:33:02Z)
- Inverting Generative Adversarial Renderer for Face Reconstruction [58.45125455811038]
In this work, we introduce a novel Generative Adversarial Renderer (GAR).
Instead of relying on graphics rules, GAR learns to model the complicated real-world image and is capable of producing realistic images.
Our method achieves state-of-the-art performance on multiple face reconstruction tasks.
arXiv Detail & Related papers (2021-05-06T04:16:06Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
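The linear-subspace finding implies a simple editing rule: move the latent code along a semantic's direction, and disentangle two correlated semantics by projecting one direction onto the orthogonal complement of the other. A numpy sketch of this idea; variable names are illustrative, not InterFaceGAN's code:

```python
import numpy as np

def edit_latent(z, direction, alpha):
    """Shift latent code z along a unit semantic direction by strength alpha."""
    n = direction / np.linalg.norm(direction)
    return z + alpha * n

def disentangle(primal, conditioned):
    """Subspace projection: remove the component of `primal` along `conditioned`."""
    n2 = conditioned / np.linalg.norm(conditioned)
    n1 = primal - np.dot(primal, n2) * n2
    return n1 / np.linalg.norm(n1)
```

The projected direction is orthogonal to the conditioned semantic, so editing along it should leave that attribute approximately unchanged.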
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.