What Decreases Editing Capability? Domain-Specific Hybrid Refinement for
Improved GAN Inversion
- URL: http://arxiv.org/abs/2301.12141v3
- Date: Wed, 1 Nov 2023 06:46:40 GMT
- Title: What Decreases Editing Capability? Domain-Specific Hybrid Refinement for
Improved GAN Inversion
- Authors: Pu Cao, Lu Yang, Dongxv Liu, Xiaoya Yang, Tianrui Huang, Qing Song
- Abstract summary: Inversion methods have focused on additional high-rate information in the generator to refine inversion and editing results from embedded latent codes.
A vital crux is how to refine inversion results without degrading editing capability.
We introduce Domain-Specific Hybrid Refinement, which weighs the strengths and weaknesses of two mainstream refinement techniques.
- Score: 3.9041061259639136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, inversion methods have focused on additional high-rate information
in the generator (e.g., weights or intermediate features) to refine inversion
and editing results obtained from embedded latent codes. Although these
techniques yield reasonable improvements in reconstruction, they decrease
editing capability, especially on complex images (e.g., those containing
occlusions, detailed backgrounds, or artifacts). A vital crux is therefore
how to refine inversion results without degrading editing capability. To
tackle this problem, we introduce
Domain-Specific Hybrid Refinement (DHR), which weighs the strengths and
weaknesses of two mainstream refinement techniques to maintain editing
ability while improving fidelity. Specifically, we first propose
Domain-Specific Segmentation to split images into in-domain and
out-of-domain parts. The refinement process aims to maintain editability
for in-domain areas while improving the fidelity of both domains. We refine
these two parts by weight modulation and feature modulation, respectively,
which we call Hybrid Modulation Refinement. Our proposed method is
compatible with all latent code
embedding methods. Extensive experiments demonstrate that our approach achieves
state-of-the-art results in real-image inversion and editing. Code is available at
https://github.com/caopulan/Domain-Specific_Hybrid_Refinement_Inversion.
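To make the two-stage idea in the abstract concrete, here is a minimal, hedged PyTorch sketch: weight modulation for the in-domain region and feature modulation for the out-of-domain region. The segmentation model, optimizer settings, and the `G.features`/`G.from_features` hooks are illustrative assumptions, not the paper's API; the official implementation lives in the repository above.

```python
import torch
import torch.nn.functional as F

def domain_specific_hybrid_refinement(G, segment, w, x_real, steps=100):
    # 1) Domain-Specific Segmentation: split the image into an in-domain
    #    part (well modeled by the GAN) and an out-of-domain part
    #    (occlusions, detailed backgrounds, artifacts).
    mask_in = segment(x_real)        # soft mask in [0, 1], 1 = in-domain
    mask_out = 1.0 - mask_in

    # 2) Weight modulation for the in-domain part: briefly fine-tune the
    #    generator weights so G(w) matches the in-domain pixels; the latent
    #    code, and hence its editing directions, stays untouched.
    opt_w = torch.optim.Adam(G.parameters(), lr=3e-4)
    for _ in range(steps):
        loss = F.mse_loss(G(w) * mask_in, x_real * mask_in)
        opt_w.zero_grad(); loss.backward(); opt_w.step()

    # 3) Feature modulation for the out-of-domain part: optimize a residual
    #    on an intermediate feature map to reproduce the hard, out-of-domain
    #    pixels without touching the (now frozen) weights.
    for p in G.parameters():
        p.requires_grad_(False)
    with torch.no_grad():
        feat = G.features(w)         # hypothetical intermediate-feature hook
    delta = torch.zeros_like(feat, requires_grad=True)
    opt_f = torch.optim.Adam([delta], lr=1e-2)
    for _ in range(steps):
        x_hat = G.from_features(w, feat + delta)   # hypothetical API
        loss = F.mse_loss(x_hat * mask_out, x_real * mask_out)
        opt_f.zero_grad(); loss.backward(); opt_f.step()

    return G, delta.detach()
```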
Related papers
- FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing [22.308638156328968]
DDIM latents, which are crucial for retaining the original image's key features and layout, contribute significantly to these limitations.
We introduce FlexiEdit, which enhances fidelity to input text prompts by refining DDIM latents.
Our approach represents notable progress in image editing, particularly in performing complex non-rigid edits.
arXiv Detail & Related papers (2024-07-25T08:07:40Z) - Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [61.42732844499658]
This paper systematically improves the text-guided image editing techniques based on diffusion models.
We incorporate human annotation as external knowledge to confine editing within a "Mask-informed" region.
arXiv Detail & Related papers (2024-05-24T07:53:59Z) - Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion
and Image Attribute Editing [36.01737879983636]
GAN-based image editing first leverages GAN inversion to project real images into the latent space of a GAN and then manipulates the corresponding latent codes (a minimal sketch of this recipe follows this entry).
Recent inversion methods mainly utilize additional high-bit features to improve the preservation of image details.
During editing, however, existing works fail to accurately complement the lost details and therefore suffer from poor editability.
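The two-step recipe referenced above is standard across these papers. The following minimal PyTorch sketch shows optimization-based inversion followed by a latent-space edit; `G`, its `w_dim` attribute, and the edit direction are illustrative assumptions, and real systems typically add perceptual (e.g., LPIPS) and identity losses on top of the pixel term.

```python
import torch
import torch.nn.functional as F

def invert(G, x_real, steps=500, lr=0.05):
    """Project a real image into the generator's latent space."""
    # `G.w_dim` is a hypothetical attribute for the latent dimensionality;
    # real pipelines start from the average latent or an encoder prediction.
    w = torch.zeros(1, G.w_dim, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(G(w), x_real)  # plus LPIPS/ID terms in practice
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()

def edit(G, w, direction, strength=2.0):
    """Move the inverted code along a known semantic direction."""
    return G(w + strength * direction)
```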
arXiv Detail & Related papers (2024-02-22T09:28:47Z) - Noise Map Guidance: Inversion with Spatial Context for Real Image
Editing [23.513950664274997]
Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images.
Their application to editing real images, however, often encounters hurdles: the text condition deteriorates reconstruction quality and subsequently harms editing fidelity.
We present Noise Map Guidance (NMG), an inversion method rich in a spatial context, tailored for real-image editing.
arXiv Detail & Related papers (2024-02-07T07:16:12Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer to regularize the inverted code in the native latent space of the pre-trained GAN model.
We conduct comprehensive analyses of the effects of the encoder structure, the starting inversion point, and the inversion parameter space, and observe a trade-off between reconstruction quality and editing property.
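One plausible reading of the domain-regularized optimizer, sketched in PyTorch below: the domain-guided encoder `E` supplies both the starting code and a regularizer that keeps the optimized code where the encoder would place a real image. The loss weight and the exact regularizer form are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def in_domain_invert(G, E, x_real, steps=300, lr=0.01, lam=2.0):
    # Domain-guided start: the encoder's prediction for the real image.
    w = E(x_real).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_hat = G(w)
        rec = F.mse_loss(x_hat, x_real)   # pixel-level reconstruction
        reg = F.mse_loss(E(x_hat), w)     # keep w on the encoder's manifold
        loss = rec + lam * reg
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()
```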
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN
Latent Space [42.56147568941768]
We introduce Normalized Style Space and the $\mathcal{S}^\mathcal{N}$ Cosine Distance (SNCD) to measure the disalignment of inversion methods.
Since the proposed SNCD is differentiable, it can be optimized in both encoder-based and optimization-based embedding methods, providing a uniform solution.
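A hedged sketch of what a differentiable cosine-distance disalignment measure in a normalized style space can look like; the exact normalization and any per-layer weighting are defined in the LSAP paper, so this form is only an assumption.

```python
import torch
import torch.nn.functional as F

def sncd(style_a, style_b, eps=1e-8):
    # Normalize each style vector, then measure one minus cosine
    # similarity: 0 means perfectly aligned style codes.
    a = (style_a - style_a.mean(-1, keepdim=True)) / (style_a.std(-1, keepdim=True) + eps)
    b = (style_b - style_b.mean(-1, keepdim=True)) / (style_b.std(-1, keepdim=True) + eps)
    return 1.0 - F.cosine_similarity(a, b, dim=-1).mean()
```

Because every operation here is differentiable, such a measure can be added directly to an encoder's training loss or to a per-image optimization objective, which is what allows one formulation to serve both families of embedding methods.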
arXiv Detail & Related papers (2022-09-26T14:55:21Z) - Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can intuitively be composited from the paired original and edited images.
In the decomposition phase, we further present a GAN-prior-based deghosting network for separating the final fine edited image from the coarse reconstruction.
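The composition phase reduces to mask-weighted blending. A minimal sketch, assuming `mask` is a Diff-CAM-style soft mask in [0, 1] that fires on edited regions (the deghosting decomposition network is not shown):

```python
import torch

def composite(x_original: torch.Tensor, x_edited: torch.Tensor,
              mask: torch.Tensor) -> torch.Tensor:
    # Take edited content where the mask fires; keep the original image,
    # including its out-of-domain details, everywhere else.
    return mask * x_edited + (1.0 - mask) * x_original
```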
arXiv Detail & Related papers (2022-07-17T10:34:58Z) - TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable
Facial Editing [110.82128064489237]
We propose TransEditor, a novel Transformer-based framework to enhance interaction in a dual-space GAN for more controllable editing.
Experiments demonstrate the superiority of the proposed framework in image quality and editing capability, suggesting the effectiveness of TransEditor for highly controllable facial editing.
arXiv Detail & Related papers (2022-03-31T17:58:13Z) - High-Fidelity GAN Inversion for Image Attribute Editing [61.966946442222735]
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved.
With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images.
We propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction.
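A hedged sketch of one plausible form of distortion consultation: the distortion map is the residual that the low-rate latent code cannot explain, handed back to the generator as a reference for high-fidelity decoding. `G.decode_with_reference` is a hypothetical hook, not the paper's API.

```python
import torch

def distortion_consultation(G, w, x_real):
    x_coarse = G(w)                 # low bit-rate reconstruction
    distortion = x_real - x_coarse  # details the latent code missed
    # Hypothetical fusion call: decode again, consulting the distortion map.
    return G.decode_with_reference(w, distortion)
```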
arXiv Detail & Related papers (2021-09-14T11:23:48Z) - In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice for applying a trained GAN generator to a real image is to first invert the image back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures that the inverted code is semantically meaningful for editing.
arXiv Detail & Related papers (2020-03-31T18:20:18Z)