ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing
- URL: http://arxiv.org/abs/2301.13402v1
- Date: Tue, 31 Jan 2023 04:38:42 GMT
- Title: ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing
- Authors: Bingchuan Li, Tianxiang Ma, Peng Zhang, Miao Hua, Wei Liu, Qian He,
Zili Yi
- Abstract summary: StyleGAN allows for flexible and plausible editing of generated images by manipulating the semantically rich latent style space.
Projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability.
We propose a novel two-phase framework by designating two separate networks to tackle editing and reconstruction respectively.
- Score: 20.39792009151017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The StyleGAN family succeeds at high-fidelity image generation and
allows for flexible and plausible editing of generated images by manipulating
the semantically rich latent style space. However, projecting a real image into its
latent space encounters an inherent trade-off between inversion quality and
editability. Existing encoder-based or optimization-based StyleGAN inversion
methods attempt to mitigate the trade-off but achieve only limited success. To
fundamentally resolve this problem, we propose a novel two-phase framework by
designating two separate networks to tackle editing and reconstruction
respectively, instead of balancing the two. Specifically, in Phase I, a
W-space-oriented StyleGAN inversion network is trained and used to perform
image inversion and editing, which assures the editability but sacrifices
reconstruction quality. In Phase II, a carefully designed rectifying network is
utilized to rectify the inversion errors and perform ideal reconstruction.
Experimental results show that our approach yields near-perfect reconstructions
without sacrificing the editability, thus allowing accurate manipulation of
real images. Further, we evaluate our rectifying network and observe strong
generalization to unseen manipulation types and out-of-domain images.
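As a concrete illustration, here is a minimal PyTorch sketch of the two-phase idea. All module names, shapes, and the L1 objective are illustrative placeholders, not the authors' architecture:

```python
# Toy sketch of the two-phase pipeline: Phase I inverts into an editable
# W latent (lossy), Phase II rectifies the reconstruction error.
# WEncoder, Generator, and Rectifier are hypothetical stand-ins.
import torch
import torch.nn as nn

IMG = 3 * 64 * 64   # toy resolution for illustration
W_DIM = 512

class WEncoder(nn.Module):
    """Phase I: project a real image into the editable W space (lossy)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(IMG, W_DIM))

    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Stand-in for a pre-trained StyleGAN generator (kept frozen)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(W_DIM, IMG)

    def forward(self, w):
        return self.net(w).view(-1, 3, 64, 64)

class Rectifier(nn.Module):
    """Phase II: predict a residual that corrects inversion errors,
    conditioned on the real image and its coarse reconstruction."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(2 * IMG, IMG)

    def forward(self, x_real, x_coarse):
        h = torch.cat([x_real.flatten(1), x_coarse.flatten(1)], dim=1)
        return self.net(h).view_as(x_real)

enc, gen, rect = WEncoder(), Generator(), Rectifier()
for p in gen.parameters():              # the generator stays frozen
    p.requires_grad_(False)

x = torch.randn(4, 3, 64, 64)           # a batch of "real" images
w = enc(x)                              # Phase I: editable W latent
x_coarse = gen(w)                       # coarse but editable reconstruction
x_final = x_coarse + rect(x, x_coarse)  # Phase II: rectified reconstruction

# Phase II training signal: close the gap to the real image.
loss = nn.functional.l1_loss(x_final, x)
loss.backward()
```

In the paper's setting the generator would be a pre-trained StyleGAN and the two phases are trained separately; the toy modules above only show the data flow.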
Related papers
- Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps.
Our method effectively minimizes distortions caused by varying text conditions during noise prediction.
Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
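One way to read "uniform attention maps" is that every query attends equally to all tokens, so the attention output collapses to the mean of the value vectors regardless of the text condition. A toy sketch under that assumption (not the paper's code):

```python
# Contrast between standard cross-attention and a uniform attention map.
import torch

def cross_attention(q, k, v):
    # q: (n_q, d); k, v: (n_kv, d). Standard scaled dot-product attention.
    scores = q @ k.T / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def uniform_attention(q, k, v):
    # Uniform map: every weight is 1/n_kv, independent of q and k, so
    # varying text conditions cannot perturb the attention pattern.
    attn = torch.full((q.shape[0], k.shape[0]), 1.0 / k.shape[0])
    return attn @ v  # each row equals v.mean(dim=0)

q = torch.randn(16, 64)   # image-feature queries
k = torch.randn(77, 64)   # text-token keys (77 = CLIP context length)
v = torch.randn(77, 64)   # text-token values
out = uniform_attention(q, k, v)
assert torch.allclose(out, v.mean(0).expand(16, 64), atol=1e-5)
```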
arXiv Detail & Related papers (2024-11-29T12:11:28Z)
- Spatial-Contextual Discrepancy Information Compensation for GAN Inversion [67.21442893265973]
We introduce a novel spatial-contextual discrepancy information compensation-based GAN-inversion method (SDIC).
SDIC bridges the gap in image details between the original image and the reconstructed/edited image.
Our proposed method achieves an excellent distortion-editability trade-off at fast inference speed for both image inversion and editing tasks.
arXiv Detail & Related papers (2023-12-12T08:58:56Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization that regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Robust GAN inversion [5.1359892878090845]
We propose an approach that works in the native latent space $W$ and tunes the generator network to restore missing image details.
We demonstrate the effectiveness of our approach on two complex datasets: Flickr-Faces-HQ and LSUN Church.
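The summary reads much like pivotal-tuning-style generator fine-tuning: hold the inverted latent fixed in the native $W$ space and adapt the generator weights until the missing details are restored. A hedged sketch under that reading, with an illustrative L1 objective:

```python
# Generator tuning around a fixed W latent; names and the simple L1
# loss are illustrative assumptions, not this paper's exact procedure.
import torch
import torch.nn as nn

def tune_generator(gen: nn.Module, w_pivot: torch.Tensor,
                   x_real: torch.Tensor, steps: int = 200, lr: float = 1e-4):
    """Fine-tune the generator so that G(w_pivot) matches x_real, while
    the latent w_pivot itself stays fixed in the native W space."""
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    w_pivot = w_pivot.detach()      # only generator weights are optimized
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.l1_loss(gen(w_pivot), x_real)
        loss.backward()
        opt.step()
    return gen
```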
arXiv Detail & Related papers (2023-08-31T07:47:11Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the capabilities of pretrained diffusion models for image editing.
Existing methods either finetune the model or invert the image into the latent space of the pretrained model.
They suffer from two problems: unsatisfactory results in selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- Editing Out-of-domain GAN Inversion via Differential Activations [56.62964029959131]
We propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm.
With the aid of the generated Diff-CAM mask, a coarse reconstruction can be intuitively composited from the paired original and edited images.
In the decomposition phase, we further present a GAN prior based deghosting network for separating the final fine edited image from the coarse reconstruction.
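Read literally, the composition phase blends the edited output into the original image under the Diff-CAM mask. A toy sketch of that blending step (function and variable names are mine):

```python
# Mask-guided composition of the paired original and edited images.
import torch

def compose(original: torch.Tensor, edited: torch.Tensor,
            mask: torch.Tensor) -> torch.Tensor:
    """Coarse reconstruction: edited content where the mask is active,
    original content elsewhere; mask values lie in [0, 1]."""
    return mask * edited + (1.0 - mask) * original
```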
arXiv Detail & Related papers (2022-07-17T10:34:58Z)
- High-Fidelity GAN Inversion for Image Attribute Editing [61.966946442222735]
We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved.
With a low-bit-rate latent code, previous works have difficulty preserving high-fidelity details in reconstructed and edited images.
We propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction.
arXiv Detail & Related papers (2021-09-14T11:23:48Z)
- Designing an Encoder for StyleGAN Image Manipulation [38.909059126878354]
We study the latent space of StyleGAN, the state-of-the-art unconditional generator.
We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space.
We present an encoder based on our two principles that is specifically designed for facilitating editing on real images.
arXiv Detail & Related papers (2021-02-04T17:52:38Z)