Optimizing Latent Space Directions For GAN-based Local Image Editing
- URL: http://arxiv.org/abs/2111.12583v1
- Date: Wed, 24 Nov 2021 16:02:46 GMT
- Title: Optimizing Latent Space Directions For GAN-based Local Image Editing
- Authors: Ehsan Pajouheshgar, Tong Zhang, Sabine Süsstrunk
- Abstract summary: We present a novel objective function to evaluate the locality of an image edit.
Our framework, called Locally Effective Latent Space Direction (LELSD), is applicable to any dataset and GAN architecture.
Our method is also computationally fast and exhibits a high extent of disentanglement, which allows users to interactively perform a sequence of edits on an image.
- Score: 15.118159513841874
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative Adversarial Network (GAN) based localized image editing
can suffer from ambiguity between semantic attributes. We thus present a novel
objective function to evaluate the locality of an image edit. By introducing the
supervision from a pre-trained segmentation network and optimizing the
objective function, our framework, called Locally Effective Latent Space
Direction (LELSD), is applicable to any dataset and GAN architecture. Our
method is also computationally fast and exhibits a high extent of
disentanglement, which allows users to interactively perform a sequence of
edits on an image. Our experiments on both GAN-generated and real images
qualitatively demonstrate the high quality and advantages of our method.
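The core idea of a locality objective can be illustrated with a toy sketch: given a segmentation mask marking the region to edit, score a latent direction by how much of the resulting image change stays inside that mask. Everything below (the linear `generate` stand-in, the mask, the random-search step) is illustrative and assumed, not the paper's actual objective or optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a GAN generator: latent (8,) -> "image" (16,)
# via a fixed linear map. Shapes and names are illustrative.
W = rng.normal(size=(16, 8))

def generate(z):
    return W @ z

# Binary mask marking the local region the edit should affect
# (in the paper this comes from a pre-trained segmentation network).
mask = np.zeros(16, dtype=bool)
mask[:4] = True

def locality_score(direction, z, eps=1e-8):
    """Fraction of the edit's energy that falls inside the mask."""
    delta = generate(z + direction) - generate(z)
    inside = np.sum(delta[mask] ** 2)
    total = np.sum(delta ** 2) + eps
    return inside / total

# Pick the best direction among random unit candidates -- a crude
# stand-in for gradient-based optimization of the objective.
z = rng.normal(size=8)
candidates = rng.normal(size=(256, 8))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
scores = [locality_score(d, z) for d in candidates]
best = candidates[int(np.argmax(scores))]
print(round(max(scores), 3))
```

Because the score is a ratio of edit energies, it is bounded in [0, 1], which makes directions for different attributes directly comparable.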
Related papers
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization to keep the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Conditional Score Guidance for Text-Driven Image-to-Image Translation [52.73564644268749]
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
arXiv Detail & Related papers (2023-05-29T10:48:34Z)
- Domain Agnostic Image-to-image Translation using Low-Resolution Conditioning [6.470760375991825]
We propose a domain-agnostic i2i method for fine-grained problems, where the domains are related.
We present a novel approach that relies on training the generative model to produce images that share the distinctive information of the associated source image.
We validate our method on the CelebA-HQ and AFHQ datasets by demonstrating improvements in terms of visual quality.
arXiv Detail & Related papers (2023-05-08T19:58:49Z)
- TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation [11.207512995742999]
One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose a novel structure-preserved method TcGAN with individual vision transformer to overcome the shortcomings of the existing one-shot image generation methods.
arXiv Detail & Related papers (2023-02-16T03:05:59Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Region-Based Semantic Factorization in GANs [67.90498535507106]
We present a highly efficient algorithm to factorize the latent semantics learned by Generative Adversarial Networks (GANs) concerning an arbitrary image region.
Through an appropriately defined generalized Rayleigh quotient, we solve such a problem without any annotations or training.
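Maximizing a generalized Rayleigh quotient of the kind this entry describes reduces to a standard symmetric eigenproblem via a Cholesky factorization. A minimal numpy sketch on toy matrices (the matrices and their "region"/"complement" roles are illustrative assumptions, not the paper's actual construction):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy symmetric positive-definite matrices standing in for the
# region term (A) and its complement term (B); names are illustrative.
n = 6
X = rng.normal(size=(n, n)); A = X @ X.T + np.eye(n)
Y = rng.normal(size=(n, n)); B = Y @ Y.T + np.eye(n)

# Maximize d^T A d / d^T B d: with B = L L^T, substitute d = L^{-T} v,
# turning the quotient into v^T M v / v^T v for M = L^{-1} A L^{-T}.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
M = Linv @ A @ Linv.T                 # symmetric
w, V = np.linalg.eigh(M)              # ascending eigenvalues
d = Linv.T @ V[:, -1]                 # direction for the top eigenvalue

quot = (d @ A @ d) / (d @ B @ d)
print(np.isclose(quot, w[-1]))        # the quotient attains the top eigenvalue
```

The attraction of this formulation, as the entry notes, is that the optimum is found in closed form, with no annotations or training involved.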
Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-02-19T17:46:02Z)
- Object-Guided Day-Night Visual Localization in Urban Scenes [2.4493299476776778]
The proposed method first detects semantic objects and establishes correspondences of those objects between images.
Experiments on standard urban localization datasets show that OGuL significantly improves localization results with as simple local features as SIFT.
arXiv Detail & Related papers (2022-02-09T13:21:30Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.