Optimizing Latent Space Directions For GAN-based Local Image Editing
- URL: http://arxiv.org/abs/2111.12583v1
- Date: Wed, 24 Nov 2021 16:02:46 GMT
- Title: Optimizing Latent Space Directions For GAN-based Local Image Editing
- Authors: Ehsan Pajouheshgar, Tong Zhang, Sabine Süsstrunk
- Abstract summary: We present a novel objective function to evaluate the locality of an image edit.
Our framework, called Locally Effective Latent Space Direction (LELSD), is applicable to any dataset and GAN architecture.
Our method is also computationally fast and exhibits a high extent of disentanglement, which allows users to interactively perform a sequence of edits on an image.
- Score: 15.118159513841874
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative Adversarial Network (GAN) based localized image editing can suffer
from ambiguity between semantic attributes. We thus present a novel objective
function to evaluate the locality of an image edit. By introducing the
supervision from a pre-trained segmentation network and optimizing the
objective function, our framework, called Locally Effective Latent Space
Direction (LELSD), is applicable to any dataset and GAN architecture. Our
method is also computationally fast and exhibits a high extent of
disentanglement, which allows users to interactively perform a sequence of
edits on an image. Our experiments on both GAN-generated and real images
qualitatively demonstrate the high quality and advantages of our method.
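The abstract describes optimizing a latent-space direction against a segmentation-supervised locality objective. Below is a minimal sketch of that idea under stated assumptions: the generator, the segmentation network, the edit strength, and the exact locality score are all illustrative stand-ins, not the published LELSD objective or code.
```python
import torch

# Hedged sketch (not the authors' released code): optimize one latent direction
# whose edit stays inside a target semantic region. `generator`, `segmenter`,
# and `target_part` are hypothetical; the generator maps a latent code to an
# image and the segmenter returns soft part masks for that image.
def optimize_local_direction(generator, segmenter, w, target_part,
                             steps=500, alpha=3.0, lr=1e-2):
    direction = torch.randn_like(w, requires_grad=True)
    opt = torch.optim.Adam([direction], lr=lr)
    for _ in range(steps):
        d = direction / direction.norm()          # keep the direction unit-norm
        img_orig = generator(w)
        img_edit = generator(w + alpha * d)
        mask = segmenter(img_orig)[target_part]   # soft mask of the region we want to edit
        change = (img_edit - img_orig).abs().mean(dim=1, keepdim=True)
        inside = (change * mask).sum()
        total = change.sum() + 1e-8
        loss = -inside / total                    # locality: reward pixel change inside the mask
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (direction / direction.norm()).detach()
```
In practice such a score would presumably be averaged over batches of latent codes so the learned direction generalizes beyond a single image; the single-sample loop above is only meant to make the objective concrete.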
Related papers
- HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation [21.669044026456557]
Generative Adversarial Networks (GANs) have demonstrated remarkable capabilities in generating highly realistic images.
We present a novel framework that significantly extends the capabilities of a pre-trained StyleGAN by integrating CLIP space via hypernetworks.
Our approach demonstrates unprecedented flexibility, enabling text-guided image manipulation without the need for text-specific training data.
arXiv Detail & Related papers (2024-11-19T19:36:18Z) - AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing [14.543341303789445]
We propose a novel mask-free point-based image editing method, AdaptiveDrag, which generates images that better align with user intent.
To ensure a comprehensive connection between the input image and the drag process, we have developed a semantic-driven optimization.
Building on these effective designs, our method delivers superior generation results using only the single input image and the handle-target point pairs.
arXiv Detail & Related papers (2024-10-16T15:59:02Z) - Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - Conditional Score Guidance for Text-Driven Image-to-Image Translation [52.73564644268749]
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
arXiv Detail & Related papers (2023-05-29T10:48:34Z) - TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual
Vision Transformer for Fast Arbitrary One-Shot Image Generation [11.207512995742999]
One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose a novel structure-preserved method TcGAN with individual vision transformer to overcome the shortcomings of the existing one-shot image generation methods.
arXiv Detail & Related papers (2023-02-16T03:05:59Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Region-Based Semantic Factorization in GANs [67.90498535507106]
We present a highly efficient algorithm to factorize the latent semantics learned by Generative Adversarial Networks (GANs) concerning an arbitrary image region.
Through an appropriately defined generalized Rayleigh quotient, we solve such a problem without any annotations or training (see the sketch after this list).
Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-02-19T17:46:02Z) - Object-Guided Day-Night Visual Localization in Urban Scenes [2.4493299476776778]
The proposed method first detects semantic objects and establishes correspondences of those objects between images.
Experiments on standard urban localization datasets show that OGuL significantly improves localization results with as simple local features as SIFT.
arXiv Detail & Related papers (2022-02-09T13:21:30Z) - Style Intervention: How to Achieve Spatial Disentanglement with
Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
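The Region-Based Semantic Factorization entry above reduces region-controlled direction discovery to a generalized Rayleigh quotient. As a hedged illustration of that linear-algebra step only (how the symmetric matrices A and B are built from the generator and the chosen region is paper-specific and simply assumed here), the maximizing direction can be read off a generalized eigenproblem:
```python
import numpy as np
from scipy.linalg import eigh

def top_direction(A, B, eps=1e-6):
    """Return the direction v maximizing the generalized Rayleigh quotient
    v^T A v / v^T B v. A and B are symmetric matrices whose construction
    (e.g., from region-restricted generator statistics) is assumed, not shown."""
    B_reg = B + eps * np.eye(B.shape[0])   # small ridge keeps B positive definite
    _, eigvecs = eigh(A, B_reg)            # solves A v = w B v; eigenvalues ascending
    return eigvecs[:, -1]                  # eigenvector with the largest quotient
```
Since scipy.linalg.eigh returns eigenvalues in ascending order, the last eigenvector attains the largest quotient, which matches the paper's claim that the directions are obtained without any annotations or training.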
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information (including all content) and is not responsible for any consequences of its use.