Optimizing Latent Space Directions For GAN-based Local Image Editing
- URL: http://arxiv.org/abs/2111.12583v1
- Date: Wed, 24 Nov 2021 16:02:46 GMT
- Title: Optimizing Latent Space Directions For GAN-based Local Image Editing
- Authors: Ehsan Pajouheshgar, Tong Zhang, Sabine Süsstrunk
- Abstract summary: We present a novel objective function to evaluate the locality of an image edit.
Our framework, called Locally Effective Latent Space Direction (LELSD), is applicable to any dataset and GAN architecture.
Our method is also computationally fast and exhibits a high extent of disentanglement, which allows users to interactively perform a sequence of edits on an image.
- Score: 15.118159513841874
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative Adversarial Network (GAN) based localized image editing
can suffer from ambiguity between semantic attributes. We thus present a novel
objective function to evaluate the locality of an image edit. By introducing the
supervision from a pre-trained segmentation network and optimizing the
objective function, our framework, called Locally Effective Latent Space
Direction (LELSD), is applicable to any dataset and GAN architecture. Our
method is also computationally fast and exhibits a high extent of
disentanglement, which allows users to interactively perform a sequence of
edits on an image. Our experiments on both GAN-generated and real images
qualitatively demonstrate the high quality and advantages of our method.
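The core idea of a locality objective can be illustrated with a toy sketch: given a segmentation mask marking the region to edit, score a latent direction by how much of the resulting image change stays inside that mask. Everything below (the linear `generate` stand-in, the mask, the random-search step) is illustrative and assumed, not the paper's actual objective or optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a GAN generator: latent (8,) -> "image" (16,)
# via a fixed linear map. Shapes and names are illustrative.
W = rng.normal(size=(16, 8))

def generate(z):
    return W @ z

# Binary mask marking the local region the edit should affect
# (in the paper this comes from a pre-trained segmentation network).
mask = np.zeros(16, dtype=bool)
mask[:4] = True

def locality_score(direction, z, eps=1e-8):
    """Fraction of the edit's energy that falls inside the mask."""
    delta = generate(z + direction) - generate(z)
    inside = np.sum(delta[mask] ** 2)
    total = np.sum(delta ** 2) + eps
    return inside / total

# Pick the best direction among random unit candidates -- a crude
# stand-in for gradient-based optimization of the objective.
z = rng.normal(size=8)
candidates = rng.normal(size=(256, 8))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
scores = [locality_score(d, z) for d in candidates]
best = candidates[int(np.argmax(scores))]
print(round(max(scores), 3))
```

Because the score is a ratio of edit energies, it is bounded in [0, 1], which makes directions for different attributes directly comparable.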
Related papers
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization to keep the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Conditional Score Guidance for Text-Driven Image-to-Image Translation [52.73564644268749]
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
arXiv Detail & Related papers (2023-05-29T10:48:34Z)
- Domain Agnostic Image-to-image Translation using Low-Resolution Conditioning [6.470760375991825]
We propose a domain-agnostic i2i method for fine-grained problems, where the domains are related.
We present a novel approach that relies on training the generative model to produce images that share the distinctive information of the associated source image.
We validate our method on the CelebA-HQ and AFHQ datasets by demonstrating improvements in terms of visual quality.
arXiv Detail & Related papers (2023-05-08T19:58:49Z)
- TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation [11.207512995742999]
One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention.
We propose a novel structure-preserved method TcGAN with individual vision transformer to overcome the shortcomings of the existing one-shot image generation methods.
arXiv Detail & Related papers (2023-02-16T03:05:59Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Region-Based Semantic Factorization in GANs [67.90498535507106]
We present a highly efficient algorithm to factorize the latent semantics learned by Generative Adversarial Networks (GANs) concerning an arbitrary image region.
Through an appropriately defined generalized Rayleigh quotient, we solve such a problem without any annotations or training.
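Maximizing a generalized Rayleigh quotient of the kind this entry describes reduces to a standard symmetric eigenproblem via a Cholesky factorization. A minimal numpy sketch on toy matrices (the matrices and their "region"/"complement" roles are illustrative assumptions, not the paper's actual construction):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy symmetric positive-definite matrices standing in for the
# region term (A) and its complement term (B); names are illustrative.
n = 6
X = rng.normal(size=(n, n)); A = X @ X.T + np.eye(n)
Y = rng.normal(size=(n, n)); B = Y @ Y.T + np.eye(n)

# Maximize d^T A d / d^T B d: with B = L L^T, substitute d = L^{-T} v,
# turning the quotient into v^T M v / v^T v for M = L^{-1} A L^{-T}.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
M = Linv @ A @ Linv.T                 # symmetric
w, V = np.linalg.eigh(M)              # ascending eigenvalues
d = Linv.T @ V[:, -1]                 # direction for the top eigenvalue

quot = (d @ A @ d) / (d @ B @ d)
print(np.isclose(quot, w[-1]))        # the quotient attains the top eigenvalue
```

The attraction of this formulation, as the entry notes, is that the optimum is found in closed form, with no annotations or training involved.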
Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-02-19T17:46:02Z)
- Object-Guided Day-Night Visual Localization in Urban Scenes [2.4493299476776778]
The proposed method first detects semantic objects and establishes correspondences of those objects between images.
Experiments on standard urban localization datasets show that OGuL significantly improves localization results with as simple local features as SIFT.
arXiv Detail & Related papers (2022-02-09T13:21:30Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.