Delta-GAN-Encoder: Encoding Semantic Changes for Explicit Image Editing, using Few Synthetic Samples
- URL: http://arxiv.org/abs/2111.08419v2
- Date: Wed, 17 Nov 2021 11:02:33 GMT
- Title: Delta-GAN-Encoder: Encoding Semantic Changes for Explicit Image Editing, using Few Synthetic Samples
- Authors: Nir Diamant, Nitsan Sandor, Alex M Bronstein
- Abstract summary: We propose a novel method for learning to control any desired attribute in a pre-trained GAN's latent space.
We perform Sim2Real learning, relying on minimal samples to achieve an unlimited amount of continuous precise edits.
- Score: 2.348633570886661
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding and controlling a generative model's latent space is a
complex task.
In this paper, we propose a novel method for learning to control any desired
attribute in a pre-trained GAN's latent space, for the purpose of editing
synthesized and real-world data samples accordingly.
We perform Sim2Real learning, relying on minimal samples to achieve an
unlimited amount of continuous precise edits.
We present an autoencoder-based model that learns to encode the semantics of
changes between images as a basis for editing new samples later on, achieving
the precise desired results (an example is shown in Fig. 1).
While previous editing methods rely on a known structure of latent spaces
(e.g., linearity of some semantics in StyleGAN), our method inherently does not
require any structural constraints.
We demonstrate our method in the domain of facial imagery: editing different
expressions, poses, and lighting attributes, achieving state-of-the-art
results.
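
The paper's code is not included here, so the following is a minimal, hypothetical sketch of the delta-encoding idea the abstract describes: an encoder compresses the change between a source latent and an edited latent into a small delta code, and a decoder applies that delta to new latents from the same pre-trained (frozen) GAN. All module names, sizes, and the training loss are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of the delta-encoding idea (assumptions throughout).
import torch
import torch.nn as nn

LATENT_DIM = 512   # StyleGAN-like latent size (assumption)
DELTA_DIM = 32     # compact code for the semantic change (assumption)

class DeltaEncoder(nn.Module):
    """Encodes the semantic change between two latent codes."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(2 * LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, DELTA_DIM),
        )
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + DELTA_DIM, 256), nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def encode(self, w_src, w_edit):
        return self.enc(torch.cat([w_src, w_edit], dim=-1))

    def apply(self, w_new, delta):
        # Predict the edited latent for a new sample.
        return self.dec(torch.cat([w_new, delta], dim=-1))

model = DeltaEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a few synthetic (source, edited) latent pairs.
w_src, w_edit = torch.randn(8, LATENT_DIM), torch.randn(8, LATENT_DIM)
delta = model.encode(w_src, w_edit)
loss = nn.functional.mse_loss(model.apply(w_src, delta), w_edit)
opt.zero_grad(); loss.backward(); opt.step()

# Scaling delta at inference is one plausible way to obtain the
# continuous edit strengths the abstract mentions.
```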
Related papers
- TPIE: Topology-Preserved Image Editing With Text Instructions [14.399084325078878]
TPIE (Topology-Preserved Image Editing with text instructions) treats newly generated samples as deformable variations of a given input template, allowing for controllable and structure-preserving edits.
We validate TPIE on a diverse set of 2D and 3D images and compare it with state-of-the-art image editing approaches.
arXiv Detail & Related papers (2024-11-22T22:08:27Z)
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
- Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing [4.948910649137149]
Diffusion Transformers (DiTs) have recently achieved remarkable success in text-guided image generation.
We show how multimodal information collectively forms the DiT's joint latent space and how it guides the semantics of the synthesized images.
We propose a simple yet effective Encode-Identify-Manipulate (EIM) framework for zero-shot fine-grained image editing.
arXiv Detail & Related papers (2024-11-12T21:34:30Z)
- Diffusion Model-Based Image Editing: A Survey [46.244266782108234]
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks.
We provide an exhaustive overview of existing methods using diffusion models for image editing.
To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval.
arXiv Detail & Related papers (2024-02-27T14:07:09Z)
- A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing [4.8201607588546]
We propose an auto-encoder which re-organizes the latent space of StyleGAN, so that each attribute which we wish to edit corresponds to an axis of the new latent space.
We show that our approach has greater disentanglement than competing methods, while maintaining fidelity to the original image with respect to identity (a conceptual sketch follows this entry).
arXiv Detail & Related papers (2023-12-13T16:18:45Z)
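
As a rough illustration of the axis-aligned latent space described in the entry above, the sketch below trains a toy autoencoder over GAN latents with a supervision term that ties the first coordinates of the new space to attribute labels. The architecture, sizes, and loss are assumptions, not the paper's design.

```python
# Toy axis-aligned latent reorganization (illustrative assumptions).
import torch
import torch.nn as nn

W_DIM, N_ATTR = 512, 8  # StyleGAN-like latent size, attribute count (assumed)

enc = nn.Linear(W_DIM, W_DIM)   # maps w -> reorganized latent z
dec = nn.Linear(W_DIM, W_DIM)   # maps z back to w
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

w = torch.randn(16, W_DIM)      # latents from a pre-trained GAN
attrs = torch.randn(16, N_ATTR) # attribute labels for those samples

z = enc(w)
recon = nn.functional.mse_loss(dec(z), w)               # stay faithful to w
aligned = nn.functional.mse_loss(z[:, :N_ATTR], attrs)  # first axes = attributes
loss = recon + aligned
opt.zero_grad(); loss.backward(); opt.step()

# Editing: shift one coordinate of the reorganized latent, then decode.
with torch.no_grad():
    z = enc(w)
    z[:, 0] += 1.0              # move along the axis tied to attribute 0
    w_edited = dec(z)
```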
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, as well as qualitatively, for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects (see the sketch after this entry).
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
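
The entry above describes encoding randomly sampled Gaussian heatmaps into intermediate generator layers as spatial inductive bias. The sketch below shows one plausible way to fuse a heatmap channel into a feature map; the 1x1-convolution fusion and all sizes are assumptions, not the paper's mechanism.

```python
# Illustrative fusion of a Gaussian heatmap into generator features.
import torch
import torch.nn as nn

def gaussian_heatmap(size, cx, cy, sigma):
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

feat = torch.randn(1, 64, 16, 16)                        # intermediate features
heat = gaussian_heatmap(16, cx=4.0, cy=10.0, sigma=2.0)  # user-controllable layout
inject = nn.Conv2d(65, 64, kernel_size=1)                # fuse heatmap channel

fused = inject(torch.cat([feat, heat[None, None]], dim=1))
# Moving the heatmap center at inference would move the corresponding object.
```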
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [101.97397967958722]
We propose a novel unified framework of Cycle-consistent Inverse GAN for both text-to-image generation and text-guided image manipulation tasks.
We learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image.
In the text-guided optimization module, we generate images with the desired semantic attributes by optimizing the inverted latent codes (a sketch follows this entry).
arXiv Detail & Related papers (2021-08-03T08:38:16Z)
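
The text-guided optimization step summarized above can be pictured with a small sketch: a frozen generator, an inverted latent code, and an objective that scores text-image agreement. The generator and scoring function below are stand-in stubs, not the paper's components.

```python
# Latent-code optimization under a frozen generator (stubs throughout).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(128, 3 * 32 * 32))   # stand-in frozen generator
for p in G.parameters():
    p.requires_grad_(False)

def text_image_score(img, text_emb):
    # Stub for a text-image similarity model (e.g. CLIP-like); assumption.
    return -(img.mean() - text_emb.mean()) ** 2

z = torch.randn(1, 128, requires_grad=True)      # inverted latent code
text_emb = torch.randn(512)                      # embedding of the edit prompt
opt = torch.optim.Adam([z], lr=0.05)

for _ in range(100):                             # optimize the latent, not weights
    img = G(z)
    loss = -text_image_score(img, text_emb)
    opt.zero_grad(); loss.backward(); opt.step()
```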
- SDEdit: Image Synthesis and Editing with Stochastic Differential Equations [113.35735935347465]
We introduce Stochastic Differential Editing (SDEdit), based on recent generative models that use stochastic differential equations (SDEs).
Given an input image with user edits, we first add noise to the input according to an SDE, and subsequently denoise it by simulating the reverse SDE to gradually increase its likelihood under the prior.
Our method does not require task-specific loss function designs, which are critical components of recent image editing methods based on GAN inversion (a minimal sketch of the procedure follows this entry).
arXiv Detail & Related papers (2021-08-02T17:59:47Z)
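
As referenced in the entry above, here is a minimal sketch of the SDEdit noise-then-denoise loop under a discretized SDE. The denoiser is a stub and the noise schedule and step coefficients are illustrative, not the paper's actual formulation.

```python
# SDEdit-style guided synthesis, heavily simplified (stub denoiser).
import torch
import torch.nn as nn

denoiser = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in score model

@torch.no_grad()
def sdedit(x_edited, t0=0.5, steps=50):
    """Noise the edited input to time t0, then walk the reverse SDE back."""
    x = x_edited + (t0 ** 0.5) * torch.randn_like(x_edited)  # forward perturbation
    dt = t0 / steps
    for _ in range(steps):
        score = denoiser(x)  # approximate score of the prior
        # Discretized reverse-SDE step: drift toward the data manifold
        # plus a small amount of fresh noise (coefficients illustrative).
        x = x + score * dt + 0.1 * (dt ** 0.5) * torch.randn_like(x)
    return x

guide = torch.rand(1, 3, 32, 32)  # user-edited image, e.g. a stroke painting
result = sdedit(guide)
```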
- Designing an Encoder for StyleGAN Image Manipulation [38.909059126878354]
We study the latent space of StyleGAN, the state-of-the-art unconditional generator.
We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space.
We present an encoder based on our two principles that is specifically designed for facilitating editing on real images.
arXiv Detail & Related papers (2021-02-04T17:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.