Gradient Adjusting Networks for Domain Inversion
- URL: http://arxiv.org/abs/2302.11413v1
- Date: Wed, 22 Feb 2023 14:47:57 GMT
- Title: Gradient Adjusting Networks for Domain Inversion
- Authors: Erez Sheffi, Michael Rotman, Lior Wolf
- Abstract summary: StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator by applying a local edit to the generator's weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
- Score: 82.72289618025084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: StyleGAN2 was demonstrated to be a powerful image generation engine that
supports semantic editing. However, in order to manipulate a real-world image,
one first needs to retrieve its corresponding latent representation in
StyleGAN's latent space, one that decodes to an image as close as possible to
the desired image. For many real-world images, such a latent representation
does not exist, which necessitates tuning the generator network. We present a
per-image optimization method that tunes a StyleGAN2 generator by applying a
local edit to the generator's weights, resulting in an almost perfect
inversion while still allowing image editing, since the rest of the mapping
between an input latent representation tensor and an output image is kept
relatively intact. The method is based on a one-shot training of a set of
shallow update networks (a.k.a. Gradient Modification Modules) that modify
the layers of the generator. After training the Gradient Modification Modules, a
modified generator is obtained by a single application of these networks to the
original parameters, and the previous editing capabilities of the generator are
maintained. Our experiments show a sizable gap in performance over the current
state of the art in this very active domain. Our code is available at
https://github.com/sheffier/gani.
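
To make the mechanism concrete, below is a minimal PyTorch sketch of the idea as stated in the abstract: a shallow update network per generator layer predicts an additive change to that layer's weights, the networks are trained once per image, and the modified generator is then obtained by a single application of them to the original parameters. The names (`GradientModificationModule`, `tune_for_image`), the per-channel MLP design, and the plain MSE reconstruction loss are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientModificationModule(nn.Module):
    """Shallow update network for one generator layer (illustrative).

    A small MLP, shared across output channels, maps each row of the
    layer's weight tensor to an additive update -- a "local edit".
    """

    def __init__(self, weight_shape, hidden=128):
        super().__init__()
        self.weight_shape = weight_shape
        row = int(torch.tensor(weight_shape[1:]).prod())  # e.g. in_ch * k * k
        self.mlp = nn.Sequential(
            nn.Linear(row, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, row),
        )

    def forward(self, weight):
        rows = weight.reshape(weight.shape[0], -1)   # (out_ch, row)
        return (rows + self.mlp(rows)).reshape(self.weight_shape)


def tune_for_image(generator, latent, target, steps=500, lr=1e-4):
    """One-shot, per-image training of the update networks (sketch).

    `generator` is a pretrained StyleGAN2-like module and `latent` an
    approximate inversion of the `target` image; the paper's perceptual
    losses are replaced by plain MSE here for brevity.
    """
    frozen = {n: p.detach().clone() for n, p in generator.named_parameters()}
    gmms = {n: GradientModificationModule(p.shape)
            for n, p in generator.named_parameters()
            if n.endswith('weight') and p.dim() >= 2}
    opt = torch.optim.Adam(
        [q for m in gmms.values() for q in m.parameters()], lr=lr)

    for _ in range(steps):
        # Forward pass through the generator with GMM-modified weights;
        # only the GMM parameters receive gradients.
        params = {n: (gmms[n](frozen[n]) if n in gmms else frozen[n])
                  for n in frozen}
        out = torch.func.functional_call(generator, params, (latent,))
        loss = F.mse_loss(out, target)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # "Single application": bake the modified weights into the generator,
    # leaving the rest of the latent-to-image mapping intact.
    with torch.no_grad():
        for n, p in generator.named_parameters():
            if n in gmms:
                p.copy_(gmms[n](frozen[n]))
    return generator
```

Because the update is local, latent-space editing directions found for the original generator should still apply to the tuned one, which is what allows near-perfect inversion and editing to coexist.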
Related papers
- HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks [5.9189325968909365]
We propose an innovative image editing method called HyperEditor, which utilizes weight factors generated by hypernetworks to reassign the weights of the pre-trained StyleGAN2's generator.
Guided by CLIP's cross-modal image-text semantic alignment, this approach enables us to simultaneously accomplish authentic attribute editing and cross-domain style transfer.
arXiv Detail & Related papers (2023-12-21T02:39:53Z)
- Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
arXiv Detail & Related papers (2023-12-17T21:49:59Z)
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z)
- Style Transformer for Image Inversion and Editing [35.45674653596084]
Existing GAN inversion methods fail to provide latent codes for reliable reconstruction and flexible editing simultaneously.
This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN.
The proposed model employs a CNN encoder to provide multi-scale image features as keys and values.
arXiv Detail & Related papers (2022-03-15T14:16:57Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
- Pivotal Tuning for Latent-based Editing of Real Images [40.22151052441958]
A surge of advanced facial editing techniques has been proposed that leverage the generative power of a pre-trained StyleGAN.
To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator's domain.
This means it is still challenging to apply ID-preserving facial latent-space editing to faces which are out of the generator's domain.
arXiv Detail & Related papers (2021-06-10T13:47:59Z)
- Unsupervised Image Transformation Learning via Generative Adversarial Networks [40.84518581293321]
We study the image transformation problem by learning the underlying transformations from a collection of images using Generative Adversarial Networks (GANs).
We propose an unsupervised learning framework, termed as TrGAN, to project images onto a transformation space that is shared by the generator and the discriminator.
arXiv Detail & Related papers (2021-03-13T17:08:19Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice for feeding a real image to a trained GAN generator is to first invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image at the pixel level, yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures that the inverted code is semantically meaningful for editing.
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
- Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation [181.08127307338654]
This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images.
The deep generative prior (DGP) provides compelling results in restoring the missing semantics, e.g., color, patches, and resolution, of various degraded images.
arXiv Detail & Related papers (2020-03-30T17:45:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.