Style Transformer for Image Inversion and Editing
- URL: http://arxiv.org/abs/2203.07932v1
- Date: Tue, 15 Mar 2022 14:16:57 GMT
- Title: Style Transformer for Image Inversion and Editing
- Authors: Xueqi Hu, Qiusheng Huang, Zhengyi Shi, Siyuan Li, Changxin Gao, Li
Sun, Qingli Li
- Abstract summary: Existing GAN inversion methods fail to provide latent codes for reliable reconstruction and flexible editing simultaneously.
This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN.
The proposed model employs a CNN encoder to provide multi-scale image features as keys and values.
- Score: 35.45674653596084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing GAN inversion methods fail to provide latent codes for reliable
reconstruction and flexible editing simultaneously. This paper presents a
transformer-based image inversion and editing model for pretrained StyleGAN
that not only achieves lower distortion but also offers high quality and
flexibility for editing. The proposed model employs a CNN encoder to provide
multi-scale image features as keys and values. Meanwhile, it treats the style
codes to be determined for different layers of the generator as queries. It
first initializes the query tokens as learnable parameters and maps them into
W+ space. Then multi-stage alternating self- and cross-attention is applied,
updating the queries so that the generator inverts the input.
Moreover, based on the inverted code, we investigate the reference- and
label-based attribute editing through a pretrained latent classifier, and
achieve flexible image-to-image translation with high quality results.
Extensive experiments are carried out, showing better performance on both
inversion and editing tasks within StyleGAN.
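To make the query-update scheme concrete, below is a minimal PyTorch sketch of one alternating self-/cross-attention stage. The module structure, token count, and feature shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StyleTransformerStage(nn.Module):
    """One alternating self-/cross-attention stage (illustrative sketch)."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, queries, img_feats):
        # Self-attention: the per-layer style tokens exchange information.
        q = self.norm1(queries)
        queries = queries + self.self_attn(q, q, q)[0]
        # Cross-attention: tokens read the CNN image features (keys/values).
        q = self.norm2(queries)
        queries = queries + self.cross_attn(q, img_feats, img_feats)[0]
        return queries

# 18 learnable query tokens, one per generator layer in W+ space; the token
# count and the flattened feature shape below are assumptions.
queries = torch.randn(1, 18, 512, requires_grad=True)
img_feats = torch.randn(1, 64, 512)   # stand-in for multi-scale CNN features
stage = StyleTransformerStage()
w_plus = stage(queries, img_feats)    # (1, 18, 512) style codes
```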
Related papers
- HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in
Image Editing via Hypernetworks [5.9189325968909365]
We propose an innovative image editing method called HyperEditor, which uses weight factors generated by hypernetworks to reassign the weights of the pretrained StyleGAN2 generator.
Guided by CLIP's cross-modal image-text semantic alignment, this approach can accomplish authentic attribute editing and cross-domain style transfer simultaneously.
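A minimal sketch of the weight-reassignment idea above, assuming a hypernetwork that maps an edit embedding (e.g., from CLIP) to per-channel multiplicative factors for one convolution; the names, shapes, and factor parameterization are hypothetical, and the CLIP-guidance loss is omitted.

```python
import torch
import torch.nn as nn

class WeightFactorHypernet(nn.Module):
    """Maps an edit embedding to per-channel factors for one conv layer."""
    def __init__(self, embed_dim=512, out_channels=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_channels))

    def forward(self, edit_embed):
        # Factors near 1.0 keep the pretrained behavior; tanh bounds the edit.
        return 1.0 + 0.1 * torch.tanh(self.mlp(edit_embed))

pretrained_w = torch.randn(64, 64, 3, 3)   # stand-in StyleGAN2 conv weight
edit_embed = torch.randn(1, 512)           # e.g. a CLIP text embedding
factors = WeightFactorHypernet()(edit_embed).view(-1, 1, 1, 1)
edited_w = factors * pretrained_w          # reassigned weights for this edit
```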
arXiv Detail & Related papers (2023-12-21T02:39:53Z)
- Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
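The $u$-space construction itself is specific to the paper, but the "accumulative and composable" property can be illustrated with plain linear edits on a latent code; the attribute directions below are hypothetical placeholders.

```python
import torch

u = torch.randn(1, 512)                 # latent code of one sample
directions = {                          # hypothetical attribute directions
    "smile": torch.randn(1, 512),
    "age": torch.randn(1, 512),
}

def apply_edits(u, edits):
    """Accumulate several (attribute, strength) edits on one code."""
    for name, strength in edits:
        u = u + strength * directions[name]
    return u

u_edited = apply_edits(u, [("smile", 1.5), ("age", -0.5)])
```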
arXiv Detail & Related papers (2023-12-17T21:49:59Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
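A minimal sketch of the two ingredients named above: the encoder supplies the starting inversion point, and a domain regularizer keeps the optimized code in a region the encoder can re-embed. The toy E and G are stand-ins for the real encoder and StyleGAN-scale generator.

```python
import torch
import torch.nn.functional as F

E = torch.nn.Linear(256, 512)               # domain-guided encoder (toy)
G = torch.nn.Linear(512, 256)               # pretrained generator (toy)

x = torch.randn(1, 256)                     # "image" to invert
z = E(x).detach().requires_grad_(True)      # encoder gives the starting point
opt = torch.optim.Adam([z], lr=0.01)        # only z is optimized

for _ in range(100):
    opt.zero_grad()
    recon = F.mse_loss(G(z), x)             # reconstruction quality
    domain = F.mse_loss(z, E(G(z)))         # domain regularizer: z should
    (recon + 0.1 * domain).backward()       # match the encoder's view of G(z)
    opt.step()
```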
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
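A minimal sketch of the variable-length allocation idea, approximating "information density" by local patch variance (an assumption, not the paper's measure) and giving denser regions a larger token budget.

```python
import torch

image = torch.rand(1, 1, 32, 32)
patches = image.unfold(2, 8, 8).unfold(3, 8, 8)      # (1, 1, 4, 4, 8, 8)
density = patches.reshape(1, 1, 4, 4, -1).var(dim=-1)

# Regions above the median variance get fine (long) codes, the rest coarse.
budget = torch.where(density > density.median(),
                     torch.tensor(16), torch.tensor(4))
print(budget.squeeze())                              # tokens per region
```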
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator by applying a local edit to its weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
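A minimal sketch of per-image generator tuning: the inverted code is frozen and the generator weights are fine-tuned on that single image, so the weight edit stays local to it. The toy G stands in for StyleGAN2.

```python
import torch
import torch.nn.functional as F

G = torch.nn.Sequential(torch.nn.Linear(512, 256), torch.nn.Tanh())
target = torch.rand(1, 256) * 2 - 1         # "image" in [-1, 1]
w = torch.randn(1, 512)                     # inverted code, kept fixed

opt = torch.optim.Adam(G.parameters(), lr=1e-4)
for _ in range(200):
    opt.zero_grad()
    F.mse_loss(G(w), target).backward()     # per-image reconstruction loss
    opt.step()
# G is now locally adapted; edits applied to w use the tuned weights.
```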
arXiv Detail & Related papers (2023-02-22T14:47:57Z)
- Null-text Inversion for Editing Real Images using Guided Diffusion Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt-editing tasks.
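A highly simplified sketch of the null-text idea: keep the model and the prompt embedding fixed, and optimize only the unconditional ("null") embedding so that classifier-free-guided denoising tracks a pre-recorded DDIM inversion trajectory. The linear "denoiser" and all shapes below are toy stand-ins for Stable Diffusion's UNet, and the loss is a simplification of the paper's per-timestep objective.

```python
import torch
import torch.nn.functional as F

denoiser = torch.nn.Linear(64 + 16, 64)     # toy stand-in for the UNet

def cfg(z, text_emb, null_emb, scale=7.5):
    uncond = denoiser(torch.cat([z, null_emb], dim=-1))
    cond = denoiser(torch.cat([z, text_emb], dim=-1))
    return uncond + scale * (cond - uncond)  # classifier-free guidance

text_emb = torch.randn(1, 16)                        # fixed prompt embedding
trajectory = [torch.randn(1, 64) for _ in range(4)]  # recorded DDIM latents

null_emb = torch.zeros(1, 16, requires_grad=True)
opt = torch.optim.Adam([null_emb], lr=1e-2)
for z, z_next in zip(trajectory[:-1], trajectory[1:]):
    for _ in range(10):                      # a few steps per timestep
        opt.zero_grad()
        F.mse_loss(cfg(z, text_emb, null_emb), z_next).backward()
        opt.step()
```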
arXiv Detail & Related papers (2022-11-17T18:58:14Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures that the inverted code is semantically meaningful for editing.
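For contrast with the semantic regularization described above, here is a minimal sketch of the plain "common practice": optimize a latent code against pixel reconstruction only, with a frozen toy generator standing in for the pretrained GAN.

```python
import torch
import torch.nn.functional as F

G = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.Tanh())
for p in G.parameters():
    p.requires_grad_(False)                 # pretrained generator is frozen

real_image = torch.rand(1, 256) * 2 - 1
z = torch.randn(1, 128, requires_grad=True)

opt = torch.optim.Adam([z], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    F.mse_loss(G(z), real_image).backward()  # pixel reconstruction only; the
    opt.step()                               # in-domain variant adds a
                                             # semantic (encoder) term on z
```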
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.