StylePrompter: All Styles Need Is Attention
- URL: http://arxiv.org/abs/2307.16151v1
- Date: Sun, 30 Jul 2023 07:23:44 GMT
- Title: StylePrompter: All Styles Need Is Attention
- Authors: Chenyi Zhuang, Pan Gao, Aljosa Smolic
- Abstract summary: GAN inversion aims at inverting given images into corresponding latent codes for Generative Adversarial Networks (GANs), especially StyleGAN.
We introduce a hierarchical vision Transformer backbone to predict $\mathcal{W^+}$ latent codes at the token level.
We then prove that StylePrompter lies in a more disentangled $\mathcal{W^+}$ space and show the controllability of SMART, our style-driven refinement module in $\mathcal{F}$ space.
- Score: 21.760753546313403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GAN inversion aims at inverting given images into corresponding latent codes
for Generative Adversarial Networks (GANs), especially StyleGAN, where there exists a
disentangled latent space that allows attribute-based image manipulation at the
latent level. As most inversion methods build upon Convolutional Neural
Networks (CNNs), we innovatively transfer a hierarchical vision Transformer backbone
to predict $\mathcal{W^+}$ latent codes at the token level. We further
apply a Style-driven Multi-scale Adaptive Refinement Transformer (SMART) in
$\mathcal{F}$ space to refine the intermediate style features of the generator.
By treating style features as queries to retrieve lost identity information
from the encoder's feature maps, SMART can not only produce high-quality
inverted images but also surprisingly adapt to editing tasks. We then prove
that StylePrompter lies in a more disentangled $\mathcal{W^+}$ space and show the
controllability of SMART. Finally, quantitative and qualitative experiments
demonstrate that StylePrompter can achieve desirable performance in balancing
reconstruction quality and editability, and is "smart" enough to fit into most
edits, outperforming other $\mathcal{F}$-involved inversion methods.
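The mechanism described above (style features acting as queries that retrieve lost identity information from the encoder's feature maps) is essentially cross-attention. The following is a minimal, hypothetical PyTorch sketch of that query/key/value pattern; the class name, dimensions, and residual layout are illustrative assumptions, not the authors' SMART implementation.

```python
# Minimal sketch of the style-driven cross-attention idea described above:
# intermediate style features act as queries, the encoder's feature maps
# supply keys/values, and the attended result refines the style features.
# All names and shapes are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class CrossAttentionRefiner(nn.Module):
    def __init__(self, style_dim: int, feat_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=style_dim, kdim=feat_dim, vdim=feat_dim,
            num_heads=num_heads, batch_first=True)
        self.norm = nn.LayerNorm(style_dim)

    def forward(self, style_feat: torch.Tensor, enc_feat: torch.Tensor):
        # style_feat: (B, C, H, W) intermediate generator feature in F space
        # enc_feat:   (B, C_e, H_e, W_e) encoder feature map with identity detail
        b, c, h, w = style_feat.shape
        q = style_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries
        kv = enc_feat.flatten(2).transpose(1, 2)    # (B, H_e*W_e, C_e) keys/values
        attended, _ = self.attn(q, kv, kv)          # retrieve lost identity detail
        refined = self.norm(q + attended)           # residual refinement
        return refined.transpose(1, 2).reshape(b, c, h, w)

# Usage: refine a 64x64 generator feature with a 32x32 encoder feature map.
refiner = CrossAttentionRefiner(style_dim=512, feat_dim=256)
out = refiner(torch.randn(1, 512, 64, 64), torch.randn(1, 256, 32, 32))
```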
Related papers
- RefineStyle: Dynamic Convolution Refinement for StyleGAN [15.230430037135017]
In StyleGAN, convolution kernels are shaped by both static parameters shared across images and dynamic modulation factors specific to each image (a generic modulated-convolution sketch appears after this list).
$\mathcal{W^+}$ space is often used for image inversion and editing.
This paper proposes an efficient refining strategy for dynamic kernels.
arXiv Detail & Related papers (2024-10-08T15:01:30Z)
- MoreStyle: Relax Low-frequency Constraint of Fourier-based Image Reconstruction in Generalizable Medical Image Segmentation [53.24011398381715]
We introduce a Plug-and-Play module for data augmentation called MoreStyle.
MoreStyle diversifies image styles by relaxing low-frequency constraints in Fourier space (a generic amplitude-mixing sketch appears after this list).
With the help of adversarial learning, MoreStyle pinpoints the most intricate style combinations within latent features.
arXiv Detail & Related papers (2024-03-18T11:38:47Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Diverse Inpainting and Editing with GAN Inversion [4.234367850767171]
Recent inversion methods have shown that real images can be inverted into StyleGAN's latent space.
In this paper, we tackle an even more difficult task: inverting erased images into the GAN's latent space for realistic inpainting and editing.
arXiv Detail & Related papers (2023-07-27T17:41:36Z)
- Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization [21.591831983223997]
We propose an exemplar-based style synthesis pipeline to improve domain generalization in semantic segmentation.
Our method is based on a novel masked noise encoder for StyleGAN2 inversion.
We achieve up to 12.4% mIoU improvements on driving-scene semantic segmentation under different types of data shifts.
arXiv Detail & Related papers (2023-07-02T19:56:43Z)
- Hierarchical Semantic Regularization of Latent Spaces in StyleGANs [53.98170188547775]
We propose a Hierarchical Semantic Regularizer (HSR) which aligns the hierarchical representations learnt by the generator to corresponding powerful features learnt by pretrained networks on large amounts of data.
HSR is shown to not only improve generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images.
arXiv Detail & Related papers (2022-08-07T16:23:33Z)
- Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and Editability [76.6724135757723]
GAN inversion aims to invert an input image into the latent space of a pre-trained GAN.
Despite the recent advances in GAN inversion, there remain challenges to mitigate the tradeoff between distortion and editability.
We propose a two-step approach that first inverts the input image into a latent code, called the pivot code, and then alters the generator so that the pivot code accurately reconstructs the input image.
arXiv Detail & Related papers (2022-07-19T16:10:16Z)
- Overparameterization Improves StyleGAN Inversion [66.8300251627992]
Existing inversion approaches obtain promising yet imperfect results.
We show that overparameterizing the latent space allows near-perfect image reconstruction without the need for encoders.
Our approach also retains editability, which we demonstrate by realistically interpolating between images.
arXiv Detail & Related papers (2022-05-12T18:42:43Z)
- Style Transformer for Image Inversion and Editing [35.45674653596084]
Existing GAN inversion methods fail to provide latent codes for reliable reconstruction and flexible editing simultaneously.
This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN.
The proposed model employs a CNN encoder to provide multi-scale image features as keys and values.
arXiv Detail & Related papers (2022-03-15T14:16:57Z)
- HyperInverter: Improving StyleGAN Inversion via Hypernetwork [12.173568611144628]
Current GAN inversion methods fail to meet at least one of the following three requirements: high reconstruction quality, editability, and fast inference.
We present a novel two-phase strategy that meets all three requirements at the same time.
Our method is entirely encoder-based, resulting in extremely fast inference.
arXiv Detail & Related papers (2021-12-01T18:56:05Z)
- Bi-level Feature Alignment for Versatile Image Translation and Manipulation [88.5915443957795]
Generative adversarial networks (GANs) have achieved great success in image translation and manipulation.
High-fidelity image generation with faithful style control remains a grand challenge in computer vision.
This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance.
arXiv Detail & Related papers (2021-07-07T05:26:29Z)
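For background on the RefineStyle entry's premise that StyleGAN kernels combine a static, shared kernel with per-image dynamic modulation, here is a minimal sketch of StyleGAN2-style modulated convolution. It is an illustrative, assumption-based example; the function name, shapes, and epsilon value are not taken from that paper.

```python
# Minimal sketch of StyleGAN2-style modulated convolution, illustrating the
# "static kernel + per-image dynamic modulation" premise from the RefineStyle
# entry above. Shapes and names are illustrative, not any paper's code.
import torch
import torch.nn.functional as F

def modulated_conv2d(x, weight, style, eps: float = 1e-8):
    """x: (B, C_in, H, W); weight: (C_out, C_in, k, k) static, shared kernel;
    style: (B, C_in) per-image modulation scales derived from a latent code."""
    b, c_in, h, w = x.shape
    c_out = weight.shape[0]
    # Modulate: scale the shared kernel per image and per input channel.
    w_mod = weight.unsqueeze(0) * style.view(b, 1, c_in, 1, 1)
    # Demodulate: normalize each output channel to unit norm for stability.
    demod = torch.rsqrt(w_mod.pow(2).sum(dim=[2, 3, 4]) + eps)  # (B, C_out)
    w_mod = w_mod * demod.view(b, c_out, 1, 1, 1)
    # Grouped-conv trick: fold the batch into groups so each image uses its own kernel.
    x = x.reshape(1, b * c_in, h, w)
    w_mod = w_mod.reshape(b * c_out, c_in, *weight.shape[2:])
    out = F.conv2d(x, w_mod, padding=weight.shape[-1] // 2, groups=b)
    return out.reshape(b, c_out, h, w)

# Usage: one 3x3 static kernel, two images with different dynamic styles.
out = modulated_conv2d(torch.randn(2, 8, 16, 16),
                       torch.randn(16, 8, 3, 3),
                       torch.rand(2, 8) + 0.5)
```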
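The MoreStyle entry refers to relaxing low-frequency constraints in Fourier space. Below is a small, generic sketch of low-frequency amplitude mixing in that spirit: interpolate the centered low-frequency amplitudes of two images while keeping the source phase. The function, its parameters `beta` and `alpha`, and the window shape are assumptions for illustration, not the paper's module.

```python
# Generic sketch of Fourier-based low-frequency amplitude mixing: alter style
# statistics via low-frequency amplitudes while the source phase keeps content.
# Purely illustrative; not the MoreStyle implementation.
import numpy as np

def mix_low_freq_amplitude(src: np.ndarray, ref: np.ndarray,
                           beta: float = 0.1, alpha: float = 1.0) -> np.ndarray:
    """src, ref: float arrays of shape (H, W). beta sets the low-frequency
    window size; alpha sets how strongly the reference amplitude replaces
    the source amplitude inside that window."""
    fft_src = np.fft.fftshift(np.fft.fft2(src))
    fft_ref = np.fft.fftshift(np.fft.fft2(ref))
    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_ref = np.abs(fft_ref)

    h, w = src.shape
    bh, bw = int(h * beta), int(w * beta)
    cy, cx = h // 2, w // 2
    # Interpolate amplitudes only in the centered low-frequency window.
    amp_mixed = amp_src.copy()
    win = (slice(cy - bh, cy + bh + 1), slice(cx - bw, cx + bw + 1))
    amp_mixed[win] = (1 - alpha) * amp_src[win] + alpha * amp_ref[win]

    # Recombine the mixed amplitude with the source phase (content is kept).
    mixed = amp_mixed * np.exp(1j * pha_src)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

# Usage: restyle a random "image" with another's low-frequency statistics.
out = mix_low_freq_amplitude(np.random.rand(128, 128), np.random.rand(128, 128))
```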