Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation
- URL: http://arxiv.org/abs/2103.16146v1
- Date: Tue, 30 Mar 2021 08:00:13 GMT
- Title: Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation
- Authors: Gihyun Kwon, Jong Chul Ye
- Abstract summary: We present novel hierarchical adaptive Diagonal spatial ATtention (DAT) layers to separately manipulate the spatial contents from styles in a hierarchical manner.
Our method enables coarse-to-fine level disentanglement of spatial contents and styles.
Our generator can be easily integrated into the GAN inversion framework so that the content and style of translated images can be flexibly controlled.
- Score: 34.24876359759408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the important research topics in image generative models is to
disentangle the spatial contents and styles for their separate control.
Although StyleGAN can generate content feature vectors from random noise, the
resulting spatial content control is primarily intended for minor spatial
variations, and the disentanglement of global content and styles is by no means
complete. Inspired by a mathematical understanding of normalization and
attention, here we present novel hierarchical adaptive Diagonal spatial
ATtention (DAT) layers to separately manipulate the spatial contents from
styles in a hierarchical manner. Using DAT and AdaIN, our method enables
coarse-to-fine level disentanglement of spatial contents and styles. In
addition, our generator can be easily integrated into the GAN inversion
framework so that the content and style of translated images from multi-domain
image translation tasks can be flexibly controlled. By using various datasets,
we confirm that the proposed method not only outperforms the existing models in
disentanglement scores, but also provides more flexible control over spatial
features in the generated images.
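To make the mechanism concrete, here is a minimal PyTorch sketch of how such a pair of layers could look. This is an illustrative reading of the abstract, not the authors' released code: the class names, tensor shapes, and the sigmoid gating are assumptions. The guiding observation is that AdaIN is a diagonal operation over channels (each channel gets its own style-driven scale and bias), so its spatial counterpart is a diagonal attention matrix over pixel locations, i.e. one content-driven weight per pixel.

```python
# Illustrative sketch only (assumed PyTorch reading); not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    """Adaptive instance normalization: a per-channel scale and bias,
    predicted from the style code, applied to instance-normalized
    features: sigma(s) * (x - mu(x)) / sigma(x) + mu(s)."""
    def __init__(self, style_dim: int, num_channels: int):
        super().__init__()
        self.affine = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), style: (B, style_dim)
        scale, bias = self.affine(style).chunk(2, dim=1)
        x = F.instance_norm(x)
        return x * (1 + scale[:, :, None, None]) + bias[:, :, None, None]

class DiagonalSpatialAttention(nn.Module):
    """A DAT-like layer: the content code predicts one attention weight per
    spatial location (a diagonal attention matrix over pixels), and that
    weight gates every channel at its location."""
    def __init__(self, content_dim: int, spatial_size: int):
        super().__init__()
        self.spatial_size = spatial_size
        self.to_mask = nn.Linear(content_dim, spatial_size * spatial_size)

    def forward(self, x: torch.Tensor, content: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), content: (B, content_dim)
        b = x.size(0)
        mask = torch.sigmoid(self.to_mask(content))  # (B, H*W), in (0, 1)
        mask = mask.view(b, 1, self.spatial_size, self.spatial_size)
        return x * mask  # broadcast over channels: pixel-wise gating

# Toy usage: content decides *where* things go, style decides *how* they look.
x = torch.randn(2, 64, 16, 16)
content, style = torch.randn(2, 128), torch.randn(2, 128)
x = DiagonalSpatialAttention(128, 16)(x, content)
x = AdaIN(128, 64)(x, style)
```

Stacking such pairs at several feature resolutions yields the coarse-to-fine behavior described above: low-resolution DAT masks move global structure, while higher-resolution ones refine local details.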
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
- Learning Dynamic Style Kernels for Artistic Style Transfer [26.19086645743083]
We propose a new scheme that learns spatially adaptive kernels for per-pixel stylization.
Our proposed method outperforms state-of-the-art methods and exhibits superior performance in terms of visual quality and efficiency.
arXiv Detail & Related papers (2023-04-02T00:26:43Z)
- Spatial Latent Representations in Generative Adversarial Networks for Image Generation [0.0]
We define a family of spatial latent spaces for StyleGAN2.
We show that our spaces are effective for image manipulation and encode semantic information well.
arXiv Detail & Related papers (2023-03-25T20:01:11Z)
- PARASOL: Parametric Style Control for Diffusion Image Synthesis [18.852986904591358]
PARASOL is a multi-modal synthesis model that enables disentangled, parametric control of the visual style of the image.
We leverage auxiliary semantic and style-based search to create training triplets for supervision of the latent diffusion model.
arXiv Detail & Related papers (2023-03-11T17:30:36Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, randomly sampled Gaussian heatmaps are encoded into the intermediate layers of the generative model as a spatial inductive bias (a minimal sketch of such heatmaps appears after this list).
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Discovering Class-Specific GAN Controls for Semantic Image Synthesis [73.91655061467988]
We propose a novel method for finding spatially disentangled class-specific directions in the latent space of pretrained semantic image synthesis (SIS) models.
We show that the latent directions found by our method can effectively control the local appearance of semantic classes.
arXiv Detail & Related papers (2022-12-02T21:39:26Z)
- High-fidelity GAN Inversion with Padding Space [38.9258619444968]
Inverting a Generative Adversarial Network (GAN) facilitates a wide range of image editing tasks using pre-trained generators.
Existing methods typically employ the latent space of GANs as the inversion space yet observe insufficient recovery of spatial details.
We propose to involve the padding space of the generator to complement the latent space with spatial information.
arXiv Detail & Related papers (2022-03-21T16:32:12Z)
- Low-Rank Subspaces in GANs [101.48350547067628]
This work introduces low-rank subspaces that enable more precise control of GAN generation.
LowRankGAN is able to find a low-dimensional representation of the attribute manifold.
Experiments on state-of-the-art GAN models (including StyleGAN2 and BigGAN) trained on various datasets demonstrate the effectiveness of our LowRankGAN.
arXiv Detail & Related papers (2021-06-08T16:16:32Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
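For the "Spatial Steerability" entry above, the following sketch illustrates the kind of randomly sampled Gaussian heatmap that paper describes encoding into intermediate generator layers. This is a hedged illustration: the helper name and the center/width sampling ranges are assumptions, not taken from that paper.

```python
# Illustrative sketch only; the sampling ranges are assumed.
import torch

def random_gaussian_heatmap(size: int, batch: int = 1) -> torch.Tensor:
    """Return (batch, 1, size, size) heatmaps with random centers and widths,
    usable as a spatial inductive bias injected into generator layers."""
    ys, xs = torch.meshgrid(
        torch.arange(size, dtype=torch.float32),
        torch.arange(size, dtype=torch.float32),
        indexing="ij",
    )
    centers = torch.rand(batch, 2) * size                # random (cy, cx)
    sigmas = (0.05 + 0.2 * torch.rand(batch, 1)) * size  # random widths
    d2 = (ys - centers[:, 0, None, None]) ** 2 \
       + (xs - centers[:, 1, None, None]) ** 2
    heat = torch.exp(-d2 / (2.0 * sigmas[:, :, None] ** 2))  # (B, H, W)
    return heat.unsqueeze(1)

# Moving a heatmap's center at inference time then corresponds to moving
# the content the generator has learned to associate with that spatial cue.
```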