StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
- URL: http://arxiv.org/abs/2011.12799v2
- Date: Thu, 3 Dec 2020 17:30:00 GMT
- Title: StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
- Authors: Zongze Wu, Dani Lischinski, Eli Shechtman
- Abstract summary: We explore and analyze the latent style space of StyleGAN2, a state-of-the-art architecture for image generation.
StyleSpace is significantly more disentangled than the other intermediate latent spaces explored by previous works.
Our findings pave the way to semantically meaningful and well-disentangled image manipulations via simple and intuitive interfaces.
- Score: 45.20783737095007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore and analyze the latent style space of StyleGAN2, a
state-of-the-art architecture for image generation, using models pretrained on
several different datasets. We first show that StyleSpace, the space of
channel-wise style parameters, is significantly more disentangled than the
other intermediate latent spaces explored by previous works. Next, we describe
a method for discovering a large collection of style channels, each of which is
shown to control a distinct visual attribute in a highly localized and
disentangled manner. Third, we propose a simple method for identifying style
channels that control a specific attribute, using a pretrained classifier or a
small number of example images. Manipulation of visual attributes via these
StyleSpace controls is shown to be better disentangled than via those proposed
in previous works. To show this, we make use of a newly proposed Attribute
Dependency metric. Finally, we demonstrate the applicability of StyleSpace
controls to the manipulation of real images. Our findings pave the way to
semantically meaningful and well-disentangled image manipulations via simple
and intuitive interfaces.
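The objects described in the abstract are concrete enough that a short sketch may help make them tangible: StyleSpace consists of the channel-wise style parameters s = A(w) produced by the per-layer affine transforms of StyleGAN2, a manipulation shifts one such channel, and attribute-specific channels can be found by checking how strongly perturbing each channel moves a pretrained attribute classifier. The sketch below is a minimal illustration under those assumptions, not the authors' released code; the interfaces `G.synthesis_layers`, `layer.affine`, `G.synthesis_from_styles`, and `attr_classifier`, as well as the standard-deviation scaling of edits, are hypothetical stand-ins.

```python
# Minimal, hypothetical sketch (not the paper's released code).
# Assumed interfaces: G.synthesis_layers (iterable of layers whose `affine`
# maps a w latent to that layer's style vector), G.synthesis_from_styles
# (renders an image from a list of style vectors), and attr_classifier
# (returns a scalar logit for the target attribute on a generated image).
import torch

@torch.no_grad()
def to_stylespace(G, w):
    """Map a w latent to the list of per-layer style vectors s = A(w);
    their channel-wise entries are the StyleSpace parameters."""
    return [layer.affine(w) for layer in G.synthesis_layers]

@torch.no_grad()
def edit_channel(styles, layer_idx, channel_idx, alpha, sigma):
    """Shift a single style channel by alpha * sigma (sigma: that channel's
    std over many sampled latents); every other channel stays untouched."""
    edited = [s.clone() for s in styles]
    edited[layer_idx][:, channel_idx] += alpha * sigma
    return edited

@torch.no_grad()
def rank_channels_for_attribute(G, attr_classifier, w_samples, sigmas, alpha=3.0):
    """Score each (layer, channel) pair by the mean absolute change in the
    classifier's logit when that channel alone is perturbed; a crude,
    brute-force stand-in for classifier-based channel identification."""
    scores = []
    for layer_idx, layer_sigmas in enumerate(sigmas):
        for channel_idx, sigma in enumerate(layer_sigmas):
            deltas = []
            for w in w_samples:
                styles = to_stylespace(G, w)
                base = attr_classifier(G.synthesis_from_styles(styles))
                moved = attr_classifier(G.synthesis_from_styles(
                    edit_channel(styles, layer_idx, channel_idx, alpha, sigma)))
                deltas.append((moved - base).abs().item())
            scores.append(((layer_idx, channel_idx), sum(deltas) / len(deltas)))
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

Exhaustively scoring every channel this way is expensive and is shown only to illustrate the idea of classifier-guided channel identification; the paper's own procedure should be consulted for an efficient and faithful implementation.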
Related papers
- StyleShot: A Snapshot on Any Style [20.41380860802149]
We show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning.
We achieve this through constructing a style-aware encoder and a well-organized style dataset called StyleGallery.
We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles without test-time tuning.
arXiv Detail & Related papers (2024-07-01T16:05:18Z) - SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration [3.864321514889098]
We propose a novel approach that uses simple user-swipe interactions to generate preferred images for users.
To effectively explore the latent space with only swipe interactions, we apply principal component analysis to the latent space of the StyleGAN.
We use a multi-armed bandit algorithm to decide the dimensions to explore, focusing on the preferences of the user.
arXiv Detail & Related papers (2024-04-30T16:37:27Z) - Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences [49.66987347397398]
Few-Shot Stylized Visual Captioning aims to generate captions in any desired style, using only a few examples as guidance during inference.
We propose a framework called FS-StyleCap for this task, which utilizes a conditional encoder-decoder language model and a visual projection module.
arXiv Detail & Related papers (2023-07-31T04:26:01Z) - Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate [58.83278629019384]
Style transfer aims to render the style of a given reference image onto another image that provides the content.
Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to the content counterparts in a pre-defined way.
We propose Any-to-Any Style Transfer, which enables users to interactively select styles of regions in the style image and apply them to the prescribed content regions.
arXiv Detail & Related papers (2023-04-19T15:15:36Z) - Attribute-Specific Manipulation Based on Layer-Wise Channels [11.063763802330142]
Some studies have focused on detecting channels with specific properties to manipulate the latent code.
We propose a novel detection method in the context of pre-trained classifiers.
Our methods can accurately detect relevant channels for a large number of face attributes.
arXiv Detail & Related papers (2023-02-18T08:49:20Z) - Everything is There in Latent Space: Attribute Editing and Attribute Style Manipulation by StyleGAN Latent Space Exploration [39.18239951479647]
We present Few-shot Latent-based Attribute Manipulation and Editing (FLAME).
FLAME is a framework to perform highly controlled image editing by latent space manipulation.
We generate diverse attribute styles in a disentangled manner.
arXiv Detail & Related papers (2022-07-20T12:40:32Z) - Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z) - Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation [57.99007520795998]
We discover attribute-specific control units, which consist of multiple channels of feature maps and modulation styles.
Specifically, we collaboratively manipulate the modulation style channels and feature maps in control units to obtain the semantic and spatial disentangled controls.
To manipulate these control units, we move the modulation style along a specific sparse direction vector and replace the filter-wise styles used to compute the feature maps.
arXiv Detail & Related papers (2021-11-25T10:42:10Z) - StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
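The StyleCLIP entry above describes its first scheme in enough detail to sketch: optimize a latent code under a CLIP-based loss toward a text prompt, with a regularizer that keeps the edit close to the starting point. The following is a minimal illustration of that idea, not the paper's implementation; it assumes a pretrained StyleGAN2 generator `G` exposing a `synthesis` call and the open-source `clip` package, and the loss weights and the omission of CLIP's input normalization are simplifications.

```python
# Hypothetical sketch of text-guided latent optimization in the spirit of the
# StyleCLIP optimization scheme. `G.synthesis(w)` is assumed to render a
# (1, 3, H, W) image roughly in [-1, 1] from a w latent of the right shape.
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

def text_guided_edit(G, w_init, prompt, steps=200, lr=0.05, l2_weight=0.005, device="cuda"):
    """Move a w latent so the rendered image matches `prompt` under CLIP,
    while an L2 term keeps it close to the starting latent."""
    model, _ = clip.load("ViT-B/32", device=device, jit=False)
    model = model.float()  # keep weights in fp32 so gradients flow through the image branch
    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G.synthesis(w)
        img = F.interpolate(img, size=224, mode="bilinear")  # CLIP input size; CLIP's
                                                             # normalization is omitted for brevity
        img_feat = model.encode_image(img)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        clip_loss = 1.0 - (img_feat * text_feat).sum()       # cosine distance to the prompt
        loss = clip_loss + l2_weight * (w - w_init).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```

The L2 term is what keeps the result an edit of the input rather than an unconstrained resampling; the latent-mapper variant mentioned in the entry amortizes this per-image optimization into a single learned forward pass.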
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.