StyleFusion: A Generative Model for Disentangling Spatial Segments
- URL: http://arxiv.org/abs/2107.07437v1
- Date: Thu, 15 Jul 2021 16:35:21 GMT
- Title: StyleFusion: A Generative Model for Disentangling Spatial Segments
- Authors: Omer Kafri, Or Patashnik, Yuval Alaluf, Daniel Cohen-Or
- Abstract summary: We present StyleFusion, a new mapping architecture for StyleGAN.
StyleFusion takes as input a number of latent codes and fuses them into a single style code.
It provides fine-grained control over each region of the generated image.
- Score: 41.35834479560669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present StyleFusion, a new mapping architecture for StyleGAN, which takes
as input a number of latent codes and fuses them into a single style code.
Inserting the resulting style code into a pre-trained StyleGAN generator
results in a single harmonized image in which each semantic region is
controlled by one of the input latent codes. Effectively, StyleFusion yields a
disentangled representation of the image, providing fine-grained control over
each region of the generated image. Moreover, to facilitate global control
over the generated image, a special input latent code is incorporated into the
fused representation. StyleFusion operates in a hierarchical manner, where each
level is tasked with learning to disentangle a pair of image regions (e.g., the
car body and wheels). The resulting learned disentanglement allows one to
modify both local, fine-grained semantics (e.g., facial features) and more
global features (e.g., pose and background), providing improved
flexibility in the synthesis process. As a natural extension, StyleFusion
enables one to perform semantically-aware cross-image mixing of regions that
are not necessarily aligned. Finally, we demonstrate how StyleFusion can be
paired with existing editing techniques to more faithfully constrain the edit
to the user's region of interest.
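The hierarchy described above can be pictured with a small sketch: pairs of latent codes are fused by learned cells, and the result is fused once more with a dedicated global code before being inserted into the generator. The following is a minimal, hypothetical PyTorch illustration; the module names, the gated-blend mechanism, and the 512-dimensional code size are assumptions made for exposition, not the authors' implementation.

```python
# Minimal sketch of hierarchical style-code fusion (illustrative only).
import torch
import torch.nn as nn

class FusionCell(nn.Module):
    """Fuses two style codes into one; trained (in the real system) so that
    each input code ends up controlling a different image region."""
    def __init__(self, dim: int = 512):
        super().__init__()
        # A per-channel gate predicted from the concatenated pair (an assumption).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, s_a: torch.Tensor, s_b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([s_a, s_b], dim=-1))
        return g * s_a + (1.0 - g) * s_b  # channel-wise blend of the two codes

class StyleFusionSketch(nn.Module):
    """Hierarchy of fusion cells: two region codes are merged first, then the
    result is fused with a 'global' code controlling e.g. pose and background."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.fuse_regions = FusionCell(dim)  # e.g., car body vs. wheels
        self.fuse_global = FusionCell(dim)   # merged regions vs. global code

    def forward(self, s_region_a, s_region_b, s_global):
        s_local = self.fuse_regions(s_region_a, s_region_b)
        return self.fuse_global(s_local, s_global)

# The fused code would then be inserted into a pre-trained StyleGAN generator.
fuser = StyleFusionSketch(dim=512)
fused = fuser(torch.randn(1, 512), torch.randn(1, 512), torch.randn(1, 512))
print(fused.shape)  # torch.Size([1, 512])
```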
Related papers
- Semantic Image Synthesis via Class-Adaptive Cross-Attention [7.147779225315707]
Cross-attention layers are used in place of SPADE to learn shape-style correlations and thereby condition the image generation process.
Our model inherits the versatility of SPADE while obtaining state-of-the-art generation quality as well as improved global and local style transfer.
arXiv Detail & Related papers (2023-08-30T14:49:34Z) - Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate [58.83278629019384]
Style transfer aims to render the style of a given style-reference image onto another given content-reference image.
Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to the content counterparts in a pre-defined way.
We propose Any-to-Any Style Transfer, which enables users to interactively select styles of regions in the style image and apply them to the prescribed content regions.
arXiv Detail & Related papers (2023-04-19T15:15:36Z) - Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator by applying a local edit to the generator's weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
arXiv Detail & Related papers (2023-02-22T14:47:57Z) - FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms (a sketch of this kind of loop appears after this list).
arXiv Detail & Related papers (2022-03-09T13:34:38Z) - Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z) - SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing [35.02841064647306]
StyleGANs provide promising prior models for downstream tasks on image synthesis and editing.
We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way.
arXiv Detail & Related papers (2021-12-04T04:17:11Z) - Manifold Alignment for Semantically Aligned Style Transfer [61.1274057338588]
We make a new assumption: image features from the same semantic region form a manifold, and an image with multiple semantic regions follows a multi-manifold distribution.
Based on this assumption, the style transfer problem is formulated as aligning two multi-manifold distributions.
The proposed framework allows semantically similar regions of the output and the style image to share similar style patterns.
arXiv Detail & Related papers (2020-05-21T16:52:37Z) - Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis [12.449076001538552]
This paper focuses on a recently emerged task, layout-to-image: learning generative models capable of synthesizing photo-realistic images from a spatial layout.
Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a newly proposed feature normalization scheme.
In experiments, the proposed method is evaluated on the COCO-Stuff and Visual Genome datasets, obtaining state-of-the-art performance.
arXiv Detail & Related papers (2020-03-25T18:16:05Z)
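Returning to the FlexIT entry above: the following is a minimal, hypothetical sketch of a CLIP-guided iterative editing loop of the kind that summary describes. The source image and the text instruction are combined into a single target in CLIP embedding space, and the image is then optimized toward that target while a regularizer keeps it close to the source. The equal blend of image and text features, the pixel-space parameterization, the plain L2 regularizer, and all hyperparameters are assumptions for illustration, not the paper's actual objective or implementation.

```python
# Illustrative CLIP-guided editing loop (not FlexIT's actual implementation).
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep everything in fp32 for simplicity

def clip_image_features(img: torch.Tensor) -> torch.Tensor:
    # img: (1, 3, H, W) in [0, 1]; resized to CLIP's input resolution.
    # (CLIP's usual channel normalization is omitted here for brevity.)
    img = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
    return F.normalize(model.encode_image(img), dim=-1)

def edit(img: torch.Tensor, text: str, steps: int = 100, lam: float = 0.1) -> torch.Tensor:
    with torch.no_grad():
        text_feat = F.normalize(model.encode_text(clip.tokenize([text]).to(device)), dim=-1)
        # Target: a blend of the source image's and the instruction's CLIP embeddings.
        target = F.normalize(0.5 * clip_image_features(img) + 0.5 * text_feat, dim=-1)
    out = img.clone().requires_grad_(True)
    opt = torch.optim.Adam([out], lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        cos = (clip_image_features(out) * target).sum()   # cosine similarity to the target
        loss = (1.0 - cos) + lam * F.mse_loss(out, img)   # regularize toward the source image
        loss.backward()
        opt.step()
        out.data.clamp_(0.0, 1.0)  # keep the image in a valid range
    return out.detach()
```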
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.