Styleformer: Transformer based Generative Adversarial Networks with
Style Vector
- URL: http://arxiv.org/abs/2106.07023v1
- Date: Sun, 13 Jun 2021 15:30:39 GMT
- Title: Styleformer: Transformer based Generative Adversarial Networks with
Style Vector
- Authors: Jeeseung Park, Younggeun Kim
- Abstract summary: Styleformer is a style-based generator for the GAN architecture, built as a convolution-free, transformer-based generator.
We show how a transformer can generate high-quality images, overcoming the limitation that convolution operations struggle to capture global features in an image.
- Score: 5.025654873456756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Styleformer, a style-based generator for the GAN
architecture that is convolution-free and transformer-based. In our paper, we
explain how a transformer can generate high-quality images, overcoming the
limitation that convolution operations struggle to capture global features in
an image. Furthermore, we change the demodulation of StyleGAN2 and modify the
existing transformer structure (e.g., residual connection, layer normalization)
to create a strong style-based generator with a convolution-free structure. We
also make Styleformer lighter by applying Linformer, enabling Styleformer to
generate higher-resolution images with improvements in speed and memory. We
experiment with low-resolution image datasets such as CIFAR-10, as well as
high-resolution image datasets such as LSUN-church. Styleformer records FID
2.82 and IS 9.94 on the CIFAR-10 benchmark, which is comparable to the current
state of the art and outperforms all GAN-based generative models, including
StyleGAN2-ADA, with fewer parameters in the unconditional setting. We also
achieve new state-of-the-art results on STL-10 and CelebA, with FID 20.11 and
IS 10.16 on STL-10 and FID 3.66 on CelebA. We release our code at
https://github.com/Jeeseung-Park/Styleformer.
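
Below is a minimal, hypothetical sketch of the core mechanism described in the
abstract: a convolution-free transformer block whose input tokens are scaled by
a learned projection of the style vector, with a demodulation-like rescaling in
the spirit of StyleGAN2. The class and argument names, layer sizes, and the
exact placement of normalization and residual connections are assumptions for
illustration only; the authors' actual implementation (including the
Linformer-based efficiency changes) is available at
https://github.com/Jeeseung-Park/Styleformer.

```python
# Hypothetical sketch of a style-modulated, convolution-free attention block.
# Not the authors' implementation; names and shapes are illustrative only.
import torch
import torch.nn as nn


class StyleModulatedAttention(nn.Module):
    """Self-attention whose input tokens are modulated by a style vector."""

    def __init__(self, dim: int, style_dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.to_style = nn.Linear(style_dim, dim)      # style vector -> per-channel scale
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) flattened image tokens; w: (batch, style_dim)
        b, n, d = x.shape
        style = self.to_style(w).unsqueeze(1)          # (b, 1, d)
        x_mod = x * style                              # modulation by the style vector
        # demodulation-like rescaling: keep per-token activations near unit scale
        x_mod = x_mod * torch.rsqrt(x_mod.pow(2).mean(dim=-1, keepdim=True) + 1e-8)

        qkv = self.qkv(x_mod).reshape(b, n, 3, self.num_heads, d // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (b, heads, n, d/heads)
        attn = (q @ k.transpose(-2, -1)) / (d // self.num_heads) ** 0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, d)
        return x + self.proj(out)                      # residual connection


# Usage: an 8x8 feature map flattened to 64 tokens with 256-dim features
# and a 512-dim style vector.
block = StyleModulatedAttention(dim=256, style_dim=512)
tokens = torch.randn(2, 64, 256)
w = torch.randn(2, 512)
print(block(tokens, w).shape)  # torch.Size([2, 64, 256])
```

The Linformer modification mentioned in the abstract would additionally project
keys and values down to a fixed, shorter sequence length before attention,
reducing memory and compute at higher resolutions; that detail is omitted from
this sketch for brevity.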
Related papers
- Latents2Semantics: Leveraging the Latent Space of Generative Models for
Localized Style Manipulation of Face Images [25.82631308991067]
We introduce the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates localized editing of style attributes of several Regions of Interest in face images.
The L2SAE learns separate latent representations for encoded images' structure and style information, allowing for structure-preserving style editing of the chosen ROIs.
We provide qualitative and quantitative results over multiple applications, such as selective style editing and swapping, using test images sampled from several datasets.
arXiv Detail & Related papers (2023-12-22T20:06:53Z)
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator such that it achieves a local edit to the generator's weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
arXiv Detail & Related papers (2023-02-22T14:47:57Z)
- StyleNAT: Giving Each Head a New Perspective [71.84791905122052]
We present a new transformer-based framework, dubbed StyleNAT, targeting high-quality image generation with superior efficiency and flexibility.
At the core of our model is a carefully designed framework that partitions attention heads to capture local and global information.
StyleNAT attains a new SOTA FID score on FFHQ-256 with 2.046, beating prior arts with convolutional models such as StyleGAN-XL and transformers such as HIT and StyleSwin.
arXiv Detail & Related papers (2022-11-10T18:55:48Z)
- Hierarchical Semantic Regularization of Latent Spaces in StyleGANs [53.98170188547775]
We propose a Hierarchical Semantic Regularizer (HSR) which aligns the hierarchical representations learnt by the generator to corresponding powerful features learnt by pretrained networks on large amounts of data.
HSR is shown to not only improve generator representations but also the linearity and smoothness of the latent style spaces, leading to the generation of more natural-looking style-edited images.
arXiv Detail & Related papers (2022-08-07T16:23:33Z)
- CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers [17.757983821569994]
A new text-to-image system, CogView2, shows very competitive generation compared to concurrent state-of-the-art DALL-E-2.
arXiv Detail & Related papers (2022-04-28T15:51:11Z)
- Adaptive Split-Fusion Transformer [90.04885335911729]
We propose an Adaptive Split-Fusion Transformer (ASF-former) to treat convolutional and attention branches differently with adaptive weights.
Experiments on standard benchmarks, such as ImageNet-1K, show that our ASF-former outperforms its CNN, transformer counterparts, and hybrid pilots in terms of accuracy.
arXiv Detail & Related papers (2022-04-26T10:00:28Z)
- StyleSwin: Transformer-based GAN for High-resolution Image Generation [28.703687511694305]
We seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis.
The proposed generator adopts the Swin transformer in a style-based architecture.
We show that restoring the knowledge of absolute position, which is lost in window-based transformers, greatly benefits the generation quality (a minimal sketch of this idea appears after this related-papers list).
arXiv Detail & Related papers (2021-12-20T18:59:51Z)
- MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis [0.0]
We focus on the performance optimization of style-based generative models.
We introduce MobileStyleGAN architecture, which has x3.5 fewer parameters and is x9.5 less computationally complex than StyleGAN2.
arXiv Detail & Related papers (2021-04-10T13:46:49Z)
- TransGAN: Two Transformers Can Make One Strong GAN [111.07699201175919]
We conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures.
Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator.
Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones.
arXiv Detail & Related papers (2021-02-14T05:24:48Z)
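
As noted in the StyleSwin entry above, window-based attention discards absolute
position information, and restoring it benefits generation quality. The sketch
below illustrates that idea under simple assumptions: a sinusoidal absolute
encoding is added to the feature map before it is partitioned into windows. The
function names and the choice of encoding are illustrative, not the paper's
exact mechanism.

```python
# Hypothetical sketch: restore absolute position before window partitioning.
import math
import torch


def sinusoidal_encoding(height: int, width: int, dim: int) -> torch.Tensor:
    """Absolute 1-D sinusoidal encoding over the height*width flattened positions."""
    pos = torch.arange(height * width, dtype=torch.float32).unsqueeze(1)
    freq = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                     * (-math.log(10000.0) / dim))
    enc = torch.zeros(height * width, dim)
    enc[:, 0::2] = torch.sin(pos * freq)
    enc[:, 1::2] = torch.cos(pos * freq)
    return enc                                   # (height*width, dim)


def window_partition(x: torch.Tensor, window: int) -> torch.Tensor:
    """Split a (batch, h, w, dim) feature map into non-overlapping windows of tokens."""
    b, h, w, d = x.shape
    x = x.view(b, h // window, window, w // window, window, d)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, d)


# Usage: a 16x16 feature map with 96-dim features, split into 8x8 windows.
b, h, w, d, win = 2, 16, 16, 96, 8
feat = torch.randn(b, h, w, d)
feat = feat + sinusoidal_encoding(h, w, d).view(h, w, d)  # add absolute position
windows = window_partition(feat, win)            # ready for per-window attention
print(windows.shape)  # torch.Size([8, 64, 96])
```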