Combining Transformer Generators with Convolutional Discriminators
- URL: http://arxiv.org/abs/2105.10189v1
- Date: Fri, 21 May 2021 07:56:59 GMT
- Title: Combining Transformer Generators with Convolutional Discriminators
- Authors: Ricard Durall, Stanislav Frolov, Andreas Dengel, Janis Keuper
- Abstract summary: The recently proposed TransGAN is the first GAN that uses only transformer-based architectures.
TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism.
We evaluate our approach by conducting a benchmark of well-known CNN discriminators, ablate the size of the transformer-based generator, and show that combining both architectural elements into a hybrid model leads to better results.
- Score: 9.83490307808789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models have recently attracted much interest from computer vision
researchers and have since been successfully employed for several problems
traditionally addressed with convolutional neural networks. At the same time,
image synthesis using generative adversarial networks (GANs) has drastically
improved over the last few years. The recently proposed TransGAN is the first
GAN using only transformer-based architectures and achieves competitive results
when compared to convolutional GANs. However, since transformers are
data-hungry architectures, TransGAN requires data augmentation, an auxiliary
super-resolution task during training, and a masking prior to guide the
self-attention mechanism. In this paper, we study the combination of a
transformer-based generator and convolutional discriminator and successfully
remove the need for the aforementioned design choices. We evaluate our
approach by conducting a benchmark of well-known CNN discriminators, ablate the
size of the transformer-based generator, and show that combining both
architectural elements into a hybrid model leads to better results.
Furthermore, we investigate the frequency spectrum properties of generated
images and observe that our model retains the benefits of an attention-based
generator.
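To make the hybrid design concrete, here is a minimal sketch (not the authors' released code) of a transformer-based generator paired with a convolutional discriminator: the generator synthesizes an image as a grid of patch tokens refined by global self-attention, and a plain CNN critic scores the result. All layer sizes, class names, and the 32x32 output resolution are illustrative assumptions, written in PyTorch.

```python
import torch
import torch.nn as nn

class TransformerGenerator(nn.Module):
    """Maps a latent vector to an 8x8 grid of tokens, refines them with
    global self-attention, and projects each token to a 4x4 RGB patch,
    yielding a 32x32 image."""
    def __init__(self, latent_dim=128, embed_dim=256, depth=4, grid=8, patch=4):
        super().__init__()
        self.grid, self.patch = grid, patch
        self.to_tokens = nn.Linear(latent_dim, grid * grid * embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, grid * grid, embed_dim))
        block = nn.TransformerEncoderLayer(
            embed_dim, nhead=4, dim_feedforward=4 * embed_dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)
        self.to_rgb = nn.Linear(embed_dim, 3 * patch * patch)

    def forward(self, z):
        b = z.size(0)
        x = self.to_tokens(z).view(b, self.grid * self.grid, -1) + self.pos
        x = self.blocks(x)  # self-attention sees all patches at once
        x = self.to_rgb(x).view(b, self.grid, self.grid, 3, self.patch, self.patch)
        x = x.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, 32, 32)  # stitch patches
        return torch.tanh(x)

class ConvDiscriminator(nn.Module):
    """A small DCGAN-style convolutional critic for 32x32 images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 1, 4, 1, 0),  # 4x4 feature map -> one logit
        )

    def forward(self, img):
        return self.net(img).view(-1)

g, d = TransformerGenerator(), ConvDiscriminator()
fake = g(torch.randn(2, 128))
print(fake.shape, d(fake).shape)  # torch.Size([2, 3, 32, 32]) torch.Size([2])
```

The two halves plug into any standard GAN training loop; the convolutional critic supplies the local inductive bias that would otherwise have to be injected through augmentation, auxiliary tasks, or attention masks. For the frequency-spectrum analysis mentioned above, a common diagnostic is the azimuthally averaged power spectrum of generated images; the NumPy sketch below is again an assumption about the general technique, not the paper's exact procedure.

```python
import numpy as np

def radial_power_spectrum(img):
    """Mean spectral power per integer frequency radius of a 2D array."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = power.shape
    y, x = np.indices((h, w))
    r = np.sqrt((x - w // 2) ** 2 + (y - h // 2) ** 2).astype(int)
    return np.bincount(r.ravel(), weights=power.ravel()) / np.bincount(r.ravel())
```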
Related papers
- Efficient generative adversarial networks using linear additive-attention Transformers [0.8287206589886879]
We present LadaGAN, an efficient generative adversarial network that is built upon a novel Transformer block named Ladaformer.
LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions.
arXiv Detail & Related papers (2024-01-17T21:08:41Z) - Structural Prior Guided Generative Adversarial Transformers for Low-Light Image Enhancement [51.22694467126883]
We propose an effective Structural Prior guided Generative Adversarial Transformer (SPGAT) for low-light image enhancement.
The generator is based on a U-shaped Transformer that exploits non-local information for clearer image restoration.
To generate more realistic images, we develop a new structural prior guided adversarial learning method by building the skip connections between the generator and discriminators.
arXiv Detail & Related papers (2022-07-16T04:05:40Z) - Transformer based Generative Adversarial Network for Liver Segmentation [4.317557160310758]
We propose a new segmentation approach that combines Transformers with a Generative Adversarial Network (GAN).
Our model achieved a high dice coefficient of 0.9433, recall of 0.9515, and precision of 0.9376 and outperformed other Transformer based approaches.
arXiv Detail & Related papers (2022-05-21T19:55:43Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GANs: a convolutional neural network (CNN)-free generator termed STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - TransGAN: Two Transformers Can Make One Strong GAN [111.07699201175919]
We conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures.
Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator.
Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones.
arXiv Detail & Related papers (2021-02-14T05:24:48Z) - Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can serve as a backbone for a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.