Combining Transformer Generators with Convolutional Discriminators
- URL: http://arxiv.org/abs/2105.10189v1
- Date: Fri, 21 May 2021 07:56:59 GMT
- Title: Combining Transformer Generators with Convolutional Discriminators
- Authors: Ricard Durall, Stanislav Frolov, Andreas Dengel, Janis Keuper
- Abstract summary: The recently proposed TransGAN is the first GAN that uses only transformer-based architectures.
TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism.
We evaluate our approach by conducting a benchmark of well-known CNN discriminators, ablate the size of the transformer-based generator, and show that combining both architectural elements into a hybrid model leads to better results.
- Score: 9.83490307808789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models have recently attracted much interest from computer vision
researchers and have since been successfully employed for several problems
traditionally addressed with convolutional neural networks. At the same time,
image synthesis using generative adversarial networks (GANs) has drastically
improved over the last few years. The recently proposed TransGAN is the first
GAN using only transformer-based architectures and achieves competitive results
when compared to convolutional GANs. However, since transformers are
data-hungry architectures, TransGAN requires data augmentation, an auxiliary
super-resolution task during training, and a masking prior to guide the
self-attention mechanism. In this paper, we study the combination of a
transformer-based generator and convolutional discriminator and successfully
remove the need for the aforementioned design choices. We evaluate our
approach by conducting a benchmark of well-known CNN discriminators, ablate the
size of the transformer-based generator, and show that combining both
architectural elements into a hybrid model leads to better results.
Furthermore, we investigate the frequency spectrum properties of generated
images and observe that our model retains the benefits of an attention-based
generator.
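To make the hybrid design concrete, here is a minimal sketch (not the authors' released code) of a transformer-based generator paired with a convolutional discriminator: the generator synthesizes an image as a grid of patch tokens refined by global self-attention, and a plain CNN critic scores the result. All layer sizes, class names, and the 32x32 output resolution are illustrative assumptions, written in PyTorch.

```python
import torch
import torch.nn as nn

class TransformerGenerator(nn.Module):
    """Maps a latent vector to an 8x8 grid of tokens, refines them with
    global self-attention, and projects each token to a 4x4 RGB patch,
    yielding a 32x32 image."""
    def __init__(self, latent_dim=128, embed_dim=256, depth=4, grid=8, patch=4):
        super().__init__()
        self.grid, self.patch = grid, patch
        self.to_tokens = nn.Linear(latent_dim, grid * grid * embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, grid * grid, embed_dim))
        block = nn.TransformerEncoderLayer(
            embed_dim, nhead=4, dim_feedforward=4 * embed_dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)
        self.to_rgb = nn.Linear(embed_dim, 3 * patch * patch)

    def forward(self, z):
        b = z.size(0)
        x = self.to_tokens(z).view(b, self.grid * self.grid, -1) + self.pos
        x = self.blocks(x)  # self-attention sees all patches at once
        x = self.to_rgb(x).view(b, self.grid, self.grid, 3, self.patch, self.patch)
        x = x.permute(0, 3, 1, 4, 2, 5).reshape(b, 3, 32, 32)  # stitch patches
        return torch.tanh(x)

class ConvDiscriminator(nn.Module):
    """A small DCGAN-style convolutional critic for 32x32 images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 1, 4, 1, 0),  # 4x4 feature map -> one logit
        )

    def forward(self, img):
        return self.net(img).view(-1)

g, d = TransformerGenerator(), ConvDiscriminator()
fake = g(torch.randn(2, 128))
print(fake.shape, d(fake).shape)  # torch.Size([2, 3, 32, 32]) torch.Size([2])
```

The two halves plug into any standard GAN training loop; the convolutional critic supplies the local inductive bias that would otherwise have to be injected through augmentation, auxiliary tasks, or attention masks. For the frequency-spectrum analysis mentioned above, a common diagnostic is the azimuthally averaged power spectrum of generated images; the NumPy sketch below is again an assumption about the general technique, not the paper's exact procedure.

```python
import numpy as np

def radial_power_spectrum(img):
    """Mean spectral power per integer frequency radius of a 2D array."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = power.shape
    y, x = np.indices((h, w))
    r = np.sqrt((x - w // 2) ** 2 + (y - h // 2) ** 2).astype(int)
    return np.bincount(r.ravel(), weights=power.ravel()) / np.bincount(r.ravel())
```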
Related papers
- Efficient generative adversarial networks using linear additive-attention Transformers [0.8287206589886879]
We present LadaGAN, an efficient generative adversarial network that is built upon a novel Transformer block named Ladaformer.
LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions.
arXiv Detail & Related papers (2024-01-17T21:08:41Z) - Structural Prior Guided Generative Adversarial Transformers for Low-Light Image Enhancement [51.22694467126883]
We propose an effective Structural Prior guided Generative Adversarial Transformer (SPGAT) for low-light image enhancement.
The generator is based on a U-shaped Transformer that exploits non-local information for clearer image restoration.
To generate more realistic images, we develop a new structural prior guided adversarial learning method by building the skip connections between the generator and discriminators.
arXiv Detail & Related papers (2022-07-16T04:05:40Z) - Transformer based Generative Adversarial Network for Liver Segmentation [4.317557160310758]
We propose a new segmentation approach that combines Transformers with a Generative Adversarial Network (GAN).
Our model achieved a high dice coefficient of 0.9433, recall of 0.9515, and precision of 0.9376 and outperformed other Transformer based approaches.
arXiv Detail & Related papers (2022-05-21T19:55:43Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GANs: a convolutional neural network (CNN)-free generator termed STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - TransGAN: Two Transformers Can Make One Strong GAN [111.07699201175919]
We conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures.
Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator.
Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones.
arXiv Detail & Related papers (2021-02-14T05:24:48Z) - Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can serve as a backbone for a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.