Efficient generative adversarial networks using linear
additive-attention Transformers
- URL: http://arxiv.org/abs/2401.09596v1
- Date: Wed, 17 Jan 2024 21:08:41 GMT
- Authors: Emilio Morales-Juarez and Gibran Fuentes-Pineda
- Abstract summary: We present LadaGAN, an efficient generative adversarial network that is built upon a novel Transformer block named Ladaformer.
LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions.
- Score: 0.9790236766474198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the capacity of deep generative models for image generation, such as
Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has
dramatically improved in recent years, much of their success can be attributed
to computationally expensive architectures. This has limited their adoption and
use to research laboratories and companies with large resources, while
significantly raising the carbon footprint for training, fine-tuning, and
inference. In this work, we present LadaGAN, an efficient generative
adversarial network that is built upon a novel Transformer block named
Ladaformer. The main component of this block is a linear additive-attention
mechanism that computes a single attention vector per head instead of the
quadratic dot-product attention. We employ Ladaformer in both the generator and
discriminator, which reduces the computational complexity and overcomes the
training instabilities often associated with Transformer GANs. LadaGAN
consistently outperforms existing convolutional and Transformer GANs on
benchmark datasets at different resolutions while being significantly more
efficient. Moreover, LadaGAN shows competitive performance compared to
state-of-the-art multi-step generative models (e.g. DMs) using orders of
magnitude less computational resources.
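The abstract's central idea, a single attention vector per head in place of the quadratic token-to-token attention map, can be illustrated with a minimal single-head sketch. This is not the authors' Ladaformer block: the scoring vector `w`, the scaling by `sqrt(d)`, and the element-wise mixing with keys and values are assumptions made for illustration, loosely following the general additive-attention pattern. The point it shows is that each token gets one scalar score, so the cost grows linearly with the number of tokens N rather than with N squared.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(queries, keys, values, w):
    """Single-head linear additive attention (illustrative sketch only).

    queries, keys, values: lists of N tokens, each a list of d floats.
    w: hypothetical learned scoring vector of length d (an assumption).
    Returns (alpha, global_q, out).
    """
    d = len(w)
    # One scalar score per token: O(N * d) work, no N x N matrix is built.
    scores = [sum(qj * wj for qj, wj in zip(q, w)) / math.sqrt(d)
              for q in queries]
    alpha = softmax(scores)
    # The single attention vector: an alpha-weighted sum of the queries.
    global_q = [sum(a * q[j] for a, q in zip(alpha, queries))
                for j in range(d)]
    # Assumed mixing step: broadcast the global vector to every token
    # through element-wise products with that token's key and value.
    out = [[g * kj * vj for g, kj, vj in zip(global_q, k, v)]
           for k, v in zip(keys, values)]
    return alpha, global_q, out
```

Calling `additive_attention(tokens, tokens, tokens, w)` on a list of N token vectors returns N attention weights, one global vector of length d, and N output vectors. Dot-product attention would instead compute N x N pairwise scores.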
Related papers
- Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis [82.72941975704374]
Non-autoregressive Transformers (NATs) have been recognized for their rapid generation.
We re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies.
We propose to go beyond existing methods by directly solving the optimal strategies in an automatic framework.
arXiv Detail & Related papers (2024-06-08T13:52:20Z)
- HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution [6.7341750484636975]
Transformer-based networks can only use input information from a limited spatial range.
A novel Hybrid Multi-Axis Aggregation network (HMA) is proposed in this paper to exploit feature potential information better.
The experimental results show that HMA outperforms the state-of-the-art methods on the benchmark dataset.
arXiv Detail & Related papers (2024-05-08T12:14:34Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices [3.809702129519641]
New deep neural network (DNN) architectures and approaches are emerging every few years, driving the field's advancement.
Transformers are a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges.
This work aims to make steps towards bridging this gap by examining the current state of Transformers' on-device execution.
arXiv Detail & Related papers (2023-06-20T10:15:01Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
Our experiments focus on oil and gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- Generative Cooperative Networks for Natural Language Generation [25.090455367573988]
We introduce Generative Cooperative Networks, in which the discriminator architecture is cooperatively used along with the generation policy to output samples of realistic texts.
We give theoretical guarantees of convergence for our approach, and study various efficient decoding schemes to empirically achieve state-of-the-art results in two main NLG tasks.
arXiv Detail & Related papers (2022-01-28T18:36:57Z)
- The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GANs: a convolutional neural network (CNN)-free generator termed STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z)
- Combining Transformer Generators with Convolutional Discriminators [9.83490307808789]
Recently proposed TransGAN is the first GAN using only transformer-based architectures.
TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism.
We evaluate our approach by conducting a benchmark of well-known CNN discriminators, ablate the size of the transformer-based generator, and show that combining both architectural elements into a hybrid model leads to better results.
arXiv Detail & Related papers (2021-05-21T07:56:59Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformer-based models.
We prove that eliminating the MASK token and considering the whole output during the loss are essential choices to improve performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
- Learning Efficient GANs for Image Translation via Differentiable Masks and co-Attention Distillation [130.30465659190773]
Generative Adversarial Networks (GANs) have been widely used in image translation, but their high computation and storage costs impede deployment on mobile devices.
We introduce a novel GAN compression method, termed DMAD, by proposing a Differentiable Mask and a co-Attention Distillation.
Experiments show DMAD can reduce the Multiply Accumulate Operations (MACs) of CycleGAN by 13x and that of Pix2Pix by 4x while retaining a comparable performance against the full model.
arXiv Detail & Related papers (2020-11-17T02:39:19Z)