Can the Transformer Be Used as a Drop-in Replacement for RNNs in
Text-Generating GANs?
- URL: http://arxiv.org/abs/2108.12275v1
- Date: Thu, 26 Aug 2021 14:15:36 GMT
- Title: Can the Transformer Be Used as a Drop-in Replacement for RNNs in
Text-Generating GANs?
- Authors: Kevin Blin and Andrei Kucharavy
- Abstract summary: We used a well-performing text generative adversarial network (GAN) architecture, the Diversity-Promoting GAN (DPGAN).
We attempted a drop-in replacement of its LSTM layer with a self-attention-based Transformer layer in order to leverage the Transformer's efficiency.
The resulting Self-Attention DPGAN (SADPGAN) was evaluated for performance, quality and diversity of generated text, and stability.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we address the problem of fine-tuned text generation with a
limited computational budget. For that, we used a well-performing text
generative adversarial network (GAN) architecture, the Diversity-Promoting GAN
(DPGAN), and attempted a drop-in replacement of the LSTM layer with a
self-attention-based Transformer layer in order to leverage the Transformer's
efficiency. The resulting Self-Attention DPGAN (SADPGAN) was evaluated for
performance, quality and diversity of generated text, and stability.
Computational experiments suggested that the Transformer architecture cannot
serve as a drop-in replacement for the LSTM layer: it under-performed during
the pre-training phase and underwent a complete mode collapse during the GAN
tuning phase. Our results suggest that the Transformer architecture needs to be
adapted before it can be used as a replacement for RNNs in text-generating GANs.
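
To make the "drop-in replacement" concrete, the sketch below shows, in PyTorch, the kind of swap the abstract describes: the embedding and vocabulary projection around the generator body stay the same, and only the recurrent layer is exchanged for a single self-attention (Transformer encoder) layer with a causal mask. The class names, layer sizes and positional-embedding choice are illustrative assumptions for this page, not the paper's actual DPGAN/SADPGAN code.

```python
# Illustrative sketch only: hyperparameters, class names and the causal-mask
# handling are assumptions; the real DPGAN/SADPGAN code is not reproduced here.
import torch
import torch.nn as nn


class LSTMGenerator(nn.Module):
    """Baseline generator body: embedding -> LSTM -> vocabulary projection."""

    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))  # (batch, seq_len, hidden_dim)
        return self.proj(h)                   # per-step vocabulary logits


class TransformerGenerator(nn.Module):
    """'Drop-in' variant: only the LSTM is swapped for a self-attention layer;
    the embedding and output projection around it are left untouched."""

    def __init__(self, vocab_size=5000, embed_dim=128, n_heads=4, max_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.pos = nn.Embedding(max_len, embed_dim)   # attention needs positions
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.proj = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        batch, seq_len = tokens.shape
        positions = torch.arange(seq_len, device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        # Additive causal mask so generation stays left-to-right, as with the LSTM.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1)
        return self.proj(self.encoder(x, mask=mask))


# Both generators expose the same (batch, seq_len) -> logits interface.
tokens = torch.randint(0, 5000, (8, 32))
assert LSTMGenerator()(tokens).shape == TransformerGenerator()(tokens).shape
```

Because both bodies expose the same interface, the surrounding training procedure (maximum-likelihood pre-training followed by adversarial tuning) does not need to change, which is what makes the replacement "drop-in"; the paper's finding is that the optimisation behaviour of the two bodies nonetheless differs sharply, with the Transformer variant under-performing in pre-training and collapsing during GAN tuning.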
Related papers
- Does Transformer Interpretability Transfer to RNNs? [0.6437284704257459]
Recent advances in recurrent neural network architectures have enabled RNNs to match or exceed the performance of equal-size transformers.
We show that it is possible to improve some of these techniques by taking advantage of RNNs' compressed state.
arXiv Detail & Related papers (2024-04-09T02:59:17Z)
- Self-Supervised Pre-Training for Table Structure Recognition Transformer [25.04573593082671]
We propose a self-supervised pre-training (SSP) method for table structure recognition transformers.
We discover that the performance gap between the linear projection transformer and the hybrid CNN-transformer can be mitigated by SSP of the visual encoder in the TSR model.
arXiv Detail & Related papers (2024-02-23T19:34:06Z)
- Probing the limit of hydrologic predictability with the Transformer network [7.326504492614808]
We show that a vanilla Transformer architecture is not competitive against LSTM on the widely benchmarked CAMELS dataset.
A recurrence-free variant of Transformer can obtain mixed comparisons with LSTM, producing the same Kling-Gupta efficiency coefficient (KGE) along with other metrics.
While the Transformer results are not higher than current state-of-the-art, we still learned some valuable lessons.
arXiv Detail & Related papers (2023-06-21T17:06:54Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z)
- The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GAN, a convolutional neural network (CNN)-free generator termed as STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z)
- Combining Transformer Generators with Convolutional Discriminators [9.83490307808789]
Recently proposed TransGAN is the first GAN using only transformer-based architectures.
TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism.
We evaluate our approach by conducting a benchmark of well-known CNN discriminators, ablate the size of the transformer-based generator, and show that combining both architectural elements into a hybrid model leads to better results.
arXiv Detail & Related papers (2021-05-21T07:56:59Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Rewiring the Transformer with Depth-Wise LSTMs [55.50278212605607]
We present a Transformer with depth-wise LSTMs connecting cascading Transformer layers and sub-layers.
Experiments with the 6-layer Transformer show significant BLEU improvements in both WMT 14 English-German / French tasks and the OPUS-100 many-to-many multilingual NMT task.
arXiv Detail & Related papers (2020-07-13T09:19:34Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)