Exploring Transformers in Natural Language Generation: GPT, BERT, and
XLNet
- URL: http://arxiv.org/abs/2102.08036v1
- Date: Tue, 16 Feb 2021 09:18:16 GMT
- Title: Exploring Transformers in Natural Language Generation: GPT, BERT, and
XLNet
- Authors: M. Onat Topal, Anil Bas, Imke van Heerden
- Abstract summary: Recent years have seen a proliferation of attention mechanisms and the rise of Transformers in Natural Language Generation (NLG).
In this paper, we explore three major Transformer-based models, namely GPT, BERT, and XLNet.
From poetry generation to summarization, text generation derives benefit as Transformer-based language models achieve groundbreaking results.
- Score: 1.8047694351309207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have seen a proliferation of attention mechanisms and the rise
of Transformers in Natural Language Generation (NLG). Previously,
state-of-the-art NLG architectures such as RNNs and LSTMs ran into vanishing
gradient problems; as sentences grew longer, the computational distance between
positions grew linearly, and word-by-word sequential processing hindered
parallelization. Transformers usher in a new era. In this
paper, we explore three major Transformer-based models, namely GPT, BERT, and
XLNet, that carry significant implications for the field. NLG is a burgeoning
area that is now bolstered with rapid developments in attention mechanisms.
From poetry generation to summarization, text generation derives benefit as
Transformer-based language models achieve groundbreaking results.
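The abstract's central contrast is that self-attention relates every pair of positions in a single step and processes all tokens in parallel, whereas RNNs and LSTMs step through a sentence word by word. The sketch below is a minimal, NumPy-only illustration of single-head scaled dot-product self-attention; the dimensions, random weights, and function name are illustrative assumptions, not anything taken from the paper.

```python
# Minimal sketch of scaled dot-product self-attention (single head).
# All sizes and weights are illustrative; nothing here comes from the paper.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Attend over all positions of a sequence at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project every token in parallel
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # pairwise attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                        # each output mixes all tokens

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 6, 16, 8           # hypothetical sizes
X = rng.normal(size=(seq_len, d_model))       # token embeddings for one sentence
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 8): every position relates to every other in one step
```

Unlike an RNN, no loop over positions is needed: the distance between any two tokens is constant, which is the property the abstract credits for the shift away from recurrent NLG architectures.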
Related papers
- GLoT: A Novel Gated-Logarithmic Transformer for Efficient Sign Language Translation [0.0]
We propose a novel Gated-Logarithmic Transformer (GLoT) that captures the long-term temporal dependencies of sign language as time-series data.
Our results demonstrate that GLoT consistently outperforms the other models across all metrics.
arXiv Detail & Related papers (2025-02-17T14:31:00Z)
- Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z)
- Anatomy of Neural Language Models [0.0]
Transformer-based Language Models (LMs) have led to new state-of-the-art results in a wide spectrum of applications.
Transformers pretrained on language-modeling-like tasks have been widely adopted in computer vision and time series applications.
arXiv Detail & Related papers (2024-01-08T10:27:25Z)
- Comparing Generalization in Learning with Limited Numbers of Exemplars: Transformer vs. RNN in Attractor Dynamics [3.5353632767823497]
ChatGPT, a widely recognized large language model (LLM), has recently gained substantial attention for its performance scaling.
This raises a crucial question about the Transformer's generalization-in-learning (GIL) capacity.
We compare the Transformer's GIL capabilities with those of a traditional Recurrent Neural Network (RNN) in tasks involving attractor dynamics learning.
arXiv Detail & Related papers (2023-11-15T00:37:49Z)
- Attention Is Not All You Need Anymore [3.9693969407364427]
We propose a family of drop-in replacements for the self-attention mechanism in the Transformer.
Experimental results show that replacing the self-attention mechanism with the SHE noticeably improves the performance of the Transformer.
The proposed Extractors have the potential to run faster than the self-attention mechanism, and some variants already do.
arXiv Detail & Related papers (2023-08-15T09:24:38Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
The Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long dependencies between input sequence elements and enable parallel processing.
Our survey identifies the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
- Leveraging Pre-trained Models for Failure Analysis Triplets Generation [0.0]
We leverage the attention mechanism of pre-trained causal language models, such as the Transformer, for the downstream task of generating Failure Analysis Triplets (FATs); a minimal generation sketch with a pre-trained causal LM appears after this list.
We observe that Generative Pre-trained Transformer 2 (GPT2) outperformed other transformer models on the failure analysis triplet generation (FATG) task.
In particular, GPT2 (with 1.5B parameters) outperforms pre-trained BERT, BART, and GPT3 by a large margin on ROUGE.
arXiv Detail & Related papers (2022-10-31T17:21:15Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT generates high-quality translations with an 8-15x speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
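Several of the papers above, the failure analysis triplet generation work in particular, leverage pre-trained causal language models for downstream text generation, which is also the application area the main abstract highlights (poetry generation, summarization). Below is a minimal usage sketch assuming the Hugging Face transformers library and the publicly released GPT-2 checkpoint; the prompt and sampling settings are illustrative assumptions rather than settings from any of the papers.

```python
# Minimal sketch of text generation with a pre-trained causal LM, assuming the
# Hugging Face `transformers` library. Prompt and sampling settings are
# illustrative only, not taken from any of the papers listed above.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The failure was traced to"           # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,                         # length of the continuation
    do_sample=True,                            # sample instead of greedy decoding
    top_p=0.9,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,       # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```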
This list is automatically generated from the titles and abstracts of the papers on this site.