Enhanced Transformer Architecture for Natural Language Processing
- URL: http://arxiv.org/abs/2310.10930v1
- Date: Tue, 17 Oct 2023 01:59:07 GMT
- Title: Enhanced Transformer Architecture for Natural Language Processing
- Authors: Woohyeon Moon, Taeyoung Kim, Bumgeun Park and Dongsoo Har
- Abstract summary: Transformer is a state-of-the-art model in the field of natural language processing (NLP).
In this paper, a novel Transformer structure is proposed. It features full layer normalization, a weighted residual connection, positional encoding that exploits reinforcement learning, and zero-masked self-attention.
The proposed Transformer model, called the Enhanced Transformer, is validated by the bilingual evaluation understudy (BLEU) score obtained on the Multi30k translation dataset.
- Score: 2.6071653283020915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer is a state-of-the-art model in the field of natural language
processing (NLP). Current NLP models primarily increase the number of
Transformer layers to improve processing performance. However, this approach
requires substantial training resources, such as computing capacity. In this
paper, a novel Transformer structure is proposed. It features full layer
normalization, a weighted residual connection, positional encoding that
exploits reinforcement learning, and zero-masked self-attention. The proposed
Transformer model, called the Enhanced Transformer, is validated by the
bilingual evaluation understudy (BLEU) score obtained on the Multi30k
translation dataset. On this dataset, the Enhanced Transformer achieves a
202.96% higher BLEU score than the original Transformer.
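The architectural changes above are named only at a high level, so a minimal sketch may help make two of them concrete. The PyTorch module below shows one plausible reading of "full layer normalization" combined with a "weighted residual connection" for a single sublayer; the placement of the norms, the learnable branch weights, and every name in the code are assumptions for illustration, not the authors' implementation (the reinforcement-learning positional encoding and zero-masked self-attention are not sketched).

```python
# A minimal sketch, assuming layer normalization is applied to both the
# sublayer input and the combined output, and that the residual and sublayer
# branches are mixed with learnable scalar weights.
import torch
import torch.nn as nn


class WeightedResidualSublayer(nn.Module):
    """Wraps any sublayer (attention or feed-forward) with input/output
    layer normalization and a learnable weighted residual connection."""

    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm_in = nn.LayerNorm(d_model)   # normalize the sublayer input
        self.norm_out = nn.LayerNorm(d_model)  # normalize the combined output
        self.residual_weight = nn.Parameter(torch.ones(1))  # weight on the skip path
        self.branch_weight = nn.Parameter(torch.ones(1))    # weight on the sublayer path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branch = self.sublayer(self.norm_in(x))
        return self.norm_out(self.residual_weight * x + self.branch_weight * branch)


# Example: wrap a feed-forward block of width 512.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
block = WeightedResidualSublayer(512, ffn)
y = block(torch.randn(2, 10, 512))  # (batch, sequence, d_model)
```

Wrapping both the attention and feed-forward sublayers of each encoder and decoder layer this way would yield one version of the structure the abstract describes, with the branch weights trained jointly with the rest of the network.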
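The reported gain is a relative (percentage) improvement in BLEU. The snippet below shows how such a figure could be computed with the sacrebleu package; the toy hypothesis/reference pair and the two BLEU values in the relative-change calculation are illustrative placeholders, not numbers from the paper.

```python
# Illustrative only: toy sentences, not Multi30k system outputs.
import sacrebleu

hypotheses = ["a man is riding a bike on the street"]
# A list of reference streams; each stream holds one reference per hypothesis.
references = [["a man rides a bicycle down the street"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")

# Relative improvement of one model over a baseline, in percent
# (placeholder scores; the paper's actual BLEU values are not given here).
baseline_bleu, enhanced_bleu = 10.0, 30.3
relative_gain = (enhanced_bleu - baseline_bleu) / baseline_bleu * 100
print(f"Relative BLEU gain: {relative_gain:.2f}%")  # 203.00% with these placeholders
```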
Related papers
- Quick Back-Translation for Unsupervised Machine Translation [9.51657235413336]
We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT).
QBT re-purposes the encoder as a generative model and uses encoder-generated sequences to train the decoder.
Experiments on various WMT benchmarks demonstrate that QBT dramatically outperforms the standard back-translation-only method in terms of training efficiency.
arXiv Detail & Related papers (2023-12-01T20:27:42Z)
- Foundation Transformers [105.06915886136524]
We call for the development of Foundation Transformer for true general-purpose modeling.
In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal.
arXiv Detail & Related papers (2022-10-12T17:16:27Z)
- Transformer-F: A Transformer network with effective methods for learning universal sentence representation [8.225067988604351]
The Transformer model is widely used in natural language processing for sentence representation.
In this paper, two approaches are introduced to improve the performance of Transformers.
arXiv Detail & Related papers (2021-07-02T03:20:11Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained Transformer into its efficient recurrent counterpart (a generic linear-attention recurrence is sketched after this list).
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Optimizing Transformer for Low-Resource Neural Machine Translation [4.802292434636455]
Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation.
Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of the Transformer under low-resource conditions is highly dependent on the hyperparameter settings.
Using a Transformer optimized for low-resource conditions improves translation quality by up to 7.3 BLEU points compared to the Transformer's default settings.
arXiv Detail & Related papers (2020-11-04T13:12:29Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- Segatron: Segment-Aware Transformer for Language Modeling and Understanding [79.84562707201323]
We propose a segment-aware Transformer (Segatron) to generate better contextual representations from sequential tokens.
We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model.
We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset.
arXiv Detail & Related papers (2020-04-30T17:38:27Z)
- Transformer on a Diet [81.09119185568296]
Transformer has been widely used thanks to its ability to capture sequence information in an efficient way.
Recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness.
We explore three carefully designed light Transformer architectures to determine whether a Transformer with less computation can produce competitive results.
arXiv Detail & Related papers (2020-02-14T18:41:58Z)
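The "Finetuning Pretrained Transformers into RNNs" entry above relies on a linear-complexity attention variant that can be evaluated as a recurrence with constant-size state. The sketch below illustrates that idea with the common elu+1 feature map; the actual paper learns its feature map during finetuning, so this is a generic stand-in rather than that method, and the function name and shapes are assumptions.

```python
# A minimal sketch of autoregressive linear attention run as a recurrence.
import torch
import torch.nn.functional as F


def linear_attention_recurrent(q, k, v, eps=1e-6):
    """q, k, v: tensors of shape (seq_len, d). Returns (seq_len, v.shape[-1]) outputs.

    Uses the elu(x) + 1 feature map; each step updates a fixed-size state
    instead of attending over all previous tokens.
    """
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
    s = torch.zeros(q.shape[-1], v.shape[-1])  # running sum of phi(k_t) v_t^T
    z = torch.zeros(q.shape[-1])               # running sum of phi(k_t)
    outputs = []
    for t in range(q.shape[0]):
        s = s + torch.outer(k[t], v[t])
        z = z + k[t]
        outputs.append((q[t] @ s) / (q[t] @ z + eps))
    return torch.stack(outputs)


# Each step costs O(d^2) regardless of sequence length, unlike softmax
# attention, whose per-step cost grows with the number of past tokens.
out = linear_attention_recurrent(torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8))
```

Because the state (s, z) has a fixed size, each generation step costs the same no matter how many tokens have already been produced, which is the source of the efficiency gain that entry describes.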