Quick Back-Translation for Unsupervised Machine Translation
- URL: http://arxiv.org/abs/2312.00912v1
- Date: Fri, 1 Dec 2023 20:27:42 GMT
- Title: Quick Back-Translation for Unsupervised Machine Translation
- Authors: Benjamin Brimacombe, Jiawei Zhou
- Abstract summary: We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT).
QBT re-purposes the encoder as a generative model, and uses encoder-generated sequences to train the decoder.
Experiments on various WMT benchmarks demonstrate that QBT dramatically outperforms the standard back-translation-only method in terms of training efficiency.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of unsupervised machine translation has seen significant
advancement from the marriage of the Transformer and the back-translation
algorithm. The Transformer is a powerful generative model, and back-translation
leverages Transformer's high-quality translations for iterative
self-improvement. However, the Transformer is encumbered by the run-time of
autoregressive inference during back-translation, and back-translation is
limited by a lack of synthetic data efficiency. We propose a two-for-one
improvement to Transformer back-translation: Quick Back-Translation (QBT). QBT
re-purposes the encoder as a generative model, and uses encoder-generated
sequences to train the decoder in conjunction with the original autoregressive
back-translation step, improving data throughput and utilization. Experiments
on various WMT benchmarks demonstrate that a relatively small number of QBT
refinement steps improve current unsupervised machine translation models, and
that QBT dramatically outperforms the standard back-translation-only method in
training efficiency while achieving comparable translation quality.
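As a concrete illustration of the mechanism, below is a minimal sketch of the QBT idea in PyTorch. The `encoder`, `decoder`, and `lm_head` modules and their signatures are assumptions for exposition, not the authors' implementation; the point is that the synthetic side of a back-translation pair is produced in a single non-autoregressive forward pass.

```python
# Hedged sketch of Quick Back-Translation (QBT): the encoder is re-purposed
# as a generator by projecting its hidden states to vocabulary logits, so a
# synthetic "translation" is produced in one forward pass instead of slow
# autoregressive decoding. Module names and signatures are illustrative.
import torch
import torch.nn.functional as F

def shift_right(tokens, bos_id=1):
    """Prepend BOS and drop the last token (teacher-forcing input)."""
    bos = torch.full_like(tokens[:, :1], bos_id)
    return torch.cat([bos, tokens[:, :-1]], dim=1)

def quick_back_translate(encoder, lm_head, tokens):
    """One-pass generation: encode, then predict one token per position."""
    with torch.no_grad():
        hidden = encoder(tokens)           # (batch, seq, d_model)
        logits = lm_head(hidden)           # (batch, seq, vocab)
        return logits.argmax(dim=-1)       # greedy, non-autoregressive

def qbt_decoder_step(encoder, decoder, lm_head, mono_tgt, optimizer):
    """Standard back-translation loss for the decoder, but with the
    synthetic source generated by the encoder in a single pass."""
    synth_src = quick_back_translate(encoder, lm_head, mono_tgt)
    memory = encoder(synth_src)                      # re-encode synthetic source
    logits = decoder(memory, shift_right(mono_tgt))  # (batch, seq, vocab)
    loss = F.cross_entropy(logits.transpose(1, 2), mono_tgt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the cheap encoder-generated pairs supplement, rather than replace, the ordinary autoregressive back-translation step, which is what improves data throughput and utilization.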
Related papers
- Enhanced Transformer Architecture for Natural Language Processing [2.6071653283020915]
The Transformer is a state-of-the-art model in the field of natural language processing (NLP).
This paper proposes a novel Transformer structure featuring full layer normalization, weighted residual connections, positional encoding exploiting reinforcement learning, and zero-masked self-attention.
The proposed Transformer model, which is called Enhanced Transformer, is validated by the bilingual evaluation understudy (BLEU) score obtained with the Multi30k translation dataset.
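As one illustration, "weighted residual connection" can be read as a residual branch scaled by a learnable weight, normalized afterward. The sketch below is a generic interpretation of that feature under stated assumptions, not the paper's actual design; the class name and defaults are hypothetical.

```python
# Generic reading of a "weighted residual connection" (assumption, not the
# paper's code): the skip branch is scaled by a learnable weight before the
# sublayer output is added, followed by layer normalization.
import torch
import torch.nn as nn

class WeightedResidual(nn.Module):
    def __init__(self, sublayer, d_model=512):
        super().__init__()
        self.sublayer = sublayer
        self.alpha = nn.Parameter(torch.tensor(1.0))  # learnable residual weight
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(self.alpha * x + self.sublayer(x))
```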
arXiv Detail & Related papers (2023-10-17T01:59:07Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation [13.474844448367367]
Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency.
We present a Viterbi decoding framework for the DA-Transformer, which is guaranteed to find the joint optimal solution for the translation and its decoding path under any length constraint.
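For intuition, Viterbi decoding over a DAG of hidden states reduces to a forward dynamic program over topologically ordered vertices. The sketch below is a generic DAG Viterbi under assumed log-score inputs, not the DA-Transformer implementation.

```python
# Generic Viterbi over a DAG (illustrative, not the paper's code):
# emit[i] is the log-score of emitting vertex i's best token, and
# trans[i] lists forward edges (j, log_score) with j > i.
def viterbi_dag(emit, trans):
    n = len(emit)
    best = [float("-inf")] * n
    back = [None] * n
    best[0] = emit[0]
    for i in range(n):                    # vertices are topologically ordered
        if best[i] == float("-inf"):
            continue
        for j, t in trans[i]:
            score = best[i] + t + emit[j]
            if score > best[j]:
                best[j], back[j] = score, i
    path, v = [], n - 1                   # backtrack from the final vertex
    while v is not None:
        path.append(v)
        v = back[v]
    return best[-1], path[::-1]

# Example: three vertices; the best path is 0 -> 1 -> 2.
score, path = viterbi_dag([0.0, -0.5, -0.1],
                          [[(1, -0.1), (2, -2.0)], [(2, -0.2)], []])
```

The DA-Transformer's actual decoder additionally handles per-vertex token emissions and length constraints, which this sketch omits.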
arXiv Detail & Related papers (2022-10-11T06:53:34Z)
- GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation [107.2752114891855]
The Transformer architecture, built by stacking encoder and decoder layers, has driven significant progress in neural machine translation.
We propose the Group-Transformer model (GTrans) that flexibly divides multi-layer representations of both encoder and decoder into different groups and then fuses these group features to generate target words.
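A minimal sketch of the grouping-and-fusion idea follows; the group size, fusion weights, and shapes are assumptions for illustration, not the paper's exact design.

```python
# Illustrative group-and-fuse over stacked layer outputs (assumed design):
# layers are partitioned into consecutive groups, each group is fused with
# learnable per-layer weights, and the group features are mixed linearly.
import torch
import torch.nn as nn

class GroupFusion(nn.Module):
    def __init__(self, num_layers=6, group_size=2, d_model=512):
        super().__init__()
        self.group_size = group_size
        self.layer_w = nn.Parameter(torch.ones(num_layers))
        self.mix = nn.Linear((num_layers // group_size) * d_model, d_model)

    def forward(self, layer_states):       # list of (batch, seq, d_model)
        w = torch.softmax(self.layer_w, dim=0)
        fused = []
        for g in range(0, len(layer_states), self.group_size):
            group = layer_states[g:g + self.group_size]
            fused.append(sum(w[g + k] * h for k, h in enumerate(group)))
        return self.mix(torch.cat(fused, dim=-1))
```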
arXiv Detail & Related papers (2022-07-29T04:10:36Z)
- Directed Acyclic Transformer for Non-Autoregressive Machine Translation [93.31114105366461]
The Directed Acyclic Transformer (DA-Transformer) represents hidden states in a Directed Acyclic Graph (DAG).
DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average.
arXiv Detail & Related papers (2022-05-16T06:02:29Z)
- Optimizing Transformer for Low-Resource Neural Machine Translation [4.802292434636455]
Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation.
Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of the Transformer under low-resource conditions is highly dependent on the hyperparameter settings.
Using a Transformer optimized for low-resource conditions improves translation quality by up to 7.3 BLEU points compared to the Transformer's default settings.
arXiv Detail & Related papers (2020-11-04T13:12:29Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT generates high-quality translations with an 8-15x speedup.
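The "glancing" trick can be sketched as follows: the harder a sentence is for the current model, the more ground-truth tokens are revealed as decoder inputs during training. The function below is a simplified reading with assumed names and a fixed ratio schedule, not the paper's exact sampling rule.

```python
# Simplified glancing sampling (illustrative): run one parallel pass, then
# reveal a fraction of gold tokens proportional to how wrong that pass was,
# and train the model to fill in the masked remainder.
import torch

def glancing_inputs(pred_tokens, gold_tokens, mask_id, ratio=0.5):
    # per-sentence fraction of wrongly predicted positions
    wrong_frac = (pred_tokens != gold_tokens).float().mean(dim=-1, keepdim=True)
    # reveal gold tokens at random positions with that (scaled) probability
    reveal = torch.rand_like(gold_tokens, dtype=torch.float) < wrong_frac * ratio
    return torch.where(reveal, gold_tokens,
                       torch.full_like(gold_tokens, mask_id))
```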
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.