Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
- URL: http://arxiv.org/abs/2212.12662v1
- Date: Sat, 24 Dec 2022 05:35:04 GMT
- Title: Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
- Authors: Wenjie Hao, Hongfei Xu, Lingling Mu and Hongying Zan
- Abstract summary: We study the use of a deep Transformer translation model for the CCMT 2022 Chinese-Thai low-resource machine translation task.
Considering that increasing the number of layers also increases the regularization on new model parameters, we adopt the best-performing setting but increase the depth of the Transformer to 24 layers.
Our work obtains SOTA performance on Chinese-to-Thai translation in the constrained evaluation.
- Score: 9.294853905247383
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we study the use of a deep Transformer translation model for the
CCMT 2022 Chinese-Thai low-resource machine translation task. We first explore
the experiment settings (including the number of BPE merge operations, dropout
probability, embedding size, etc.) for the low-resource scenario with the
6-layer Transformer. Considering that increasing the number of layers also
increases the regularization on new model parameters (dropout modules are also
introduced when using more layers), we adopt the best-performing setting
but increase the depth of the Transformer to 24 layers to obtain improved
translation quality. Our work obtains SOTA performance on
Chinese-to-Thai translation in the constrained evaluation.
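As a concrete illustration of the kind of setup the abstract describes, the sketch below builds a 6-layer baseline with low-resource-oriented hyperparameters and then deepens it to 24 layers. All values (BPE merges, embedding size, dropout) are illustrative assumptions, not the settings reported in the paper.
```python
# Hypothetical configuration sketch: a 6-layer baseline tuned for a
# low-resource pair, then deepened to 24 layers. Every value here is an
# illustrative assumption, not a setting reported in the paper.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class NMTConfig:
    bpe_merge_ops: int = 8_000   # assumed small BPE vocabulary (preprocessing knob)
    embed_dim: int = 512
    ffn_dim: int = 2048
    num_heads: int = 8
    dropout: float = 0.3         # heavier dropout than the common 0.1 default
    enc_layers: int = 6
    dec_layers: int = 6


def build_model(cfg: NMTConfig) -> nn.Transformer:
    return nn.Transformer(
        d_model=cfg.embed_dim,
        nhead=cfg.num_heads,
        num_encoder_layers=cfg.enc_layers,
        num_decoder_layers=cfg.dec_layers,
        dim_feedforward=cfg.ffn_dim,
        dropout=cfg.dropout,
    )


baseline = build_model(NMTConfig())                          # setting-search model
deep = build_model(NMTConfig(enc_layers=24, dec_layers=24))  # deepened variant
```
Deepening the stack adds dropout modules along with the new layers, which is the regularization effect the abstract points to.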
Related papers
- HW-TSC's Submission to the CCMT 2024 Machine Translation Tasks [12.841065384808733]
We participate in the bilingual machine translation task and the multi-domain machine translation task.
For these two translation tasks, we use training strategies such as regularized dropout, bidirectional training, data diversification, forward translation, back translation, alternated training, curriculum learning, and transductive ensemble learning.
arXiv Detail & Related papers (2024-09-23T09:20:19Z)
- Quick Back-Translation for Unsupervised Machine Translation [9.51657235413336]
We propose a two-for-one improvement to Transformer back-translation: Quick Back-Translation (QBT).
QBT re-purposes the encoder as a generative model, and uses encoder-generated sequences to train the decoder.
Experiments on various WMT benchmarks demonstrate that QBT dramatically outperforms the standard back-translation-only method in terms of training efficiency.
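A rough, assumption-level sketch of the idea as stated in the summary: the encoder plus a vocabulary projection acts as a cheap non-autoregressive generator whose outputs serve as synthetic targets for decoder training. The class and output head below are hypothetical and do not reproduce the paper's exact QBT procedure.
```python
# Illustrative only: encoder re-purposed as a generator by adding a
# hypothetical vocabulary projection; one forward pass plus a per-position
# argmax replaces the autoregressive decoding of standard back-translation.
import torch
import torch.nn as nn


class EncoderAsGenerator(nn.Module):
    def __init__(self, encoder: nn.TransformerEncoder, d_model: int, vocab_size: int):
        super().__init__()
        self.encoder = encoder
        self.proj = nn.Linear(d_model, vocab_size)  # assumed output head

    @torch.no_grad()
    def generate(self, src_embeds: torch.Tensor) -> torch.Tensor:
        # src_embeds: (batch, src_len, d_model), assuming batch_first encoder layers.
        hidden = self.encoder(src_embeds)
        return self.proj(hidden).argmax(dim=-1)  # (batch, src_len) synthetic token ids
```
The resulting (source, encoder-generated target) pairs would then be used as ordinary training data for the decoder, which is where a training-efficiency gain over full back-translation would come from.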
arXiv Detail & Related papers (2023-12-01T20:27:42Z)
- Enhanced Transformer Architecture for Natural Language Processing [2.6071653283020915]
The Transformer is a state-of-the-art model in the field of natural language processing (NLP).
In this paper, a novel Transformer structure is proposed. It features full layer normalization, weighted residual connections, positional encoding exploiting reinforcement learning, and zero-masked self-attention.
The proposed Transformer model, which is called Enhanced Transformer, is validated by the bilingual evaluation understudy (BLEU) score obtained with the Multi30k translation dataset.
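Of the listed ingredients, the weighted residual connection is the easiest to sketch; the snippet below is an assumed reading of that one component (the RL-based positional encoding and zero-masked self-attention are not shown) and is not the paper's implementation.
```python
# Assumed form of a weighted residual connection: the skip path is scaled by a
# learned scalar before being added to the sub-layer output and normalized.
import torch
import torch.nn as nn


class WeightedResidual(nn.Module):
    def __init__(self, sublayer: nn.Module, d_model: int):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.alpha = nn.Parameter(torch.ones(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(self.alpha * x + self.sublayer(x))
```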
arXiv Detail & Related papers (2023-10-17T01:59:07Z)
- Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
arXiv Detail & Related papers (2022-10-16T03:41:50Z)
- DeepNet: Scaling Transformers to 1,000 Layers [106.33669415337135]
We introduce a new normalization function (DeepNorm) to modify the residual connection in Transformer.
In-depth theoretical analysis shows that model updates can be bounded in a stable way.
We successfully scale Transformers up to 1,000 layers without difficulty, which is one order of magnitude deeper than previous deep Transformers.
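The modified residual connection can be written as x_{l+1} = LayerNorm(alpha * x_l + G_l(x_l)); the sketch below implements that form, leaving alpha as an argument since the paper derives its value from the number of layers (the wrapper class itself is hypothetical).
```python
# DeepNorm-style residual: the skip connection is scaled by a depth-dependent
# constant alpha before post-layer-normalization, bounding update magnitudes.
import torch
import torch.nn as nn


class DeepNormBlock(nn.Module):
    def __init__(self, sublayer: nn.Module, d_model: int, alpha: float):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x_{l+1} = LayerNorm(alpha * x_l + G_l(x_l))
        return self.norm(self.alpha * x + self.sublayer(x))
```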
arXiv Detail & Related papers (2022-03-01T15:36:38Z)
- Optimizing Transformer for Low-Resource Neural Machine Translation [4.802292434636455]
Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation.
Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of the Transformer under low-resource conditions is highly dependent on the hyperparameter settings.
Using an optimized Transformer for low-resource conditions improves the translation quality up to 7.3 BLEU points compared to using the Transformer default settings.
arXiv Detail & Related papers (2020-11-04T13:12:29Z)
- Multi-Unit Transformers for Neural Machine Translation [51.418245676894465]
We propose the Multi-Unit Transformers (MUTE) to promote the expressiveness of the Transformer.
Specifically, we use several parallel units and show that modeling with multiple units improves model performance and introduces diversity.
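A minimal sketch of the multi-unit idea as summarized above: several parallel Transformer units process the same input and their outputs are fused, here by simple averaging. The actual paper's aggregation scheme and diversity-promoting objectives are not reproduced.
```python
# Illustrative multi-unit layer: parallel units share the input; averaging
# their outputs is one simple (assumed) way to fuse the representations.
import torch
import torch.nn as nn


class MultiUnitLayer(nn.Module):
    def __init__(self, d_model: int, nhead: int, num_units: int = 3):
        super().__init__()
        self.units = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_units)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.stack([unit(x) for unit in self.units], dim=0).mean(dim=0)
```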
arXiv Detail & Related papers (2020-10-21T03:41:49Z)
- Deep Transformers with Latent Depth [42.33955275626127]
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks.
We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection.
We propose a novel method to train one shared Transformer network for multilingual machine translation.
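One way to picture the layer-selection idea is a per-layer gate with a learned logit, sampled with a relaxed Bernoulli (Gumbel-sigmoid) during training and thresholded at inference. The sketch below is an assumption-level illustration, not the paper's variational formulation or its multilingual sharing scheme.
```python
# Hypothetical per-layer latent selection: each layer carries a learned logit;
# a relaxed Bernoulli sample gates whether the layer transforms the input.
import torch
import torch.nn as nn


class LatentDepthEncoder(nn.Module):
    def __init__(self, layers: nn.ModuleList):
        super().__init__()
        self.layers = layers
        self.logits = nn.Parameter(torch.zeros(len(layers)))  # one logit per layer

    def forward(self, x: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        for layer, logit in zip(self.layers, self.logits):
            if self.training:
                # Gumbel-sigmoid relaxation of the binary selection variable.
                u = torch.rand((), device=x.device).clamp(1e-6, 1 - 1e-6)
                noise = torch.log(u) - torch.log1p(-u)
                gate = torch.sigmoid((logit + noise) / tau)
            else:
                gate = (logit > 0).float()  # hard selection at inference
            x = gate * layer(x) + (1.0 - gate) * x
        return x
```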
arXiv Detail & Related papers (2020-09-28T07:13:23Z)
- Rewiring the Transformer with Depth-Wise LSTMs [55.50278212605607]
We present a Transformer with depth-wise LSTMs connecting cascading Transformer layers and sub-layers.
Experiments with the 6-layer Transformer show significant BLEU improvements on the WMT14 English-German and English-French tasks and the OPUS-100 many-to-many multilingual NMT task.
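As a rough sketch of the rewiring described above, the wrapper below runs an LSTMCell over the layer stack, with depth playing the role of time, so hidden and cell states are carried between layers in place of a plain residual connection. The exact sub-layer granularity of the paper is not reproduced.
```python
# Illustrative depth-wise LSTM wiring: the cell consumes each layer's output
# and carries (h, c) to the next layer. Layers are assumed to be batch_first
# Transformer encoder layers operating on (batch, len, d_model) tensors.
import torch
import torch.nn as nn


class DepthWiseLSTMStack(nn.Module):
    def __init__(self, layers: nn.ModuleList, d_model: int):
        super().__init__()
        self.layers = layers
        self.cell = nn.LSTMCell(d_model, d_model)  # shared across depth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        h = x.reshape(b * t, d)        # token-wise hidden state
        c = torch.zeros_like(h)        # token-wise cell state
        for layer in self.layers:
            out = layer(h.view(b, t, d))
            h, c = self.cell(out.reshape(b * t, d), (h, c))
        return h.view(b, t, d)
```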
arXiv Detail & Related papers (2020-07-13T09:19:34Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.