On Optimal Transformer Depth for Low-Resource Language Translation
- URL: http://arxiv.org/abs/2004.04418v2
- Date: Tue, 14 Apr 2020 19:42:41 GMT
- Title: On Optimal Transformer Depth for Low-Resource Language Translation
- Authors: Elan van Biljon, Arnu Pretorius and Julia Kreutzer
- Abstract summary: We show that transformer models perform well (and often best) at low-to-moderate depth.
We find that the current trend in the field to use very large models is detrimental for low-resource languages.
- Score: 14.879321342968256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have shown great promise as an approach to Neural Machine
Translation (NMT) for low-resource languages. However, at the same time,
transformer models remain difficult to optimize and require careful tuning of
hyper-parameters to be useful in this setting. Many NMT toolkits come with a
set of default hyper-parameters, which researchers and practitioners often
adopt for the sake of convenience and avoiding tuning. These configurations,
however, have been optimized for large-scale machine translation data sets with
several million parallel sentences for European languages like English and
French. In this work, we find that the current trend in the field to use very
large models is detrimental for low-resource languages, since it makes training
more difficult and hurts overall performance, confirming previous observations.
We see our work as complementary to the Masakhane project ("Masakhane" means
"We Build Together" in isiZulu.) In this spirit, low-resource NMT systems are
now being built by the community who needs them the most. However, many in the
community still have very limited access to the type of computational resources
required for building extremely large models promoted by industrial research.
Therefore, by showing that transformer models perform well (and often best) at
low-to-moderate depth, we hope to convince fellow researchers to devote less
computational resources, as well as time, to exploring overly large models
during the development of these systems.
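As a minimal sketch (not the authors' code or configuration), the snippet below uses PyTorch's nn.Transformer to show that "depth" is a single hyper-parameter, the number of encoder and decoder layers, that can be lowered from the common deep default. The layer counts, model dimensions, and vocabulary sizes are illustrative assumptions only.
```python
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Bare-bones encoder-decoder Transformer for NMT (illustrative only)."""
    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8,
                 num_layers=6, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers,   # encoder depth: the knob under study
            num_decoder_layers=num_layers,   # decoder depth
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True)
        self.generator = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Positional encodings and attention masks are omitted for brevity.
        return self.generator(
            self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids)))

# A deep "toolkit default" vs. the low-to-moderate depth the paper favours for
# low-resource pairs; vocabulary sizes here are arbitrary placeholders.
deep = Seq2SeqTransformer(src_vocab=32000, tgt_vocab=32000, num_layers=6)
shallow = Seq2SeqTransformer(src_vocab=32000, tgt_vocab=32000, num_layers=3)
print(sum(p.numel() for p in deep.parameters()),
      sum(p.numel() for p in shallow.parameters()))
```
Comparing the two parameter counts printed at the end gives a rough sense of how much smaller, and cheaper to train, the shallower configuration is.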
Related papers
- Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation [62.202893186343935]
We explore what it would take to adapt Large Language Models for low-resource languages.
We show that parallel data is critical during both pre-training and Supervised Fine-Tuning (SFT).
Our experiments with three LLMs across two low-resourced language groups reveal consistent trends, underscoring the generalizability of our findings.
arXiv Detail & Related papers (2024-08-23T00:59:38Z) - Low-resource neural machine translation with morphological modeling [3.3721926640077804]
Morphological modeling in neural machine translation (NMT) is a promising approach to achieving open-vocabulary machine translation.
We propose a framework-solution for modeling complex morphology in low-resource settings.
We evaluate our proposed solution on Kinyarwanda - English translation using public-domain parallel text.
arXiv Detail & Related papers (2024-04-03T01:31:41Z) - Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That! [13.120825574589437]
We show that Transformer-based neural machine translation (NMT) is very effective in high-resource settings.
We find that the model does not show greater improvements for closely related vs. more distant language pairs.
Our discussion of the reasons for this behaviour highlights several general challenges for LR NMT.
arXiv Detail & Related papers (2024-03-16T16:17:47Z) - Transformers for Low-Resource Languages: Is Féidir Linn! [2.648836772989769]
In general, neural translation models often underperform on language pairs with insufficient training data.
We demonstrate that choosing appropriate parameters leads to considerable performance improvements.
An optimized Transformer model demonstrated a BLEU score improvement of 7.8 points compared with a baseline RNN model.
arXiv Detail & Related papers (2024-03-04T12:29:59Z) - Enhancing Neural Machine Translation of Low-Resource Languages: Corpus
Development, Human Evaluation and Explainable AI Architectures [0.0]
The Transformer architecture stands out as the gold standard, especially for high-resource language pairs.
The scarcity of parallel datasets for low-resource languages can hinder machine translation development.
This thesis introduces adaptNMT and adaptMLLM, two open-source applications streamlined for the development, fine-tuning, and deployment of neural machine translation models.
arXiv Detail & Related papers (2024-03-03T18:08:30Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - Improving Multilingual Neural Machine Translation System for Indic
Languages [0.0]
We propose a multilingual neural machine translation (MNMT) system to address the issues related to low-resource language translation.
A state-of-the-art transformer architecture is used to realize the proposed model.
Experiments on a substantial amount of data show that it outperforms conventional models.
arXiv Detail & Related papers (2022-09-27T09:51:56Z) - Learning to Generalize to More: Continuous Semantic Augmentation for
Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z) - Towards Reinforcement Learning for Pivot-based Neural Machine
Translation with Non-autoregressive Transformer [49.897891031932545]
Pivot-based neural machine translation (NMT) is commonly used in low-resource setups.
We present an end-to-end pivot-based integrated model, enabling training on source-target data.
arXiv Detail & Related papers (2021-09-27T14:49:35Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Improving Massively Multilingual Neural Machine Translation and
Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.