On Optimal Transformer Depth for Low-Resource Language Translation
- URL: http://arxiv.org/abs/2004.04418v2
- Date: Tue, 14 Apr 2020 19:42:41 GMT
- Title: On Optimal Transformer Depth for Low-Resource Language Translation
- Authors: Elan van Biljon, Arnu Pretorius and Julia Kreutzer
- Abstract summary: We show that transformer models perform well (and often best) at low-to-moderate depth.
We find that the current trend in the field to use very large models is detrimental for low-resource languages.
- Score: 14.879321342968256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have shown great promise as an approach to Neural Machine
Translation (NMT) for low-resource languages. However, at the same time,
transformer models remain difficult to optimize and require careful tuning of
hyper-parameters to be useful in this setting. Many NMT toolkits come with a
set of default hyper-parameters, which researchers and practitioners often
adopt for the sake of convenience and avoiding tuning. These configurations,
however, have been optimized for large-scale machine translation data sets with
several million parallel sentences for European languages like English and
French. In this work, we find that the current trend in the field to use very
large models is detrimental for low-resource languages, since it makes training
more difficult and hurts overall performance, confirming previous observations.
We see our work as complementary to the Masakhane project ("Masakhane" means
"We Build Together" in isiZulu.) In this spirit, low-resource NMT systems are
now being built by the community who needs them the most. However, many in the
community still have very limited access to the type of computational resources
required for building extremely large models promoted by industrial research.
Therefore, by showing that transformer models perform well (and often best) at
low-to-moderate depth, we hope to convince fellow researchers to devote less
computational resources, as well as time, to exploring overly large models
during the development of these systems.
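As a minimal sketch (not the authors' code or configuration), the snippet below uses PyTorch's nn.Transformer to show that "depth" is a single hyper-parameter, the number of encoder and decoder layers, that can be lowered from the common deep default. The layer counts, model dimensions, and vocabulary sizes are illustrative assumptions only.
```python
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Bare-bones encoder-decoder Transformer for NMT (illustrative only)."""
    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8,
                 num_layers=6, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers,   # encoder depth: the knob under study
            num_decoder_layers=num_layers,   # decoder depth
            dim_feedforward=dim_feedforward, dropout=dropout,
            batch_first=True)
        self.generator = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Positional encodings and attention masks are omitted for brevity.
        return self.generator(
            self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids)))

# A deep "toolkit default" vs. the low-to-moderate depth the paper favours for
# low-resource pairs; vocabulary sizes here are arbitrary placeholders.
deep = Seq2SeqTransformer(src_vocab=32000, tgt_vocab=32000, num_layers=6)
shallow = Seq2SeqTransformer(src_vocab=32000, tgt_vocab=32000, num_layers=3)
print(sum(p.numel() for p in deep.parameters()),
      sum(p.numel() for p in shallow.parameters()))
```
Comparing the two parameter counts printed at the end gives a rough sense of how much smaller, and cheaper to train, the shallower configuration is.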
Related papers
- Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation [62.202893186343935]
We explore what it would take to adapt Large Language Models for low-resource languages.
We show that parallel data is critical during both pre-training and Supervised Fine-Tuning (SFT).
Our experiments with three LLMs across two low-resourced language groups reveal consistent trends, underscoring the generalizability of our findings.
arXiv Detail & Related papers (2024-08-23T00:59:38Z) - Low-resource neural machine translation with morphological modeling [3.3721926640077804]
Morphological modeling in neural machine translation (NMT) is a promising approach to achieving open-vocabulary machine translation.
We propose a framework-solution for modeling complex morphology in low-resource settings.
We evaluate our proposed solution on Kinyarwanda - English translation using public-domain parallel text.
arXiv Detail & Related papers (2024-04-03T01:31:41Z) - Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That! [13.120825574589437]
We show that Transformer-based neural machine translation (NMT) is very effective in high-resource settings.
We find that the model does not show greater improvements for closely related vs. more distant language pairs.
Our discussion of the reasons for this behaviour highlights several general challenges for LR NMT.
arXiv Detail & Related papers (2024-03-16T16:17:47Z) - Transformers for Low-Resource Languages: Is Féidir Linn! [2.648836772989769]
In general, neural translation models often underperform on language pairs with insufficient training data.
We demonstrate that choosing appropriate parameters leads to considerable performance improvements.
An optimized Transformer model demonstrated a BLEU score improvement of 7.8 points compared with a baseline RNN model.
arXiv Detail & Related papers (2024-03-04T12:29:59Z) - Enhancing Neural Machine Translation of Low-Resource Languages: Corpus
Development, Human Evaluation and Explainable AI Architectures [0.0]
The Transformer architecture stands out as the gold standard, especially for high-resource language pairs.
The scarcity of parallel datasets for low-resource languages can hinder machine translation development.
This thesis introduces adaptNMT and adaptMLLM, two open-source applications streamlined for the development, fine-tuning, and deployment of neural machine translation models.
arXiv Detail & Related papers (2024-03-03T18:08:30Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - Improving Multilingual Neural Machine Translation System for Indic
Languages [0.0]
We propose a multilingual neural machine translation (MNMT) system to address the issues related to low-resource language translation.
A state-of-the-art transformer architecture is used to realize the proposed model.
Experiments on a substantial amount of data show that it outperforms conventional models.
arXiv Detail & Related papers (2022-09-27T09:51:56Z) - Learning to Generalize to More: Continuous Semantic Augmentation for
Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z) - Towards Reinforcement Learning for Pivot-based Neural Machine
Translation with Non-autoregressive Transformer [49.897891031932545]
Pivot-based neural machine translation (NMT) is commonly used in low-resource setups.
We present an end-to-end pivot-based integrated model, enabling training on source-target data.
arXiv Detail & Related papers (2021-09-27T14:49:35Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Improving Massively Multilingual Neural Machine Translation and
Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.