Neural Machine Translation for Low-Resource Languages: A Survey
- URL: http://arxiv.org/abs/2106.15115v1
- Date: Tue, 29 Jun 2021 06:31:58 GMT
- Title: Neural Machine Translation for Low-Resource Languages: A Survey
- Authors: Surangika Ranathunga, En-Shiun Annie Lee, Marjana Prifti Skenduli,
Ravi Shekhar, Mehreen Alam and Rishemjit Kaur
- Abstract summary: This paper presents a detailed survey of research advancements in low-resource language NMT (LRL-NMT).
It provides guidelines for selecting a suitable NMT technique for a given LRL data setting.
It also provides a list of recommendations to further enhance research efforts on LRL-NMT.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Machine Translation (NMT) has seen tremendous growth in
less than ten years and has already entered a mature phase. While it is the
most widely used solution for Machine Translation, its performance on
low-resource language pairs remains sub-optimal compared to high-resource
counterparts, due to the unavailability of large parallel corpora. The
application of NMT techniques to low-resource language pairs has therefore
received increasing attention in recent NMT research, leading to a
substantial amount of work on the topic. This paper presents a detailed
survey of research advancements in low-resource language NMT (LRL-NMT),
along with a quantitative analysis aimed at identifying the most popular
solutions. Based on our findings from reviewing previous work, this survey
provides a set of guidelines for selecting a suitable NMT technique for a
given LRL data setting. It also presents a holistic view of the LRL-NMT
research landscape and provides a list of recommendations to further
enhance research efforts on LRL-NMT.
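To illustrate what such a data-setting-to-technique guideline can look like in practice, here is a minimal sketch; the thresholds, technique names, and function below are hypothetical illustrations, not the survey's actual decision rules:

```python
# Hypothetical illustration of a data-setting -> technique guideline.
# Thresholds and recommendations are made up for this sketch; the
# survey's actual guidelines are given in the paper itself.

def suggest_nmt_technique(parallel_sents: int, monolingual_sents: int,
                          related_hrl_available: bool) -> str:
    """Suggest an LRL-NMT technique family for a given data setting."""
    if parallel_sents >= 1_000_000:
        return "standard supervised NMT"
    if parallel_sents >= 100_000:
        return "supervised NMT + back-translation of monolingual data"
    if related_hrl_available:
        # A related high-resource language enables transfer/multilingual setups.
        return "transfer learning or multilingual NMT"
    if monolingual_sents > 0:
        return "unsupervised NMT / pre-trained multilingual models"
    return "data collection first (no viable training signal)"

print(suggest_nmt_technique(50_000, 2_000_000, related_hrl_available=True))
```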
Related papers
- Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation
We explore what it would take to adapt Large Language Models for low-resource languages.
We show that parallel data is critical during both pre-training and Supervised Fine-Tuning (SFT).
Our experiments with three LLMs across two low-resourced language groups reveal consistent trends, underscoring the generalizability of our findings.
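One common way to use parallel data during SFT is to format sentence pairs as instruction-style examples. The template below is a hypothetical sketch of that idea, not the paper's exact setup:

```python
# Hypothetical SFT formatting of parallel data as translation instructions.
# The prompt template and field names are illustrative assumptions.

def to_sft_example(src: str, tgt: str,
                   src_lang: str = "English", tgt_lang: str = "Swahili") -> dict:
    """Turn one parallel sentence pair into an instruction-style SFT record."""
    return {
        "prompt": f"Translate the following {src_lang} sentence to {tgt_lang}:\n{src}\n",
        "completion": tgt,
    }

pair = ("The child is reading a book.", "Mtoto anasoma kitabu.")
print(to_sft_example(*pair))
```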
arXiv Detail & Related papers (2024-08-23T00:59:38Z)
- Importance-Aware Data Augmentation for Document-Level Neural Machine Translation
Document-level neural machine translation (DocNMT) aims to generate translations that are both coherent and cohesive.
Due to its longer input length and limited availability of training data, DocNMT often faces the challenge of data sparsity.
We propose a novel Importance-Aware Data Augmentation (IADA) algorithm for DocNMT that augments the training data based on token importance information estimated by the norm of hidden states and training gradients.
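As a rough illustration of the importance signal IADA relies on, the following PyTorch sketch scores each token by combining its hidden-state norm with its gradient norm; the toy model, the product combination, and the mask-lowest-k augmentation are assumptions for illustration, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

# Toy stand-in for an NMT encoder; IADA would use the real DocNMT model.
vocab, dim = 100, 16
emb = nn.Embedding(vocab, dim)
proj = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (1, 8))           # one 8-token "document"
hidden = emb(tokens)                               # (1, 8, dim)
hidden.retain_grad()                               # keep per-token gradients
loss = nn.functional.cross_entropy(
    proj(hidden).view(-1, vocab), tokens.view(-1)) # dummy LM-style loss
loss.backward()

# Importance per token: hidden-state norm combined with gradient norm.
# (The exact combination in IADA may differ; the product is an assumption.)
importance = hidden.norm(dim=-1) * hidden.grad.norm(dim=-1)  # (1, 8)

# Augmentation: mask the k least-important tokens (mask id 0, hypothetical).
k = 2
drop = importance.topk(k, largest=False).indices
augmented = tokens.clone()
augmented[0, drop[0]] = 0
print(tokens, augmented, sep="\n")
```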
arXiv Detail & Related papers (2024-01-27T09:27:47Z)
- An Empirical study of Unsupervised Neural Machine Translation: analyzing NMT output, model's behavior and sentences' contribution
Unsupervised Neural Machine Translation (UNMT) focuses on improving NMT results under the assumption that there is no human-translated parallel data.
We focus on three very diverse languages (French, Gujarati, and Kazakh) and train bilingual NMT models to and from English with various levels of supervision.
arXiv Detail & Related papers (2023-12-19T20:35:08Z)
- Efficient Methods for Natural Language Processing: A Survey
This survey synthesizes and relates current methods and findings in efficient NLP.
We aim both to provide guidance for conducting NLP under limited resources and to point toward promising research directions for developing more efficient methods.
arXiv Detail & Related papers (2022-08-31T20:32:35Z)
- Information Extraction in Low-Resource Scenarios: Survey and Perspective
Information Extraction seeks to derive structured information from unstructured texts.
This paper presents a review of neural approaches to low-resource IE from traditional and LLM-based perspectives.
arXiv Detail & Related papers (2022-02-16T13:44:00Z)
- A Survey on Low-Resource Neural Machine Translation
We classify related works into three categories according to the auxiliary data they used.
We hope that our survey can help researchers to better understand this field and inspire them to design better algorithms.
arXiv Detail & Related papers (2021-07-09T06:26:38Z)
- Dual Past and Future for Neural Machine Translation
We present a novel dual framework that leverages both source-to-target and target-to-source NMT models to provide a more direct and accurate supervision signal for the Past and Future modules.
Experimental results demonstrate that our proposed method significantly improves the adequacy of NMT predictions and surpasses previous methods in two well-studied translation tasks.
arXiv Detail & Related papers (2020-07-15T14:52:24Z)
- Low Resource Neural Machine Translation: A Benchmark for Five African Languages
We benchmark NMT between English and five African LRLs (Swahili, Amharic, Tigrigna, Oromo, and Somali).
We compare a baseline single language pair NMT model against semi-supervised learning, transfer learning, and multilingual modeling.
In terms of averaged BLEU score, the multilingual approach shows the largest gains, up to +5 points, in six out of ten translation directions.
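Averaged BLEU over translation directions is straightforward to reproduce; here is a minimal sketch using the sacrebleu library, with placeholder hypothesis/reference strings standing in for real system outputs:

```python
import sacrebleu  # pip install sacrebleu

# Placeholder outputs for two directions; real use would load the system
# translations and references for each English<->LRL direction.
directions = {
    "en-sw": (["mtoto anasoma kitabu"], [["mtoto anasoma kitabu"]]),
    "sw-en": (["the child reads a book"], [["the child is reading a book"]]),
}

scores = {}
for name, (hyps, refs) in directions.items():
    # refs is a list of reference streams, each aligned with the hypotheses.
    scores[name] = sacrebleu.corpus_bleu(hyps, refs).score

avg = sum(scores.values()) / len(scores)
print(scores, f"average BLEU: {avg:.2f}")
```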
arXiv Detail & Related papers (2020-03-31T17:50:07Z)
- A Comprehensive Survey of Multilingual Neural Machine Translation
We present a survey on multilingual neural machine translation (MNMT).
MNMT is more promising than its statistical machine translation counterpart because end-to-end modeling and distributed representations open new avenues for research on machine translation.
We first categorize various approaches based on their central use-case and then further categorize them based on resource scenarios, underlying modeling principles, core-issues and challenges.
arXiv Detail & Related papers (2020-01-04T19:38:00Z)