Improving English to Sinhala Neural Machine Translation using
Part-of-Speech Tag
- URL: http://arxiv.org/abs/2202.08882v1
- Date: Thu, 17 Feb 2022 19:45:50 GMT
- Title: Improving English to Sinhala Neural Machine Translation using
Part-of-Speech Tag
- Authors: Ravinga Perera, Thilakshi Fonseka, Rashmini Naranpanawa, Uthayasanker
Thayasivam
- Abstract summary: Most people in Sri Lanka are unable to read and understand English properly.
There is therefore a strong need to translate English content into local languages to share information among locals.
- Score: 1.1470070927586016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of Neural Machine Translation (NMT) depends significantly on
the size of the available parallel corpus. Due to this fact, low resource
language pairs demonstrate low translation performance compared to high
resource language pairs. The translation quality further degrades when NMT is
performed for morphologically rich languages. Even though the web contains a
large amount of information, most people in Sri Lanka are unable to read and
understand English properly. Therefore, there is a strong need to translate
English content into local languages so that information can be shared among
locals. Sinhala is the primary language of Sri Lanka, and building an
NMT system that can produce quality English to Sinhala translations is
difficult due to the syntactic divergence between these two languages under low
resource constraints. Thus, in this research, we explore effective methods of
incorporating Part-of-Speech (POS) tags into the Transformer input embedding and
positional encoding to further enhance the performance of the baseline English
to Sinhala neural machine translation model.
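The abstract only names the technique; as a rough, hypothetical sketch of one such scheme (not the authors' code), the snippet below sums a learned POS-tag embedding with the token embedding before adding the standard sinusoidal positional encoding. The class name, vocabulary sizes, and dimensions are all illustrative.
```python
# A minimal sketch (not the paper's implementation) of injecting POS tags
# into a Transformer's input: a learned POS embedding is summed with the
# token embedding, then the sinusoidal positional encoding is added.
import math
import torch
import torch.nn as nn

class PosTagInputEmbedding(nn.Module):
    def __init__(self, vocab_size, pos_tag_count, d_model, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos_tag = nn.Embedding(pos_tag_count, d_model)
        self.d_model = d_model
        # Standard sinusoidal positional encoding (Vaswani et al., 2017).
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div)
        pe[:, 1::2] = torch.cos(position * div)
        self.register_buffer("pe", pe)

    def forward(self, token_ids, pos_tag_ids):
        # token_ids, pos_tag_ids: (batch, seq_len), one POS tag per token.
        x = self.tok(token_ids) * math.sqrt(self.d_model)
        x = x + self.pos_tag(pos_tag_ids)          # fuse POS information
        return x + self.pe[: token_ids.size(1)]    # add positional encoding

emb = PosTagInputEmbedding(vocab_size=32000, pos_tag_count=17, d_model=512)
tokens = torch.randint(0, 32000, (2, 10))
tags = torch.randint(0, 17, (2, 10))     # e.g. Universal POS tag ids
print(emb(tokens, tags).shape)           # torch.Size([2, 10, 512])
```
Summing (rather than concatenating) keeps d_model unchanged, so the rest of the Transformer needs no modification; concatenation followed by a linear projection is the other common variant.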
Related papers
- Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT).
We present a novel method, CoD, which augments LLMs with prior knowledge via chains of multilingual dictionaries for a subset of input words to elicit translation abilities.
arXiv Detail & Related papers (2023-05-11T05:19:47Z)
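The summary above only gestures at the prompt format; the sketch below shows one plausible way to assemble a chain-of-dictionary style prompt. The template, the `cod_prompt` helper, and the example chain are my assumptions, not the paper's exact prompt.
```python
# Hypothetical sketch of Chain-of-Dictionary style prompting: dictionary
# chains for selected source words are prepended to the translation
# request. The template and word-selection rule are illustrative only.
def cod_prompt(source, src_lang, tgt_lang, chains):
    # chains: word -> list of (language, translation) hops, i.e. a chain
    # through auxiliary languages ending in the target language.
    lines = []
    for word, hops in chains.items():
        hop_text = ", ".join(f'"{t}" in {lang}' for lang, t in hops)
        lines.append(f'"{word}" means {hop_text}.')
    dictionary_block = "\n".join(lines)
    return (
        f"{dictionary_block}\n"
        f"Translate the following {src_lang} sentence into {tgt_lang}:\n"
        f"{source}"
    )

chains = {
    "harvest": [("French", "récolte"), ("German", "Ernte"), ("Sinhala", "අස්වැන්න")],
}
print(cod_prompt("The harvest was plentiful.", "English", "Sinhala", chains))
```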
- Extremely low-resource machine translation for closely related languages [0.0]
This work focuses on closely related languages from the Uralic language family, from the Estonian and Finnish geopolitical area.
We find that multilingual learning and synthetic corpora increase the translation quality in every language pair.
We show that transfer learning and fine-tuning are very effective for low-resource machine translation and achieve the best results.
arXiv Detail & Related papers (2021-05-27T11:27:06Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance over the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
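As a loose illustration of the pre-training signal involved (not the paper's code), the snippet below implements a simplified mBART-style text-infilling noiser: random spans are collapsed to a single mask token and the original sentence becomes the reconstruction target. mBART samples span lengths from a Poisson distribution; the exponential draw here is an approximation, and all constants are illustrative.
```python
# Simplified mBART-style text infilling for continued pre-training:
# random spans are replaced by one <mask> token, and the model is
# trained to reconstruct the original sentence.
import random

MASK = "<mask>"

def infill(tokens, mask_ratio=0.35, mean_span=3.5):
    # Replace roughly mask_ratio of the tokens with span-level masks.
    out, i, budget = [], 0, int(len(tokens) * mask_ratio)
    while i < len(tokens):
        if budget > 0 and random.random() < mask_ratio:
            span = max(1, min(budget, int(random.expovariate(1 / mean_span)) + 1))
            out.append(MASK)    # one mask stands in for the whole span
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

random.seed(0)
src = "sri lankan farmers expect a good harvest this season".split()
print(infill(src))   # noised input; the original sentence is the target
```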
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from the multiple language branch models into a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
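The distillation step above can be pictured with a standard multi-teacher objective: the student is trained toward the average of the teachers' softened output distributions. This is a generic sketch with hypothetical shapes; LBMRC's actual training objective combines more components.
```python
# Multi-teacher knowledge distillation: push the student's distribution
# toward the average of the language-branch teachers' distributions.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits_list, T=2.0):
    # Average the teachers' temperature-softened distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), scaled by T^2 as in standard distillation.
    return F.kl_div(log_student, teacher_probs, reduction="batchmean") * T * T

student = torch.randn(4, 10)                       # (batch, num_classes)
teachers = [torch.randn(4, 10) for _ in range(3)]  # one per language branch
print(distill_loss(student, teachers).item())
```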
- Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation [104.10726545151043]
Multilingual data has been found more beneficial for NMT models that translate from a low-resource language (LRL) into a target language than for those that translate into the LRL.
Our experiments show that DecSDE leads to consistent gains of up to 1.8 BLEU on translation from English to four different languages.
arXiv Detail & Related papers (2020-10-04T19:42:40Z)
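DecSDE builds on soft decoupled encoding, where a word's embedding is composed from its character n-grams so that related languages share lexical parameters. The toy below conveys only that compositional idea; the bucket hashing and the omission of SDE's latent semantic layer are simplifications of mine.
```python
# Very simplified sketch of the idea behind (Dec)SDE: a word embedding
# is built from character n-gram embeddings, so cognates in related
# languages share parameters. Bucket hashing is an illustrative shortcut.
import torch
import torch.nn as nn

class CharNgramEmbedding(nn.Module):
    def __init__(self, d_model=64, buckets=10000, n_min=1, n_max=4):
        super().__init__()
        self.emb = nn.EmbeddingBag(buckets, d_model, mode="sum")
        self.buckets, self.n_min, self.n_max = buckets, n_min, n_max

    def ngram_ids(self, word):
        w = f"<{word}>"   # boundary markers, as in fastText-style models
        grams = [w[i:i + n]
                 for n in range(self.n_min, self.n_max + 1)
                 for i in range(len(w) - n + 1)]
        return torch.tensor([hash(g) % self.buckets for g in grams])

    def forward(self, word):
        ids = self.ngram_ids(word)
        return torch.tanh(self.emb(ids.unsqueeze(0)))  # (1, d_model)

emb = CharNgramEmbedding()
# Cognates share n-grams, hence parameters, across related languages.
print(torch.cosine_similarity(emb("noite"), emb("noche")).item())
```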
- An Augmented Translation Technique for low Resource language pair: Sanskrit to Hindi translation [0.0]
In this work, Zero Shot Translation (ZST) is inspected for a low resource language pair.
The same architecture is tested for Sanskrit to Hindi translation for which data is sparse.
Dimensionality reduction of word embedding is performed to reduce the memory usage for data storage.
arXiv Detail & Related papers (2020-06-09T17:01:55Z)
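The dimensionality-reduction step lends itself to a plain sketch. PCA is one common choice (the summary above does not name the exact method), and the matrix and target dimension below are toy values.
```python
# Compress pre-trained word vectors with PCA to cut storage; 300 -> 100
# dimensions here is an arbitrary illustrative choice.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5000, 300)).astype(np.float32)  # toy vectors

pca = PCA(n_components=100)
reduced = pca.fit_transform(embeddings)
print(reduced.shape)                       # (5000, 100)
print(f"variance kept: {pca.explained_variance_ratio_.sum():.2f}")
```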
- HausaMT v1.0: Towards English-Hausa Neural Machine Translation [0.012691047660244334]
We build a baseline model for English-Hausa machine translation.
The Hausa language is the second largest Afro-Asiatic language in the world after Arabic.
arXiv Detail & Related papers (2020-06-09T02:08:03Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
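XMI has a compact definition: XMI(S -> T) = H_qLM(T) - H_qMT(T | S), the cross-entropy of the target text under a target-side language model minus its cross-entropy under the translation model. The helper below computes it from per-sentence log-probabilities; how those are obtained from the two models is model-specific and left abstract here.
```python
# Cross-mutual information from summed per-sentence log-probabilities
# (in nats) under a target-side LM and a translation model.
def xmi(lm_logprobs, mt_logprobs, n_tokens):
    # Cross-entropies per token: negative mean log-probability.
    h_lm = -sum(lm_logprobs) / n_tokens
    h_mt = -sum(mt_logprobs) / n_tokens
    return h_lm - h_mt   # higher XMI = the source text helps more

# Toy numbers: the MT model assigns higher probability than the LM,
# so XMI is positive and translating into T from S is "easier".
print(xmi(lm_logprobs=[-40.0, -55.0], mt_logprobs=[-25.0, -30.0], n_tokens=30))
```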
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
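Random online backtranslation can be pictured as a training-loop step: pick a random auxiliary language, back-translate the current target sentence into it, and add the synthetic pair. `model.translate` and `DummyModel` below are stand-ins, not an actual API, and the batch format is hypothetical.
```python
# Sketch of random online backtranslation: synthesize training pairs
# for unseen directions by translating the target side into a randomly
# chosen language during training.
import random

def robt_batch(batch, languages, model):
    synthetic = []
    for src_lang, src, tgt_lang, tgt in batch:
        aux = random.choice([l for l in languages if l != tgt_lang])
        # Back-translate the target side into a random auxiliary language
        # to create a training example for a possibly unseen pair.
        aux_sent = model.translate(tgt, src_lang=tgt_lang, tgt_lang=aux)
        synthetic.append((aux, aux_sent, tgt_lang, tgt))
    return synthetic

class DummyModel:
    def translate(self, text, src_lang, tgt_lang):
        return f"[{tgt_lang}] {text}"   # placeholder for real decoding

batch = [("en", "hello", "si", "ආයුබෝවන්")]
print(robt_batch(batch, ["en", "si", "ta", "de"], DummyModel()))
```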
- Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach for converting text into a different language without any human involvement.
In this paper, we have applied NMT to two of the most morphologically rich Indian languages, i.e., the English-Tamil and English-Malayalam pairs.
We propose a novel NMT model using multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system.
arXiv Detail & Related papers (2020-04-19T17:29:34Z)
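The BPE vocabularies mentioned above come from the merge-learning procedure of Sennrich et al. (2016); the self-contained toy below learns merges by repeatedly joining the most frequent adjacent symbol pair. Real systems use libraries such as subword-nmt or SentencePiece; the word list is illustrative.
```python
# Toy byte-pair encoding: learn merge operations by repeatedly joining
# the most frequent adjacent symbol pair across the vocabulary.
from collections import Counter

def learn_bpe(words, num_merges):
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])   # apply the merge
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

print(learn_bpe(["lower", "lowest", "newer", "newest"], num_merges=5))
```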
- Neural Machine Translation System of Indic Languages -- An Attention based Approach [0.5139874302398955]
In India, almost all languages originate from their ancestral language, Sanskrit.
In this paper, we present a neural machine translation (NMT) system that can efficiently translate Indic languages like Hindi and Gujarati.
arXiv Detail & Related papers (2020-02-02T07:15:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.