Neural Machine Translation System of Indic Languages -- An Attention
based Approach
- URL: http://arxiv.org/abs/2002.02758v1
- Date: Sun, 2 Feb 2020 07:15:18 GMT
- Title: Neural Machine Translation System of Indic Languages -- An Attention
based Approach
- Authors: Parth Shah, Vishvajit Bakrola
- Abstract summary: In India, almost all languages originate from their ancestral language, Sanskrit.
In this paper, we present a neural machine translation (NMT) system that can efficiently translate Indic languages like Hindi and Gujarati.
- Score: 0.5139874302398955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural machine translation (NMT) is a recent and effective technique that
has led to remarkable improvements over conventional machine
translation techniques. The proposed neural machine translation model developed for
the Gujarati language uses an encoder-decoder architecture with an attention mechanism. In
India, almost all languages originate from their ancestral language,
Sanskrit. They share inevitable similarities, including lexical and named
entity similarity. Translating into Indic languages is always a challenging
task. In this paper, we present a neural machine translation (NMT) system
that can efficiently translate Indic languages like Hindi and Gujarati,
which together cover more than 58.49 percent of the total speakers in the
country. We have evaluated the performance of our NMT model with automatic
evaluation metrics such as BLEU, perplexity, and TER. A comparison of
our network with Google Translate is also presented, in which our model outperformed it by
a margin of 6 BLEU points on English-Gujarati translation.
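For readers unfamiliar with two of the automatic metrics named in the abstract, the following is a minimal sketch of sentence-level BLEU and perplexity. It is not the authors' evaluation code: the function names, the single-reference setting, uniform n-gram weights, and the absence of smoothing are all assumptions made for illustration, and the example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing (illustrative)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities assigned by a model."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical example data, not taken from the paper.
reference = "the cat sat on the mat".split()
identical = "the cat sat on the mat".split()
unrelated = "a dog ran in a park".split()

bleu_identical = sentence_bleu(identical, reference)
bleu_unrelated = sentence_bleu(unrelated, reference)
ppl_uniform = perplexity([math.log(0.25)] * 4)
```

BLEU scores are often reported on a 0-100 scale, so a margin of 6 BLEU corresponds to 0.06 on the 0-1 scale this sketch returns. Corpus-level BLEU, as normally reported in papers, aggregates n-gram counts over all sentences before computing precisions rather than averaging sentence scores.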
Related papers
- Hindi to English: Transformer-Based Neural Machine Translation [0.0]
We have developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from the Indian language Hindi to English.
We implemented back-translation to augment the training data and for creating the vocabulary.
This led us to achieve a state-of-the-art BLEU score of 24.53 on the test set of IIT Bombay English-Hindi Corpus.
arXiv Detail & Related papers (2023-09-23T00:00:09Z)
- Machine Translation by Projecting Text into the Same Phonetic-Orthographic Space Using a Common Encoding [3.0422770070015295]
We propose an approach based on common multilingual Latin-based encodings (WX notation) that take advantage of language similarity.
We verify the proposed approach by demonstrating experiments on similar language pairs.
We also get an improvement of up to 1 BLEU point on distant and zero-shot language pairs.
arXiv Detail & Related papers (2023-05-21T06:46:33Z)
- The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation [53.907805815477126]
This paper presents the first relatively large-scale Amharic-English parallel sentence dataset.
We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model.
The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.
arXiv Detail & Related papers (2022-10-27T07:18:53Z)
- Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18 percentage points in F-score for cognate detection.
arXiv Detail & Related papers (2021-12-16T11:17:58Z)
- Attention based Sequence to Sequence Learning for Machine Translation of Low Resourced Indic Languages -- A case of Sanskrit to Hindi [0.0]
The paper describes the construction of a Sanskrit-to-Hindi bilingual parallel corpus with nearly 10K samples and 178,000 tokens.
The attention-based neural translation model achieved 88% accuracy in human evaluation and a BLEU score of 0.92 on Sanskrit-to-Hindi translation.
arXiv Detail & Related papers (2021-09-07T04:55:48Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task [111.91077204077817]
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems won first place in the English to Chinese, Polish to English, and German to Upper Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
- Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach to convert text to a different language without any human involvement.
In this paper, we have applied NMT to two of the most morphologically rich Indian language pairs, i.e. English-Tamil and English-Malayalam.
We proposed a novel NMT model using multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system.
arXiv Detail & Related papers (2020-04-19T17:29:34Z)
- Neural Machine Translation: Challenges, Progress and Future [62.75523637241876]
Machine translation (MT) is a technique that leverages computers to translate human languages automatically.
Neural machine translation (NMT) models the direct mapping between source and target languages with deep neural networks.
This article reviews the NMT framework, discusses the challenges in NMT, and introduces some exciting recent progress.
arXiv Detail & Related papers (2020-04-13T07:53:57Z)
- Marathi To English Neural Machine Translation With Near Perfect Corpus And Transformers [0.0]
Google, Bing, Facebook and Yandex are among the very few companies that have built translation systems for a few of the Indian languages.
In this exercise, we trained and compared a variety of neural Marathi-to-English translators trained with a BERT tokenizer.
We achieve better BLEU scores than Google on Tatoeba and Wikimedia open datasets.
arXiv Detail & Related papers (2020-02-26T17:18:49Z)