Attention based Sequence to Sequence Learning for Machine Translation of
Low Resourced Indic Languages -- A case of Sanskrit to Hindi
- URL: http://arxiv.org/abs/2110.00435v1
- Date: Tue, 7 Sep 2021 04:55:48 GMT
- Authors: Vishvajit Bakarola and Jitendra Nasriwala
- Abstract summary: The paper describes the construction of a Sanskrit-to-Hindi bilingual parallel corpus with nearly 10K sentence pairs and 178,000 tokens.
The attention-based neural translation achieved 88% accuracy in human evaluation and a BLEU score of 0.92 on Sanskrit-to-Hindi translation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning techniques are powerful at mimicking human performance on
particular classes of problems and have achieved remarkable results on complex
learning tasks. Neural Machine Translation (NMT), inspired by deep learning,
outperforms traditional machine translation approaches. Machine-aided
translation of Indic languages has always been challenging given their rich and
diverse grammar, and fully automatic machine translation becomes especially
problematic for low-resourced languages such as Sanskrit. This paper presents
attention-based neural machine translation, which selectively focuses on
particular parts of a sentence during translation. The work describes the
construction of a Sanskrit-to-Hindi bilingual parallel corpus with nearly 10K
sentence pairs and 178,000 tokens, on which a neural translation model equipped
with an attention mechanism has been trained. The approach demonstrates the
significance of attention in overcoming long-term dependencies, a difficulty
closely associated with low-resource Indic languages. Attention plots on the
test data illustrate the alignment between source and translated words. The
translated sentences were evaluated both by manual scoring (human evaluation)
and by automatic evaluation metrics. The attention-based neural translation
achieved 88% accuracy in human evaluation and a BLEU score of 0.92 on
Sanskrit-to-Hindi translation.
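The abstract's core idea can be illustrated with a minimal sketch of attention: at each decoding step, the decoder state is compared against every encoder state, the similarity scores are normalized with a softmax, and the weighted sum of encoder states forms the context vector. This is a generic dot-product variant for illustration only; the paper's exact scoring function (and all names below) are assumptions, not its implementation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state by its
    similarity to the current decoder state, normalize the scores,
    and return the weighted sum (context vector) plus the weights."""
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# Toy example: three 2-dimensional encoder states, one decoder state.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec = [1.0, 0.0]
ctx, attn = attention_context(dec, enc)
print(attn)  # highest weights fall on the states most similar to dec
```

The weight vector produced here is exactly what the paper's attention plots visualize: one row of soft alignments between a translated word and the source words.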
Related papers
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z) - Hindi to English: Transformer-Based Neural Machine Translation [0.0]
We have developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from the Indian language Hindi to English.
We implemented back-translation to augment the training data and to create the vocabulary.
This led us to achieve a state-of-the-art BLEU score of 24.53 on the test set of IIT Bombay English-Hindi Corpus.
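Back-translation, as mentioned above, augments a parallel corpus by running monolingual target-side sentences through a reverse (target-to-source) model to produce synthetic source sentences. A minimal sketch, assuming any callable can stand in for the trained reverse model (the function names and the dummy model below are hypothetical):

```python
def back_translate(monolingual_target, reverse_translate):
    """Pair each target sentence with a synthetic source sentence
    produced by the reverse model; the resulting pairs are added
    to the parallel training corpus."""
    return [(reverse_translate(t), t) for t in monolingual_target]

# Hypothetical stand-in for a trained reverse (e.g. English->Hindi)
# model; in practice this would be a full NMT system.
def dummy_reverse(sentence):
    return "<src> " + sentence

pairs = back_translate(["first target sentence", "second one"], dummy_reverse)
print(pairs[0])  # ('<src> first target sentence', 'first target sentence')
```

The synthetic pairs are noisier than human-translated ones, but they let the forward model see far more target-side sentences than the parallel corpus alone provides.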
arXiv Detail & Related papers (2023-09-23T00:00:09Z) - On the Copying Problem of Unsupervised NMT: A Training Schedule with a
Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
arXiv Detail & Related papers (2023-05-26T18:14:23Z) - The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z) - No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z) - How Robust is Neural Machine Translation to Language Imbalance in
Multilingual Tokenizer Training? [86.48323488619629]
We analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus.
We find that while relatively better performance is often observed when languages are sampled more equally, downstream performance is more robust to language imbalance than commonly expected.
arXiv Detail & Related papers (2022-04-29T17:50:36Z) - Harnessing Cross-lingual Features to Improve Cognate Detection for
Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18 percentage points in F-score for cognate detection.
arXiv Detail & Related papers (2021-12-16T11:17:58Z) - Continuous Learning in Neural Machine Translation using Bilingual
Dictionaries [14.058642647656301]
We propose an evaluation framework to assess the ability of neural machine translation to continuously learn new phrases.
By addressing both challenges we are able to improve the ability to translate new, rare words and phrases from 30% to up to 70%.
arXiv Detail & Related papers (2021-02-12T14:46:13Z) - An Augmented Translation Technique for low Resource language pair:
Sanskrit to Hindi translation [0.0]
In this work, Zero Shot Translation (ZST) is inspected for a low resource language pair.
The same architecture is tested for Sanskrit to Hindi translation for which data is sparse.
Dimensionality reduction of word embedding is performed to reduce the memory usage for data storage.
arXiv Detail & Related papers (2020-06-09T17:01:55Z) - Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach to convert text to a different language without any human involvement.
In this paper, we have applied NMT to two of the most morphologically rich Indian language pairs, i.e. English-Tamil and English-Malayalam.
We proposed a novel NMT model using Multihead self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system.
arXiv Detail & Related papers (2020-04-19T17:29:34Z) - Neural Machine Translation System of Indic Languages -- An Attention
based Approach [0.5139874302398955]
In India, almost all languages originate from their ancestral language, Sanskrit.
In this paper, we have presented the neural machine translation system (NMT) that can efficiently translate Indic languages like Hindi and Gujarati.
arXiv Detail & Related papers (2020-02-02T07:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.