Urdu-English Machine Transliteration using Neural Networks
- URL: http://arxiv.org/abs/2001.05296v1
- Date: Sun, 12 Jan 2020 17:30:42 GMT
- Title: Urdu-English Machine Transliteration using Neural Networks
- Authors: Usman Mohy ud Din
- Abstract summary: We present a transliteration technique based on Expectation Maximization (EM) that is unsupervised and language independent.
The system learns patterns and out-of-vocabulary words from a parallel corpus, with no need to train it on a transliteration corpus explicitly.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine translation has gained much attention in recent years. It is a
subfield of computational linguistics that focuses on translating text from one
language to another. Among the different translation techniques, neural networks
currently lead the domain with their capability of providing a single large
network with attention mechanisms, sequence-to-sequence architectures, and long
short-term memory modelling. Despite significant progress in the domain of
machine translation, the translation of out-of-vocabulary (OOV) words, which
include technical terms, named entities, and foreign words, is still a challenge
for current state-of-the-art translation systems, and the situation becomes even
worse when translating between low-resource languages or languages with
different structures. Due to the morphological richness of a language, a word
may have different meanings in different contexts. In such scenarios, translating
a word in isolation is not enough to provide a correct, high-quality translation.
Transliteration is a way to take the context of a word or sentence into account
during translation. For a low-resource language like Urdu, it is very difficult
to find a parallel corpus for transliteration that is large enough to train a
system. In this work, we present a transliteration technique based on
Expectation Maximization (EM) that is unsupervised and language independent.
The system learns patterns and out-of-vocabulary (OOV) words from a parallel
corpus, with no need to train it on a transliteration corpus explicitly. This
approach is tested on three models of statistical machine translation (SMT),
namely phrase-based, hierarchical phrase-based, and factored models, and on two
models of neural machine translation, namely LSTM and Transformer models.
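The abstract describes the approach only at a high level. As a rough illustration of what EM-based character alignment for transliteration can look like, here is a minimal sketch in the style of IBM Model 1 applied to characters; the function names, the toy word pairs, and the greedy decoder are assumptions for illustration, not the paper's actual system.

```python
# Minimal sketch: IBM-Model-1-style character alignment learned via EM
# from word pairs, plus a naive greedy decoder. Illustrative only.
from collections import defaultdict
from itertools import product

def em_char_alignment(pairs, iterations=10):
    """Estimate p(target_char | source_char) from word pairs via EM."""
    prob = defaultdict(lambda: 1e-6)
    for src, tgt in pairs:                 # uniform initialisation
        for s, t in product(src, tgt):
            prob[(s, t)] = 1.0
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for src, tgt in pairs:
            for t in tgt:
                # E-step: fractionally assign t to each source char.
                norm = sum(prob[(s, t)] for s in src)
                for s in src:
                    c = prob[(s, t)] / norm
                    counts[(s, t)] += c
                    totals[s] += c
        # M-step: re-normalise expected counts into probabilities.
        for (s, t), c in counts.items():
            prob[(s, t)] = c / totals[s]
    return prob

def transliterate(word, prob, alphabet):
    """Greedy one-to-one decoding; a real system would also model
    insertions/deletions and use beam search."""
    return "".join(max(alphabet, key=lambda t: prob[(s, t)]) for s in word)

# Toy Urdu -> Latin pairs (hypothetical; real input would be mined
# automatically from a parallel corpus, as the paper describes).
pairs = [("قلم", "qalam"), ("کتاب", "kitab"), ("دل", "dil")]
prob = em_char_alignment(pairs)
alphabet = {t for _, tgt in pairs for t in tgt}
print(transliterate("دم", prob, alphabet))  # rough guess for an OOV word
```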
Related papers
- Extending Multilingual Machine Translation through Imitation Learning [60.15671816513614]
Imit-MNMT treats the task as an imitation learning process, which mimics the behavior of an expert.
We show that our approach significantly improves the translation performance between the new and the original languages.
We also demonstrate that our approach is capable of solving copy and off-target problems.
arXiv Detail & Related papers (2023-11-14T21:04:03Z) - Hindi to English: Transformer-Based Neural Machine Translation [0.0]
We have developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from the Indian language Hindi to English.
We implemented back-translation to augment the training data and to create the vocabulary.
This led us to achieve a state-of-the-art BLEU score of 24.53 on the test set of IIT Bombay English-Hindi Corpus.
arXiv Detail & Related papers (2023-09-23T00:00:09Z) - Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual
Translation of Dravidian Languages [0.34998703934432673]
We build a single-decoder neural machine translation system for Dravidian-Dravidian multilingual translation.
Our model achieves scores within 3 BLEU of large-scale pivot-based models when it is trained on 50% of the language directions.
arXiv Detail & Related papers (2023-08-10T13:38:09Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompted BLOOM model with an average improvement of 8 chrF++ points across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z) - Modeling Target-Side Morphology in Neural Machine Translation: A
Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large number of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - Language Modeling, Lexical Translation, Reordering: The Training Process
of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Despite neural machine translation being the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z) - Extended Parallel Corpus for Amharic-English Machine Translation [0.0]
The extended parallel corpus will be useful for machine translation of an under-resourced language, Amharic.
We trained neural machine translation and phrase-based statistical machine translation models using the corpus.
arXiv Detail & Related papers (2021-04-08T06:51:08Z) - Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z) - Morphological Word Segmentation on Agglutinative Languages for Neural
Machine Translation [8.87546236839959]
We propose a morphological word segmentation method on the source side for neural machine translation (NMT).
It incorporates morphological knowledge to preserve the linguistic and semantic information in the word structure while reducing the vocabulary size at training time.
It can be utilized as a preprocessing tool to segment the words in agglutinative languages for other natural language processing (NLP) tasks; a toy illustration of this kind of preprocessing follows after this list.
arXiv Detail & Related papers (2020-01-02T10:05:02Z)
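As a side note to the morphological word segmentation entry above, the following toy sketch illustrates the general idea of suffix segmentation as an NMT preprocessing step. The suffix inventory, the `@@` continuation marker, and the longest-match strategy are all hypothetical assumptions, not taken from that paper.

```python
# Toy sketch of morphological segmentation as NMT preprocessing.
# Hypothetical agglutinative suffixes (Turkish-like, for illustration).
SUFFIXES = ["lar", "ler", "dan", "den", "da", "de"]

def segment(word: str) -> list[str]:
    """Strip known suffixes right-to-left, longest match first."""
    morphs = []
    while True:
        for suf in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suf) and len(word) > len(suf) + 1:
                morphs.append(suf)
                word = word[: -len(suf)]
                break
        else:
            break  # no suffix matched; the remainder is the stem
    return [word] + [f"@@{m}" for m in reversed(morphs)]

# e.g. segment("evlerden") -> ["ev", "@@ler", "@@den"]
print(" ".join(segment("evlerden")))
```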
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.