The first neural machine translation system for the Erzya language
- URL: http://arxiv.org/abs/2209.09368v1
- Date: Mon, 19 Sep 2022 22:21:37 GMT
- Title: The first neural machine translation system for the Erzya language
- Authors: David Dale
- Abstract summary: We present the first neural machine translation system for translation between the endangered Erzya language and Russian.
The BLEU scores are 17 and 19 for translation to Erzya and Russian respectively, and more than half of the translations are rated as acceptable by native speakers.
We release the translation models along with the collected text corpus, a new language identification model, and a multilingual sentence encoder adapted for the Erzya language.
- Score: 0.0951828574518325
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present the first neural machine translation system for translation
between the endangered Erzya language and Russian and the dataset collected by
us to train and evaluate it. The BLEU scores are 17 and 19 for translation to
Erzya and Russian respectively, and more than half of the translations are
rated as acceptable by native speakers. We also adapt our model to translate
between Erzya and 10 other languages, but without additional parallel data, the
quality on these directions remains low. We release the translation models
along with the collected text corpus, a new language identification model, and
a multilingual sentence encoder adapted for the Erzya language. These resources
will be available at https://github.com/slone-nlp/myv-nmt.
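For context on the reported numbers, below is a minimal sketch (not the authors' evaluation code) of how corpus-level BLEU scores such as the 17 and 19 above are commonly computed with the sacrebleu library; the hypothesis and reference sentences are placeholders.

    # Minimal BLEU sketch with sacrebleu; placeholder sentences, not the paper's script.
    import sacrebleu

    # System outputs, e.g. Russian translations produced by an Erzya-to-Russian model
    hypotheses = [
        "Example system translation one.",
        "Example system translation two.",
    ]

    # One reference translation per hypothesis (a single reference stream)
    references = [
        "Example reference translation one.",
        "Example reference translation two.",
    ]

    # corpus_bleu expects a list of reference streams, hence the extra list wrapping
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU = {bleu.score:.1f}")  # corpus-level BLEU on a 0-100 scale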
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by translating from languages that were unseen during training.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- A Tulu Resource for Machine Translation [3.038642416291856]
We present the first parallel dataset for English-Tulu translation.
Tulu is spoken by approximately 2.5 million individuals in southwestern India.
Our English-Tulu system, trained without using parallel English-Tulu data, outperforms Google Translate by 19 BLEU points.
arXiv Detail & Related papers (2024-03-28T04:30:07Z)
- Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag [1.1470070927586016]
Most people in Sri Lanka are unable to read and understand English properly.
There is therefore a strong need to translate English content into local languages so that information can be shared among locals.
arXiv Detail & Related papers (2022-02-17T19:45:50Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts performance on translation pairs where both languages were seen in mBART's original pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings [72.69253034282035]
We exploit a language-independent multilingual sentence representation to generalize easily to a new language.
Blindly decoding from Portuguese with a base system containing several Romance languages, we achieve 36.4 BLEU for Portuguese-English and 12.8 BLEU for Russian-English.
We also explore a more practical adaptation approach through non-iterative back-translation, exploiting our model's ability to produce high-quality translations.
arXiv Detail & Related papers (2021-03-11T14:22:08Z)
- Neural Machine Translation model for University Email Application [1.4731169524644787]
A state-of-the-art sequence-to-sequence neural network for ML -> EN and EN -> ML translation is compared with Google Translate.
The low BLEU score of Google Translate indicates that the application-based regional models are better.
arXiv Detail & Related papers (2020-07-20T15:05:16Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models; a rough formulation is sketched after this list of related papers.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
- Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach for converting text into a different language without human involvement.
In this paper, we apply NMT to two of the most morphologically rich Indian languages, Tamil and Malayalam, working on the English-Tamil and English-Malayalam pairs.
We propose a novel NMT model that uses multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system.
arXiv Detail & Related papers (2020-04-19T17:29:34Z)
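As a rough sketch of the cross-mutual information (XMI) metric mentioned in the entry above ("It's Easier to Translate out of English than into it"), the formulation below is a paraphrase under assumed notation rather than the paper's exact definition: XMI contrasts the cross-entropy a monolingual language model assigns to the target text with the cross-entropy the translation model assigns to the same text given the source, so the measure is asymmetric in the two translation directions.

    % Paraphrased sketch (assumed notation):
    % H_{q_LM}(Y)      cross-entropy of target Y under a language model q_LM
    % H_{q_MT}(Y | X)  cross-entropy of Y under the translation model q_MT given source X
    \mathrm{XMI}(X \rightarrow Y) = H_{q_{\mathrm{LM}}}(Y) - H_{q_{\mathrm{MT}}}(Y \mid X)

A larger XMI indicates that the source makes the target easier to predict, which the paper uses to compare translation difficulty across directions.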
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.