The first neural machine translation system for the Erzya language
- URL: http://arxiv.org/abs/2209.09368v1
- Date: Mon, 19 Sep 2022 22:21:37 GMT
- Title: The first neural machine translation system for the Erzya language
- Authors: David Dale
- Abstract summary: We present the first neural machine translation system for translation between the endangered Erzya language and Russian.
The BLEU scores are 17 and 19 for translation to Erzya and Russian respectively, and more than half of the translations are rated as acceptable by native speakers.
We release the translation models along with the collected text corpus, a new language identification model, and a multilingual sentence encoder adapted for the Erzya language.
- Score: 0.0951828574518325
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present the first neural machine translation system for translation
between the endangered Erzya language and Russian and the dataset collected by
us to train and evaluate it. The BLEU scores are 17 and 19 for translation to
Erzya and Russian respectively, and more than half of the translations are
rated as acceptable by native speakers. We also adapt our model to translate
between Erzya and 10 other languages, but without additional parallel data, the
quality on these directions remains low. We release the translation models
along with the collected text corpus, a new language identification model, and
a multilingual sentence encoder adapted for the Erzya language. These resources
will be available at https://github.com/slone-nlp/myv-nmt.
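For context on the reported numbers, below is a minimal sketch (not the authors' evaluation code) of how corpus-level BLEU scores such as the 17 and 19 above are commonly computed with the sacrebleu library; the hypothesis and reference sentences are placeholders.

    # Minimal BLEU sketch with sacrebleu; placeholder sentences, not the paper's script.
    import sacrebleu

    # System outputs, e.g. Russian translations produced by an Erzya-to-Russian model
    hypotheses = [
        "Example system translation one.",
        "Example system translation two.",
    ]

    # One reference translation per hypothesis (a single reference stream)
    references = [
        "Example reference translation one.",
        "Example reference translation two.",
    ]

    # corpus_bleu expects a list of reference streams, hence the extra list wrapping
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU = {bleu.score:.1f}")  # corpus-level BLEU on a 0-100 scale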
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences of different languages into a common representation space.
In this work, we test this hypothesis by translating from languages that were unseen during training.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- A Tulu Resource for Machine Translation [3.038642416291856]
We present the first parallel dataset for English-Tulu translation.
Tulu is spoken by approximately 2.5 million individuals in southwestern India.
Our English-Tulu system, trained without using parallel English-Tulu data, outperforms Google Translate by 19 BLEU points.
arXiv Detail & Related papers (2024-03-28T04:30:07Z)
- Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag [1.1470070927586016]
Most people in Sri Lanka are unable to read and understand English properly.
There is therefore a strong need to translate English content into local languages so that information can be shared among locals.
arXiv Detail & Related papers (2022-02-17T19:45:50Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts performance on translation pairs where both languages were seen in mBART's original pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings [72.69253034282035]
We exploit a language-independent multilingual sentence representation to generalize easily to a new language.
Blindly decoding from Portuguese with a base system containing several Romance languages, we achieve 36.4 BLEU for Portuguese-English and 12.8 BLEU for Russian-English.
We also explore a more practical adaptation approach through non-iterative back-translation, exploiting our model's ability to produce high-quality translations.
arXiv Detail & Related papers (2021-03-11T14:22:08Z)
- Neural Machine Translation model for University Email Application [1.4731169524644787]
A state-of-the-art sequence-to-sequence neural network for ML -> EN and EN -> ML translation is compared with Google Translate.
The low BLEU score of Google Translate indicates that the application-based regional models are better.
arXiv Detail & Related papers (2020-07-20T15:05:16Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models; a rough formulation is sketched after this list of related papers.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
- Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach for converting text into a different language without human involvement.
In this paper, we apply NMT to two of the most morphologically rich Indian languages, Tamil and Malayalam, working on the English-Tamil and English-Malayalam pairs.
We propose a novel NMT model that uses multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system.
arXiv Detail & Related papers (2020-04-19T17:29:34Z)
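As a rough sketch of the cross-mutual information (XMI) metric mentioned in the entry above ("It's Easier to Translate out of English than into it"), the formulation below is a paraphrase under assumed notation rather than the paper's exact definition: XMI contrasts the cross-entropy a monolingual language model assigns to the target text with the cross-entropy the translation model assigns to the same text given the source, so the measure is asymmetric in the two translation directions.

    % Paraphrased sketch (assumed notation):
    % H_{q_LM}(Y)      cross-entropy of target Y under a language model q_LM
    % H_{q_MT}(Y | X)  cross-entropy of Y under the translation model q_MT given source X
    \mathrm{XMI}(X \rightarrow Y) = H_{q_{\mathrm{LM}}}(Y) - H_{q_{\mathrm{MT}}}(Y \mid X)

A larger XMI indicates that the source makes the target easier to predict, which the paper uses to compare translation difficulty across directions.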
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.