The Effect of Normalization for Bi-directional Amharic-English Neural
Machine Translation
- URL: http://arxiv.org/abs/2210.15224v1
- Date: Thu, 27 Oct 2022 07:18:53 GMT
- Authors: Tadesse Destaw Belay, Atnafu Lambebo Tonja, Olga Kolesnikova, Seid
Muhie Yimam, Abinew Ali Ayele, Silesh Bogale Haile, Grigori Sidorov,
Alexander Gelbukh
- Abstract summary: This paper presents the first relatively large-scale Amharic-English parallel sentence dataset.
We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model.
The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine translation (MT) is one of the main tasks in natural language
processing whose objective is to translate texts automatically from one natural
language to another. Nowadays, using deep neural networks for MT tasks has
received great attention. These networks require large amounts of data to learn
abstract representations of the input and store them in continuous vectors. This paper
presents the first relatively large-scale Amharic-English parallel sentence
dataset. Using these compiled data, we build bi-directional Amharic-English
translation models by fine-tuning the existing Facebook M2M100 pre-trained
model, achieving BLEU scores of 37.79 in Amharic-English and 32.74 in
English-Amharic translation. Additionally, we explore the effects of Amharic
homophone normalization on the machine translation task. The results show that
the normalization of Amharic homophone characters increases the performance of
Amharic-English machine translation in both directions.
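The homophone normalization studied in the paper can be illustrated with a short sketch. The mapping below is an assumption for illustration, not the paper's exact normalization table: it folds only a few representative Amharic homophone families (the ha-, se-, a-, and tse-series) onto one canonical character, whereas a full normalizer would cover every order (vowel form) of each series.

```python
# Minimal sketch of Amharic homophone-character normalization.
# The mapping is illustrative and partial: it folds a few homophone
# variants (ሐ, ኀ -> ሀ; ሠ -> ሰ; ዐ -> አ; ፀ -> ጸ) onto a canonical
# character; a complete normalizer would also map every vowel order
# of each series.
HOMOPHONE_MAP = str.maketrans({
    "ሐ": "ሀ",  # ha-series variant -> canonical ha
    "ኀ": "ሀ",  # another ha-series variant
    "ሠ": "ሰ",  # se-series variant -> canonical se
    "ዐ": "አ",  # a-series variant  -> canonical a
    "ፀ": "ጸ",  # tse-series variant -> canonical tse
})

def normalize_homophones(text: str) -> str:
    """Replace Amharic homophone variants with one canonical character."""
    return text.translate(HOMOPHONE_MAP)

if __name__ == "__main__":
    # ዐለም and አለም ("world") differ only in the homophone first character,
    # so they collapse to the same normalized string.
    print(normalize_homophones("ዐለም"))                                  # -> አለም
    print(normalize_homophones("ዐለም") == normalize_homophones("አለም"))  # -> True
```

Because such variants are pronounced identically, collapsing them reduces spelling variation in the training data, which is the plausible mechanism behind the BLEU gains the paper reports.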
Related papers
- The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- End-to-End Speech Translation of Arabic to English Broadcast News [2.375764121997739]
Speech translation (ST) is the task of translating acoustic speech signals in a source language into text in a target language.
This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system.
arXiv Detail & Related papers (2022-12-11T11:35:46Z)
- DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages [5.367993194110256]
DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages.
We assess the impact on translation productivity of two state-of-the-art NMT systems, namely: Google Translate and the open-source multilingual model mBART50.
arXiv Detail & Related papers (2022-05-24T17:22:52Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- "Wikily" Neural Machine Translation Tailored to Cross-Lingual Tasks [20.837515947519524]
First sentences and titles of linked Wikipedia pages, as well as cross-lingual image captions, provide strong signals for seed parallel data used to extract bilingual dictionaries and cross-lingual word embeddings for mining parallel text from Wikipedia.
In image captioning, we train a multi-tasking machine translation and image captioning pipeline for Arabic and English, in which the Arabic training data is a "wikily" translation of the English captioning data.
Our captioning results in Arabic are slightly better than those of the supervised model.
arXiv Detail & Related papers (2021-04-16T21:49:12Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when translating directly between non-English directions, while performing competitively with the best single systems from WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- HausaMT v1.0: Towards English-Hausa Neural Machine Translation [0.012691047660244334]
We build a baseline model for English-Hausa machine translation.
The Hausa language is the second largest Afro-Asiatic language in the world after Arabic.
arXiv Detail & Related papers (2020-06-09T02:08:03Z)
- Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach to converting text into a different language without human involvement.
In this paper, we apply NMT to two of the most morphologically rich Indian languages, working on the English-Tamil and English-Malayalam pairs.
We propose a novel NMT model using multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system.
arXiv Detail & Related papers (2020-04-19T17:29:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.