HausaMT v1.0: Towards English-Hausa Neural Machine Translation
- URL: http://arxiv.org/abs/2006.05014v2
- Date: Sun, 14 Jun 2020 04:35:37 GMT
- Title: HausaMT v1.0: Towards English-Hausa Neural Machine Translation
- Authors: Adewale Akinfaderin
- Abstract summary: We build a baseline model for English-Hausa machine translation.
The Hausa language is the second largest Afro-Asiatic language in the world after Arabic.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Machine Translation (NMT) for low-resource languages suffers from low performance because of the lack of large amounts of parallel data and language diversity. To help ameliorate this problem, we built a baseline model for English-Hausa machine translation, which is considered a low-resource translation task. The Hausa language is the second largest Afro-Asiatic language in the world after Arabic, and the third most widely used language of trade across a large swath of West African countries, after English and French. In this paper, we curated several datasets containing Hausa-English parallel corpora for our translation experiments. We trained baseline models and evaluated their performance using Recurrent and Transformer encoder-decoder architectures with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.
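As an illustration of the BPE subword step, here is a minimal sketch using the Hugging Face `tokenizers` library; the implementation choice, vocabulary size, special tokens, and file names are assumptions, not the paper's exact setup:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Learn a joint BPE vocabulary over the Hausa and English sides of the
# parallel corpus (file names and vocab size are illustrative).
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=8000,
                     special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
tokenizer.train(files=["train.ha", "train.en"], trainer=trainer)

# Rare words split into frequent subword units instead of mapping to [UNK].
print(tokenizer.encode("Ina son karanta littafi").tokens)
```

The benefit over word-level tokenization is that unseen or rare words decompose into known subwords, which matters most in exactly the low-resource setting the paper targets.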
Related papers
- Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation [103.90963418039473]
Bi-ACL is a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model.
We show that Bi-ACL is more effective both in long-tail languages and in high-resource languages.
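The summary does not spell out Bi-ACL's training objective, but a common way to exploit target-side monolingual data plus a bilingual dictionary is to synthesize rough source sentences by word-for-word dictionary substitution. A hedged sketch of that general idea follows; the dictionary entries and function are hypothetical, not Bi-ACL's actual method:

```python
# Toy target->source dictionary entries (hypothetical).
bilingual_dict = {"ruwa": "water", "gida": "house", "babba": "big"}

def synthesize_source(target_sentence: str) -> str:
    """Word-for-word substitution; out-of-dictionary words are copied through."""
    return " ".join(bilingual_dict.get(w, w) for w in target_sentence.split())

monolingual_target = ["ruwa yana da kyau", "gida babba ne"]
# Pseudo-parallel pairs: (synthetic source, real target).
pseudo_parallel = [(synthesize_source(t), t) for t in monolingual_target]
```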
arXiv Detail & Related papers (2023-05-22T07:31:08Z) - The Effect of Normalization for Bi-directional Amharic-English Neural
Machine Translation [53.907805815477126]
This paper presents the first relatively large-scale Amharic-English parallel sentence dataset.
We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model.
The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.
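A minimal sketch of what homophone-character normalization can look like in practice; the mapping below covers only a few well-known Ge'ez homophone families (first order only) and is an assumption, not the paper's actual normalization table:

```python
# Assumed mapping: collapse Amharic letters that sound alike but are written
# with distinct Ge'ez characters into one canonical form. A complete table
# would cover all seven orders of each character family.
HOMOPHONE_MAP = str.maketrans({
    "ሐ": "ሀ", "ኀ": "ሀ",   # 'ha' variants
    "ሠ": "ሰ",              # 'se' variant
    "ዐ": "አ",              # 'a' variant
    "ፀ": "ጸ",              # 'tse' variant
})

def normalize_amharic(text: str) -> str:
    """Map homophone characters to a canonical form before training."""
    return text.translate(HOMOPHONE_MAP)
```

Normalizing before training shrinks the effective vocabulary, since spelling variants of the same word no longer count as distinct tokens.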
arXiv Detail & Related papers (2022-10-27T07:18:53Z)
- Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages [47.06332023467713]
This paper describes Tencent's multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages.
We adopt data augmentation, distributionally robust optimization, and language family grouping to develop our multilingual neural machine translation (MNMT) models.
arXiv Detail & Related papers (2022-10-18T07:22:29Z)
- Improving Multilingual Neural Machine Translation System for Indic Languages [0.0]
We propose a multilingual neural machine translation (MNMT) system to address the issues related to low-resource language translation.
A state-of-the-art transformer architecture is used to realize the proposed model.
Experiments on a sizable amount of data show that the proposed model outperforms conventional models.
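The summary leaves out how one transformer serves many translation directions; a common recipe (not necessarily the authors' exact setup) is to prepend a target-language tag to each source sentence so a single model learns all directions:

```python
# Hypothetical sketch: target-language tagging for one-to-many MNMT.
def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend a token such as <2hi> telling the model which language to emit."""
    return f"<2{target_lang}> {sentence}"

print(tag_source("How are you?", "hi"))  # -> "<2hi> How are you?"
print(tag_source("How are you?", "ta"))  # -> "<2ta> How are you?"
```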
arXiv Detail & Related papers (2022-09-27T09:51:56Z)
- Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag [1.1470070927586016]
Most people in Sri Lanka are unable to read and understand English properly.
There is therefore a strong need to translate English content into local languages so that information can be shared among locals.
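The summary does not say exactly how the POS tags from the title enter the model; one simple, commonly used option is to attach a tag to every source token as a factor, sketched below with NLTK's English tagger (the `word|TAG` factor format is an assumption):

```python
import nltk

# One-time downloads for the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def add_pos_factors(sentence: str) -> str:
    """Attach a POS tag to each English source token, e.g. 'fox|NN'."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return " ".join(f"{word}|{tag}" for word, tag in tagged)

print(add_pos_factors("The quick brown fox jumps"))
# e.g. -> "The|DT quick|JJ brown|JJ fox|NN jumps|VBZ"
```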
arXiv Detail & Related papers (2022-02-17T19:45:50Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method consistently improves fine-tuning performance over the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
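A hedged sketch of the "mixed-language" idea: take monolingual text and swap some words for their translations from a bilingual lexicon, producing code-switched input for continued mBART-style denoising pre-training. The lexicon entries and replacement rate below are assumptions, not the paper's settings:

```python
import random

# Toy bilingual lexicon for the unseen language (hypothetical entries).
lexicon = {"water": "ruwa", "house": "gida", "big": "babba"}

def mix_language(sentence: str, rate: float = 0.3) -> str:
    """Randomly replace in-lexicon words to create code-switched text."""
    words = [
        lexicon[w] if w in lexicon and random.random() < rate else w
        for w in sentence.split()
    ]
    return " ".join(words)

print(mix_language("the big house is near the water"))
```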
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings [72.69253034282035]
We exploit a language-independent multilingual sentence representation to generalize easily to a new language.
Blindly decoding from Portuguese with a base system trained on several Romance languages, we achieve 36.4 BLEU for Portuguese-English and 12.8 BLEU for Russian-English.
We explore a more practical adaptation approach through non-iterative backtranslation, exploiting our model's ability to produce high quality translations.
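A minimal sketch of the non-iterative back-translation step described here: decode target-side monolingual text into the source language once with the current model, then pair the synthetic sources with the original targets as extra training data. The `translate` callable is a hypothetical placeholder:

```python
from typing import Callable, List, Tuple

def back_translate(
    monolingual_target: List[str],
    translate: Callable[[str], str],  # target -> source decoder (placeholder)
) -> List[Tuple[str, str]]:
    """One round of back-translation: synthetic source paired with real target."""
    return [(translate(t), t) for t in monolingual_target]

# synthetic = back_translate(target_sentences, translate=model_decode)
# The synthetic pairs are then used for fine-tuning; "non-iterative" means
# the corpus is generated once rather than re-translated every round.
```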
arXiv Detail & Related papers (2021-03-11T14:22:08Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
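A minimal PyTorch sketch of the multi-teacher distillation step implied here: the student's output distribution is pulled toward the average of the language-branch teachers' distributions via KL divergence. The temperature and the simple averaging scheme are assumptions:

```python
from typing import List

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits_list: List[torch.Tensor],
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence from the student to the averaged teacher distribution."""
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```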
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when translating directly between non-English directions, while performing competitively with the best single systems from WMT.
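The released M2M-100 checkpoints can be exercised directly with the Hugging Face `transformers` API; a small usage sketch translating Hausa to French without an English pivot (the sentence and the choice of the smallest public checkpoint are illustrative):

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Smallest public checkpoint; larger variants also exist.
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Direct Hausa -> French translation, no English pivot.
tokenizer.src_lang = "ha"
encoded = tokenizer("Ina son karanta littafi.", return_tensors="pt")
generated = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("fr")
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```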
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach to convert text to a different language without any human involvement.
In this paper, we apply NMT to two of the most morphologically rich Indian languages, working with the English-Tamil and English-Malayalam pairs.
We propose a novel NMT model that uses multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to build an efficient translation system.
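A brief sketch of the multi-head self-attention building block the summary mentions, using PyTorch's stock module; the dimensions are illustrative and this is not the authors' exact architecture:

```python
import torch
import torch.nn as nn

# Illustrative dimensions: batch of 2 sentences, 10 BPE tokens, 512-dim embeddings.
embed_dim, num_heads = 512, 8
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 10, embed_dim)   # e.g. pre-trained BPE embeddings
out, weights = attn(x, x, x)        # self-attention: Q = K = V = x
print(out.shape, weights.shape)     # (2, 10, 512), (2, 10, 10)
```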
arXiv Detail & Related papers (2020-04-19T17:29:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.