Towards Supervised and Unsupervised Neural Machine Translation Baselines
for Nigerian Pidgin
- URL: http://arxiv.org/abs/2003.12660v1
- Date: Fri, 27 Mar 2020 22:40:01 GMT
- Authors: Orevaoghene Ahia and Kelechi Ogueji
- Abstract summary: Nigerian Pidgin is arguably the most widely spoken language in Nigeria. Variants of this language are also spoken across West and Central Africa.
This work aims to establish supervised and unsupervised neural machine translation baselines between English and Nigerian Pidgin.
- Score: 0.2792030485253753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nigerian Pidgin is arguably the most widely spoken language in Nigeria.
Variants of this language are also spoken across West and Central Africa,
making it a very important language. This work aims to establish supervised and
unsupervised neural machine translation (NMT) baselines between English and
Nigerian Pidgin. We implement and compare NMT models with different
tokenization methods, creating a solid foundation for future work.
Related papers
- Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs [8.829688681748413]
Naija is a Nigerian-Pidgin spoken by approx. 120M speakers in Nigeria.
It is a mixed language (drawing on, e.g., English, Portuguese, Yoruba, Hausa and Igbo).
Non-native speakers find it hard to distinguish from the larger group of pidgin languages spoken across West Africa, known as West African Pidgin English (WAPE).
arXiv Detail & Related papers (2024-04-30T10:45:40Z)
- Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction [102.13536517783837]
Most languages from the Americas are low-resource, with a limited amount of parallel and monolingual data, if any.
We discuss recent advances, findings, and open questions arising from the NLP community's increased interest in these languages.
arXiv Detail & Related papers (2023-06-11T23:27:47Z)
- NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification [10.18858070640917]
Africa has over 2000 indigenous languages but they are under-represented in NLP research due to lack of datasets.
We create a new dataset, NollySenti, based on Nollywood movie reviews for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yoruba).
arXiv Detail & Related papers (2023-05-18T13:38:36Z)
- Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation uses a single neural network to model the entire translation process.
Although neural machine translation is the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin [0.0]
Nigerian Pidgin is one of the most popular languages in West Africa.
We present the first parallel (speech-to-text) data on Nigerian Pidgin.
We also trained the first end-to-end speech recognition system on this language.
arXiv Detail & Related papers (2020-10-21T16:32:58Z)
- SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task [111.91077204077817]
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems won first place in the English-to-Chinese, Polish-to-English, and German-to-Upper Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z)
- HausaMT v1.0: Towards English-Hausa Neural Machine Translation [0.012691047660244334]
We build a baseline model for English-Hausa machine translation.
The Hausa language is the second largest Afro-Asiatic language in the world after Arabic.
arXiv Detail & Related papers (2020-06-09T02:08:03Z)
- Neural Machine Translation: Challenges, Progress and Future [62.75523637241876]
Machine translation (MT) is a technique that leverages computers to translate human languages automatically.
Neural machine translation (NMT) models a direct mapping between source and target languages with deep neural networks.
This article reviews the NMT framework, discusses the challenges in NMT, and introduces some exciting recent progress.
arXiv Detail & Related papers (2020-04-13T07:53:57Z)
- Cross-lingual Supervision Improves Unsupervised Neural Machine Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
The method significantly improves translation quality, by more than 3 BLEU points, on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
- Towards Neural Machine Translation for Edoid Languages [2.144787054581292]
Many Nigerian languages have relinquished their previous prestige and purpose in modern society to English and Nigerian Pidgin.
This work explores the feasibility of Neural Machine Translation for the Edoid language family of Southern Nigeria.
arXiv Detail & Related papers (2020-03-24T07:53:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.