$\varepsilon$ K\'U <MASK>: Integrating Yor\`ub\'a cultural greetings into machine translation
- URL: http://arxiv.org/abs/2303.17972v2
- Date: Mon, 24 Apr 2023 16:29:38 GMT
- Title: $\varepsilon$ K\'U <MASK>: Integrating Yor\`ub\'a cultural greetings
into machine translation
- Authors: Idris Akinade, Jesujoba Alabi, David Adelani, Clement Odoje and
Dietrich Klakow
- Abstract summary: We present IkiniYor\`ub\'a, a Yor\`ub\'a-English translation dataset containing some Yor\`ub\'a greetings, and sample use cases.
We show that different multilingual NMT systems including Google and NLLB struggle to accurately translate Yor\`ub\'a greetings into English.
In addition, we trained a Yor\`ub\'a-English model by finetuning an existing NMT model on the training split of IkiniYor\`ub\'a, and this achieved better performance when compared to the pre-trained multilingual NMT models.
- Score: 14.469047518226708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the performance of massively multilingual neural
machine translation (NMT) systems in translating Yor\`ub\'a greetings
($\varepsilon$ k\'u [MASK]), which are a big part of Yor\`ub\'a language and
culture, into English. To evaluate these models, we present IkiniYor\`ub\'a, a
Yor\`ub\'a-English translation dataset containing some Yor\`ub\'a greetings,
and sample use cases. We analysed the performance of different multilingual NMT
systems including Google and NLLB and show that these models struggle to
accurately translate Yor\`ub\'a greetings into English. In addition, we trained
a Yor\`ub\'a-English model by finetuning an existing NMT model on the training
split of IkiniYor\`ub\'a and this achieved better performance when compared to
the pre-trained multilingual NMT models, although they were trained on a large
volume of data.
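
For illustration, below is a minimal, hedged sketch of the two steps described in the abstract: translating a Yor\`ub\'a greeting with a pre-trained multilingual NMT model, and then fine-tuning that model on a small parallel sample. It assumes the publicly available Hugging Face checkpoint facebook/nllb-200-distilled-600M and standard transformers/PyTorch APIs; the greeting pairs are illustrative stand-ins rather than items from IkiniYor\`ub\'a, and the paper's actual model choice and training configuration may differ.

```python
# Minimal sketch (assumed setup, not the paper's exact pipeline):
# 1) zero-shot Yorùbá -> English translation with pre-trained NLLB-200,
# 2) a tiny fine-tuning loop on illustrative greeting pairs.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-distilled-600M"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang="yor_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def translate(text: str) -> str:
    """Translate Yorùbá text to English with the current model weights."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_length=64,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Zero-shot check on a common greeting ("Ẹ kú àárọ̀" ~ "good morning").
print(translate("Ẹ kú àárọ̀"))

# Toy stand-in for a greetings training split (illustrative pairs only).
train_pairs = [
    ("Ẹ kú àárọ̀", "Good morning"),
    ("Ẹ kú iṣẹ́", "Well done"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for yo, en in train_pairs:
        batch = tokenizer(yo, text_target=en, return_tensors="pt")
        loss = model(**batch).loss  # standard seq2seq cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.eval()
print(translate("Ẹ kú àárọ̀"))  # compare with the zero-shot output above
```

In the paper's setting, the toy pairs would be replaced by the training split of IkiniYor\`ub\'a, and the fine-tuned model would be evaluated on the held-out split against the pre-trained multilingual baselines.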
Related papers
- SPRING Lab IITM's submission to Low Resource Indic Language Translation Shared Task [10.268444449457956]
We develop a robust translation model for four low-resource Indic languages: Khasi, Mizo, Manipuri, and Assamese.
Our approach includes a comprehensive pipeline from data collection and preprocessing to training and evaluation.
To address the scarcity of bilingual data, we use back-translation techniques on monolingual datasets for Mizo and Khasi (a generic sketch of back-translation appears after this list).
arXiv Detail & Related papers (2024-11-01T16:39:03Z)
- NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models [2.186901738997927]
This paper introduces NusaMT-7B, an LLM-based machine translation model for low-resource Indonesian languages.
Our approach integrates continued pre-training on monolingual data, supervised fine-tuning (SFT), self-learning, and an LLM-based data cleaner to reduce noise in parallel sentences.
Our results show that fine-tuned LLMs can enhance translation quality for low-resource languages, aiding in linguistic preservation and cross-cultural communication.
arXiv Detail & Related papers (2024-10-10T11:33:25Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a versatile model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation [48.58899349349702]
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism.
In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT.
arXiv Detail & Related papers (2022-12-17T08:34:20Z)
- Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT [64.1841519527504]
Neural machine translation (NMT) uses a single neural network to model the entire translation process.
Although NMT is the de facto standard, it is still not clear how NMT models acquire different competences over the course of training.
arXiv Detail & Related papers (2021-09-03T09:38:50Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and the endangered language Cherokee.
It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Self-supervised and Supervised Joint Training for Resource-rich Machine Translation [30.502625878505732]
Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose a joint training approach, $F$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models.
arXiv Detail & Related papers (2021-06-08T02:35:40Z)
- SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task [111.91077204077817]
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems won first place in the English to Chinese, Polish to English, and German to Upper Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z)
- JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation [27.364702152624034]
JASS is joint BMASS (Bunsetsu MASS) and BRSS (Bunsetsu Reordering Sequence to Sequence) pre-training.
We show for the first time that joint MASS and JASS pre-training gives results that significantly surpass the individual methods.
We will release our code, pre-trained models and bunsetsu annotated data as resources for researchers to use in their own NLP tasks.
arXiv Detail & Related papers (2020-05-07T09:53:25Z)
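
The SPRING Lab IITM entry above mentions back-translation to address data scarcity; the following is a generic sketch of that technique rather than their pipeline. It again assumes the facebook/nllb-200-distilled-600M checkpoint and uses Yor\`ub\'a as a stand-in low-resource language: clean target-side monolingual text is translated into the source language, and the noisy synthetic source is paired with the clean target to create extra training data for the forward model.

```python
# Generic back-translation sketch (assumed setup): create synthetic parallel data
# by translating clean English monolingual text into the low-resource source language.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-distilled-600M"  # assumed reverse-direction model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang="eng_Latn")
reverse_model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

monolingual_english = [
    "Good morning, how are you?",
    "The meeting starts at noon.",
]

synthetic_pairs = []
for sentence in monolingual_english:
    inputs = tokenizer(sentence, return_tensors="pt")
    output_ids = reverse_model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("yor_Latn"),
        max_length=64,
    )
    back_translated = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
    # Noisy synthetic source sentence paired with the clean human-written target.
    synthetic_pairs.append((back_translated, sentence))

# synthetic_pairs can be mixed with genuine bilingual data when training the
# forward (source -> target) translation model.
print(synthetic_pairs)
```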