Low-Resourced Machine Translation for Senegalese Wolof Language
- URL: http://arxiv.org/abs/2305.00606v1
- Date: Mon, 1 May 2023 00:04:19 GMT
- Title: Low-Resourced Machine Translation for Senegalese Wolof Language
- Authors: Derguene Mbaye, Moussa Diallo, Thierno Ibrahima Diop
- Abstract summary: We present a parallel Wolof/French corpus of 123,000 sentences on which we conducted experiments with machine translation models based on Recurrent Neural Networks (RNNs). We noted performance gains with the models trained on subword-segmented data, as well as with those trained on the French-English language pair, compared to those trained on the French-Wolof pair under the same experimental conditions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Processing (NLP) research has made great advances in recent years, with major breakthroughs that have established new benchmarks. However, these advances have mainly benefited a certain group of languages commonly referred to as resource-rich, such as English and French. The majority of other, less-resourced languages are left behind, which is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 123,000 sentences on which we conducted experiments with machine translation models based on Recurrent Neural Networks (RNNs) in different data configurations. We noted performance gains with the models trained on subword-segmented data, as well as with those trained on the French-English language pair, compared to those trained on the French-Wolof pair under the same experimental conditions.
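To make the subword-segmentation step concrete, here is a minimal sketch using SentencePiece BPE. The toy corpus, model prefix, and vocabulary size are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal subword-segmentation sketch with SentencePiece (BPE).
# Assumptions: the `sentencepiece` package is installed; the toy corpus,
# model prefix, and vocab size are placeholders, not the paper's setup.
import sentencepiece as spm

# Write a tiny toy French/Wolof corpus so the example is self-contained.
lines = [
    "Je vais au marché demain.",
    "Dama dem marse suba.",
    "Le marché est loin du village.",
    "Marse bi sore na dëkk bi.",
    "Je vais au village demain.",
    "Dama dem dëkk bi suba.",
]
with open("toy_corpus.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))

# Train a small BPE model; real systems use vocabularies of roughly 8k-32k.
spm.SentencePieceTrainer.train(
    input="toy_corpus.txt",
    model_prefix="frwo_bpe",
    vocab_size=60,          # placeholder sized for the toy corpus
    model_type="bpe",
    character_coverage=1.0,
)

sp = spm.SentencePieceProcessor(model_file="frwo_bpe.model")
print(sp.encode("Je vais au marché.", out_type=str))  # subword pieces
```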
Related papers
- DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning
Most existing models and datasets for sentiment analysis are developed for high-resource languages, such as English and Chinese.
The AfriSenti-SemEval 2023 Shared Task 12 aims to fill this gap by evaluating sentiment analysis models on low-resource African languages.
We present our solution to the shared task, in which we employed different multilingual XLM-R models with a classification head, trained on various data.
arXiv Detail & Related papers (2023-05-04T07:28:45Z)
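As a gloss on the approach above, the sketch below sets up XLM-R with a classification head via Hugging Face Transformers. The checkpoint, three-way label scheme, and example sentence are assumptions for illustration, not the shared-task configuration.

```python
# Hedged sketch: XLM-R with a sequence-classification head, as one might
# configure it for sentiment labels. The checkpoint, the 3-way label
# scheme, and the example text are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3  # e.g. negative / neutral / positive
)

inputs = tokenizer("Ndank ndank mooy jàpp golo ci ñaay.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# The head is freshly initialized, so probabilities are near-uniform
# until the model is fine-tuned on labeled sentiment data.
print(logits.softmax(dim=-1))
```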
- Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese
Cross-lingual NLP transfer can be improved by exploiting data and models of high-resource languages.
We release a new web corpus of Faroese and Faroese datasets for named entity recognition (NER), semantic text similarity (STS) and new language models trained on all Scandinavian languages.
arXiv Detail & Related papers (2023-04-18T08:42:38Z)
- Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages
Unsupervised sequence-segmentation performance can be transferred to extremely low-resource languages by pre-training a Masked Segmental Language Model multilingually.
We show that this transfer can be achieved by training over a collection of low-resource languages that are typologically similar (but phylogenetically unrelated) to the target language.
arXiv Detail & Related papers (2021-10-16T00:08:28Z)
- AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages
AfroMT is a standardized, clean, and reproducible machine translation benchmark for eight widely spoken African languages.
We develop a suite of analysis tools for system diagnosis taking into account the unique properties of these languages.
We demonstrate significant improvements when pretraining on 11 languages, with gains of up to 2 BLEU points over strong baselines.
arXiv Detail & Related papers (2021-09-10T07:45:21Z)
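BLEU differences like the 2-point gains reported above are typically measured with a tool such as sacreBLEU. A minimal sketch, assuming the `sacrebleu` package and invented system outputs:

```python
# Minimal BLEU computation with sacreBLEU. The hypothesis/reference
# strings are invented for illustration; only the metric call reflects
# common practice, not this benchmark's exact evaluation setup.
import sacrebleu

hypotheses = ["the cat sat on the mat", "he reads a book"]
# sacrebleu expects a list of reference *streams*: each inner list holds
# one reference translation per hypothesis, in the same order.
references = [["the cat sat on the mat", "he is reading a book"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```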
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
We present AmericasNLI, an extension of XNLI (Conneau et al.) to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)
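To illustrate the kind of zero-shot NLI scoring described above, here is a hedged sketch using an XNLI-fine-tuned XLM-R checkpoint from the Hugging Face Hub; the checkpoint choice and the premise/hypothesis pair are assumptions, not the AmericasNLI evaluation code.

```python
# Hedged sketch of zero-shot NLI with an XNLI-fine-tuned XLM-R. The
# checkpoint is a public community model assumed here for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "joeddav/xlm-roberta-large-xnli"  # assumed public XNLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "Una mujer camina por la playa."
hypothesis = "Alguien está afuera."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]
# Print entailment/neutral/contradiction probabilities by label id.
for i, p in enumerate(probs):
    print(model.config.id2label[i], f"{p.item():.3f}")
```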
- Low-Resource Language Modelling of South African Languages
We evaluate the performance of open-vocabulary language models on low-resource South African languages.
We evaluate different variants of n-gram models, feedforward neural networks, recurrent neural networks (RNNs) and Transformers on small-scale datasets.
Overall, well-regularized RNNs give the best performance across two isiZulu datasets and one Sepedi dataset.
arXiv Detail & Related papers (2021-04-01T21:27:27Z)
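The n-gram baselines above can be made concrete with a toy model. Below is a minimal add-one-smoothed bigram language model with perplexity; the corpus and tokenization are invented, and the paper's open-vocabulary models are far more elaborate.

```python
# Toy add-one (Laplace) smoothed bigram language model with perplexity.
# Invented corpus for illustration only; real n-gram baselines operate
# at subword/character level on much larger data.
import math
from collections import Counter

train = "ngiyabonga kakhulu . siyabonga kakhulu .".split()
test = "ngiyabonga kakhulu .".split()

vocab = set(train) | {"<unk>"}       # reserve an unknown-word slot
unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))

def prob(prev: str, word: str) -> float:
    # Add-one smoothing over the training vocabulary.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

log_p = sum(math.log(prob(p, w)) for p, w in zip(test, test[1:]))
ppl = math.exp(-log_p / (len(test) - 1))
print(f"perplexity = {ppl:.2f}")
```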
- Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages
We study trends in performance for different amounts of available resources for three African languages: Hausa, isiXhosa, and Yorùbá.
We show that, in combination with transfer learning or distant supervision, these models can match baseline performance with as few as 10 to 100 labeled sentences.
arXiv Detail & Related papers (2020-10-07T05:23:27Z)
- Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages
We show that multilinguality is critical to making unsupervised systems practical for low-resource settings.
We present a single model for five low-resource languages (Gujarati, Kazakh, Nepali, Sinhala, and Turkish), translating both to and from English.
We outperform all current state-of-the-art unsupervised baselines for these languages, achieving gains of up to 14.4 BLEU.
arXiv Detail & Related papers (2020-09-23T15:07:33Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
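A schematic of the random online backtranslation idea proposed above: for each target-side sentence, pick a random intermediate language, translate into it with the current model, and train on the resulting synthetic pair. The `translate` function below is a stub standing in for a real multilingual NMT model; everything here is illustrative.

```python
# Hedged sketch of random online backtranslation (RO-BT). `translate`
# is a placeholder; a real system would call the current NMT model.
import random

LANGS = ["fr", "wo", "en", "de"]  # placeholder language set

def translate(sentence: str, src: str, tgt: str) -> str:
    # Stub: returns a tagged copy instead of a real translation.
    return f"<{tgt}> {sentence}"

def robt_step(target_sentence: str, target_lang: str):
    # Sample a random intermediate language different from the target.
    mid = random.choice([l for l in LANGS if l != target_lang])
    synthetic_source = translate(target_sentence, target_lang, mid)
    # The synthetic pair (mid -> target) is added to the training batch,
    # exercising a direction unseen in the parallel data.
    return (mid, synthetic_source), (target_lang, target_sentence)

print(robt_step("Mangi dem", "wo"))
```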
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
However, UNMT can only translate between a single language pair and cannot produce translations for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
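The generic knowledge-distillation objective behind approaches like the one above has the student match the teacher's softened output distribution via KL divergence. A minimal sketch with random tensors standing in for real teacher/student logits; this is the standard recipe, not necessarily this paper's exact loss.

```python
# Hedged sketch of a token-level knowledge-distillation loss in PyTorch.
# Random tensors replace real model outputs; the temperature is an
# assumed hyperparameter.
import torch
import torch.nn.functional as F

vocab_size, batch, seq_len = 100, 2, 5
teacher_logits = torch.randn(batch, seq_len, vocab_size)
student_logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)

temperature = 2.0  # softens both distributions
teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)

# KL(teacher || student), summed over the vocab and averaged per batch.
kd_loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
kd_loss = kd_loss * temperature ** 2  # standard temperature scaling
kd_loss.backward()
print(float(kd_loss))
```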
- Using LSTM to Translate French to Senegalese Local Languages: Wolof as a Case Study
We propose a neural machine translation system for Wolof, a low-resource Niger-Congo language.
We gathered a parallel corpus of 70,000 aligned French-Wolof sentences.
Our models are trained on a limited amount of parallel French-Wolof data.
arXiv Detail & Related papers (2020-03-27T17:09:52Z)
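For readers who want a feel for the LSTM-based systems discussed above, here is a minimal PyTorch encoder-decoder skeleton. All dimensions and vocabulary sizes are placeholders, not the configuration of either Wolof paper; real systems add attention, beam search, and subword vocabularies.

```python
# Minimal LSTM encoder-decoder skeleton, in the spirit of the RNN/LSTM
# MT systems above. Sizes are illustrative placeholders only.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb=256, hid=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source; seed the decoder with the final encoder state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Teacher forcing: feed the gold target prefix during training.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq()
src = torch.randint(0, 8000, (4, 12))  # fake batch: 4 sentences, length 12
tgt = torch.randint(0, 8000, (4, 10))
print(model(src, tgt).shape)           # torch.Size([4, 10, 8000])
```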
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.