Congolese Swahili Machine Translation for Humanitarian Response
- URL: http://arxiv.org/abs/2103.10734v1
- Date: Fri, 19 Mar 2021 11:15:48 GMT
- Title: Congolese Swahili Machine Translation for Humanitarian Response
- Authors: Alp Öktem, Eric DeLuca, Rodrigue Bashizi, Eric Paquin, Grace Tang
- Abstract summary: We describe our efforts to make a bidirectional Congolese Swahili to French neural machine translation system.
For training, we created a 25,302-sentence general domain parallel corpus.
We recorded improvements of up to 2.4 and 3.5 BLEU points in the SWC-FRA and FRA-SWC directions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we describe our efforts to make a bidirectional Congolese
Swahili (SWC) to French (FRA) neural machine translation system with the
motivation of improving humanitarian translation workflows. For training, we
created a 25,302-sentence general domain parallel corpus and combined it with
publicly available data. Experimenting with low-resource methodologies like
cross-dialect transfer and semi-supervised learning, we recorded improvements
of up to 2.4 and 3.5 BLEU points in the SWC-FRA and FRA-SWC directions,
respectively. We performed human evaluations to assess the usability of our
models in a COVID-domain chatbot that operates in the Democratic Republic of
Congo (DRC). Direct assessment in the SWC-FRA direction demonstrated an average
quality ranking of 6.3 out of 10 with 75% of the target strings conveying the
main message of the source text. For the FRA-SWC direction, our preliminary
tests on post-editing assessment showed its potential usefulness for
machine-assisted translation. We make our models, datasets containing up to 1
million sentences, our development pipeline, and a translator web-app available
for public use.
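The abstract reports its gains in BLEU points. Below is a minimal sketch of the standard corpus-level BLEU computation: clipped n-gram precision for orders 1 to 4, their geometric mean, and a brevity penalty. It assumes whitespace tokenization, a single reference per sentence, uniform weights, and no smoothing. The paper does not say which BLEU implementation it used, and toolkits such as sacreBLEU apply their own tokenization, so their scores will not exactly match this sketch. The function names `ngrams` and `corpus_bleu` are illustrative, not taken from the authors' pipeline.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumes whitespace tokenization and no smoothing, so a corpus with
    zero matches at any n-gram order scores 0.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified precisions with uniform weights 1/max_n.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

A reported gain such as "2.4 BLEU points" is the difference between two such scores computed on the same test set, for example `corpus_bleu(new_outputs, refs) - corpus_bleu(baseline_outputs, refs)`.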
Related papers
- Rethinking Human-like Translation Strategy: Integrating Drift-Diffusion Model with Large Language Models for Machine Translation
We propose Thinker with the Drift-Diffusion Model to emulate human translators' dynamic decision-making under constrained resources.
We conduct experiments under the high-resource, low-resource, and commonsense translation settings using the WMT22 and CommonMT datasets.
We also perform additional analysis and evaluation on commonsense translation to illustrate the effectiveness of the proposed method.
Date: 2024-02-16T14:00:56Z
- Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases
This work proposes using contrastive evaluation to measure the ability of direct S2TT systems to disambiguate utterances where prosody plays a crucial role.
Our results clearly demonstrate the value of direct translation systems over cascade translation models.
Date: 2024-02-01T14:46:35Z
- SurreyAI 2023 Submission for the Quality Estimation Shared Task
This paper describes the approach adopted by the SurreyAI team for addressing the Sentence-Level Direct Assessment task in WMT23.
The proposed approach builds upon the TransQuest framework, exploring various autoencoder pre-trained language models.
The evaluation utilizes Spearman and Pearson correlation coefficients, assessing the relationship between machine-predicted quality scores and human judgments.
Date: 2023-12-01T12:01:04Z
- DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages
Disfluency correction (DC) is the process of removing disfluent elements like fillers, repetitions and corrections from spoken utterances to create readable and interpretable text.
We present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French.
We show that DC leads to an average increase of 5.65 BLEU points when used in conjunction with a state-of-the-art Machine Translation (MT) system.
Date: 2023-10-25T16:32:02Z
- Strategies for improving low resource speech to text translation relying on pre-trained ASR models
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, using the English-Portuguese and Tamasheq-French language pairs, respectively.
Date: 2023-05-31T21:58:07Z
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Building on the Transformer, we apply several effective variants.
Our systems achieve 0.810 and 0.946 COMET scores.
Date: 2022-11-28T02:35:04Z
- The USYD-JD Speech Translation System for IWSLT 2021
This paper describes the joint submission of the University of Sydney and JD to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
Date: 2021-07-24T09:53:34Z
- Cross-lingual Retrieval for Iterative Self-Supervised Training
Cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs.
We develop a new approach: cross-lingual retrieval for iterative self-supervised training.
Date: 2020-06-16T21:30:51Z
- Using LSTM to Translate French to Senegalese Local Languages: Wolof as a Case Study
We propose a neural machine translation system for Wolof, a low-resource Niger-Congo language.
We gathered a parallel corpus of 70,000 aligned French-Wolof sentences.
Our models are trained on a limited amount of parallel French-Wolof data.
Date: 2020-03-27T17:09:52Z
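The SurreyAI quality-estimation entry scores its system with Pearson and Spearman correlations between machine-predicted quality scores and human judgments. Below is a minimal pure-Python sketch of both statistics. Pearson measures linear agreement. Spearman is computed here as Pearson applied to ranks, with tied values assigned the average of their positions. The helper names and the example scores are hypothetical illustrations, not data or code from any of the listed papers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predicted QE scores and human direct-assessment scores.
predicted = [0.62, 0.71, 0.35, 0.90, 0.48]
human = [60, 75, 30, 88, 55]
print(pearson(predicted, human), spearman(predicted, human))
```

The two statistics capture different things. Spearman reaches 1 whenever the predicted and human scores order the segments identically, as they do in this toy example. Pearson also rewards a linear relationship between the raw values.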