Translating the Unseen? Yorùbá → English MT in Low-Resource, Morphologically-Unmarked Settings
- URL: http://arxiv.org/abs/2103.04225v2
- Date: Tue, 9 Mar 2021 04:46:10 GMT
- Title: Translating the Unseen? Yorùbá → English MT in Low-Resource, Morphologically-Unmarked Settings
- Authors: Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg
- Abstract summary: Translating between languages where certain features are marked morphologically in one but absent or marked contextually in the other is an important test case for machine translation.
In this work, we perform a fine-grained analysis of how an SMT system compares with two NMT systems when translating bare nouns in Yorùbá into English.
- Score: 8.006185289499049
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Translating between languages where certain features are marked
morphologically in one but absent or marked contextually in the other is an
important test case for machine translation. When translating into English,
which marks (in)definiteness morphologically, from Yorùbá, which uses bare
nouns but marks these features contextually, ambiguities arise. In this work,
we perform a fine-grained analysis of how an SMT system compares with two NMT
systems (BiLSTM and Transformer) when translating bare nouns (BNs) in Yorùbá
into English. We investigate to what extent the systems identify BNs,
translate them correctly, and match human translation patterns. We also
analyze the type of errors each model makes and provide a linguistic
description of these errors. We glean insights for evaluating model
performance in low-resource settings. In translating bare nouns, our results
show that the Transformer model outperforms the SMT and BiLSTM models in 4
categories, the BiLSTM outperforms the SMT model in 3 categories, and the SMT
outperforms the NMT models in 1 category.
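As a toy illustration of the ambiguity the abstract describes, the sketch below checks which English article, if any, precedes a translated noun. The function name and the one-token-window heuristic are our own simplifying assumptions, not the authors' annotation scheme.

```python
# Illustrative sketch only: a crude heuristic for checking whether a
# Yorùbá bare noun surfaced in the English output with a definite
# article, an indefinite article, or no article (bare). The study
# itself relied on careful linguistic analysis; this is just a toy check.

DEFINITE = {"the"}
INDEFINITE = {"a", "an"}

def article_category(english_tokens: list[str], noun_index: int) -> str:
    """Classify the article (if any) immediately preceding a noun."""
    if noun_index > 0:
        prev = english_tokens[noun_index - 1].lower()
        if prev in DEFINITE:
            return "definite"
        if prev in INDEFINITE:
            return "indefinite"
    return "bare"

# Toy example: Yorùbá "Mo ra ìwé" can mean "I bought a book" or
# "I bought the book"; the system's article choice disambiguates.
hypothesis = "I bought the book".split()
print(article_category(hypothesis, hypothesis.index("book")))  # -> "definite"
```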
Related papers
- Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation [13.713981533436135]
This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI).
We analyzed the characteristics of chunk-wise monotonic translation (CMT) sentences using the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation dataset.
We further investigated the features of CMT sentences by evaluating the output of existing speech translation (ST) and simultaneous speech translation (simulST) models on the same dataset.
arXiv Detail & Related papers (2024-06-13T09:10:16Z)
- Context-Aware Machine Translation with Source Coreference Explanation [26.336947440529713]
We propose a model that explains its translation decisions by predicting coreference features in the input.
We evaluate our method on the WMT English-German document-level translation task, an English-Russian dataset, and the multilingual TED talk dataset.
arXiv Detail & Related papers (2024-04-30T12:41:00Z)
- Do GPTs Produce Less Literal Translations? [20.095646048167612]
Large Language Models (LLMs) have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks.
We find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on Machine Translation quality metrics.
arXiv Detail & Related papers (2023-05-26T10:38:31Z)
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in this area focuses on multilingual models rather than on the machine translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
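As background for the translate-test result above, here is a minimal sketch of that pipeline; `mt_translate` and `english_classifier` are hypothetical stand-ins rather than the paper's actual systems.

```python
# Minimal sketch of the translate-test setup: translate the non-English
# test input into English, then apply a single English classifier.
# Both functions below are toy stand-ins written for this sketch.

def mt_translate(text: str, src_lang: str) -> str:
    # Stand-in: a real pipeline would call an MT model here.
    toy_lexicon = {"Das Essen war gut": "The food was good"}
    return toy_lexicon.get(text, text)

def english_classifier(text: str) -> str:
    # Stand-in sentiment classifier trained on English data only.
    return "positive" if "good" in text.lower() else "negative"

def translate_test(text: str, src_lang: str) -> str:
    """Classify a non-English input by translating it first."""
    return english_classifier(mt_translate(text, src_lang))

print(translate_test("Das Essen war gut", "de"))  # -> "positive"
```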
- Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning [24.64700139151659]
Current neural machine translation (NMT) systems suffer from a lack of reliability.
We present a consistency-aware meta-learning (CAML) framework, derived from the model-agnostic meta-learning (MAML) algorithm, to address this problem.
We conduct experiments on the NIST Chinese-to-English task, three WMT translation tasks, and the TED M2O task.
arXiv Detail & Related papers (2023-03-20T09:41:28Z)
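Since the CAML framework above is derived from MAML, a minimal first-order MAML-style update on a toy scalar model may help situate it. This is generic background only, not the paper's consistency-aware objective, and all names are our own.

```python
# Background sketch: a first-order MAML-style meta-update on a toy
# scalar model y = w * x with squared error. It shows the generic
# inner-adapt / outer-update pattern, nothing CAML-specific.

def grad(w, batch):
    # d/dw of mean squared error over (x, y) pairs.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def maml_step(w, tasks, inner_lr=0.01, outer_lr=0.01):
    """Adapt to each task on its support set, then update the shared
    initialization using post-adaptation gradients on the query sets
    (first-order approximation of MAML)."""
    meta_grad = 0.0
    for support, query in tasks:
        w_adapted = w - inner_lr * grad(w, support)  # inner step
        meta_grad += grad(w_adapted, query)          # outer gradient
    return w - outer_lr * meta_grad / len(tasks)

# Two toy "tasks", each a (support, query) split of noisy y ~ 2x data.
tasks = [
    ([(1.0, 2.0), (2.0, 4.1)], [(3.0, 6.1)]),
    ([(1.0, 1.9), (2.0, 3.9)], [(3.0, 5.8)]),
]
w = 0.0
for _ in range(200):
    w = maml_step(w, tasks)
print(round(w, 2))  # settles near the shared slope of ~2
```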
- Evaluating and Improving the Coreference Capabilities of Machine Translation Models [30.60934078720647]
Machine translation requires a wide range of linguistic capabilities, which current end-to-end models are expected to learn implicitly by observing aligned sentences in bilingual corpora.
arXiv Detail & Related papers (2023-02-16T18:16:09Z)
- Improving Simultaneous Machine Translation with Monolingual Data [94.1085601198393]
Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model.
We propose to leverage monolingual data to improve SiMT by training a SiMT student on a combination of bilingual data and external monolingual data distilled via Seq-KD.
arXiv Detail & Related papers (2022-12-02T14:13:53Z)
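To make the Seq-KD recipe above concrete, here is a minimal sketch of the distillation step; `teacher_translate` and the commented-out student-training call are hypothetical stand-ins, not a real API.

```python
# Minimal sketch of sequence-level knowledge distillation (Seq-KD):
# a full-sentence teacher labels external monolingual source text, and
# the SiMT student trains on bilingual data plus the distilled pairs.

def teacher_translate(src: str) -> str:
    # Stand-in for beam-search decoding with a full-sentence NMT teacher.
    return f"<teacher translation of: {src}>"

def distill(monolingual_src: list[str]) -> list[tuple[str, str]]:
    """Turn monolingual source sentences into pseudo-parallel pairs."""
    return [(src, teacher_translate(src)) for src in monolingual_src]

bilingual = [("Guten Morgen", "Good morning")]
monolingual = ["Wie geht es dir?", "Bis später"]

training_data = bilingual + distill(monolingual)
# train_simt_student(training_data)  # hypothetical student training call
print(len(training_data))  # 3 examples: 1 real + 2 distilled
```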
- It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data [58.105938143865906]
We argue that SiMT systems should be trained and tested on real interpretation data.
Our results highlight a difference of up to 13.83 BLEU when SiMT models are evaluated on translation versus interpretation data.
arXiv Detail & Related papers (2021-10-11T12:27:07Z)
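The reported gap comes from scoring the same outputs against different reference types; a comparison of that shape can be reproduced with the sacrebleu library (the sentences below are invented):

```python
# Scoring one system output against a written-translation reference
# vs. an interpreter's typically more compressed rendition.
# Install with `pip install sacrebleu`.
import sacrebleu

hypotheses = ["The meeting will start at nine in the morning."]
translation_refs = [["The meeting will begin at nine o'clock in the morning."]]
interpretation_refs = [["Meeting starts at nine."]]

print(sacrebleu.corpus_bleu(hypotheses, translation_refs).score)
print(sacrebleu.corpus_bleu(hypotheses, interpretation_refs).score)
# The second score is typically much lower, mirroring the reported gap.
```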
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked language modeling (MLM) to sequence-to-sequence architectures by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
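As a toy contrast between the two input-corruption styles compared above, the sketch below masks tokens (MLM-style) versus locally reordering them so the input still resembles a full sentence; both functions are simplified illustrations, not the paper's exact objectives.

```python
# Two toy corruption functions for seq2seq pretraining inputs.
import random

random.seed(0)

def mask_input(tokens: list[str], rate: float = 0.35) -> list[str]:
    """MLM-style corruption: replace a fraction of tokens with <mask>."""
    return [t if random.random() > rate else "<mask>" for t in tokens]

def shuffle_input(tokens: list[str], window: int = 3) -> list[str]:
    """Reordering-style corruption: locally permute tokens so the
    input stays a 'real-looking' full sentence (no mask symbols)."""
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        random.shuffle(chunk)
        out.extend(chunk)
    return out

sentence = "the cat sat on the mat".split()
print(mask_input(sentence))    # input with <mask> symbols
print(shuffle_input(sentence)) # locally reordered, mask-free input
# In both cases the decoder's target is the original sentence.
```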
- Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models [72.56058378313963]
We bridge this gap by assessing the bilingual knowledge learned by NMT models with a phrase table.
We find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples.
arXiv Detail & Related papers (2020-04-28T03:44:34Z)