What Can Unsupervised Machine Translation Contribute to High-Resource
Language Pairs?
- URL: http://arxiv.org/abs/2106.15818v1
- Date: Wed, 30 Jun 2021 05:44:05 GMT
- Title: What Can Unsupervised Machine Translation Contribute to High-Resource
Language Pairs?
- Authors: Kelly Marchisio, Markus Freitag, David Grangier
- Abstract summary: We compare the style of correct translations generated by either supervised or unsupervised MT.
We demonstrate a way to combine the benefits of unsupervised and supervised MT into a single system.
- Score: 18.924296648372795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whereas existing literature on unsupervised machine translation (MT) focuses
on exploiting unsupervised techniques for low-resource language pairs where
bilingual training data is scarce or unavailable, we investigate whether
unsupervised MT can also improve translation quality of high-resource language
pairs where sufficient bitext does exist. We compare the style of correct
translations generated by either supervised or unsupervised MT and find that
the unsupervised output is less monotonic and more natural than supervised
output. We demonstrate a way to combine the benefits of unsupervised and
supervised MT into a single system, resulting in better human evaluation of
quality and fluency. Our results open the door to discussions about the
potential contributions of unsupervised MT in high-resource settings, and how
supervised and unsupervised systems might be mutually beneficial.
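The abstract does not spell out how the two systems are combined, so the sketch below shows only one generic way such a combination could work (reranking candidates from both systems with a fluency score); it is illustrative, not the authors' method, and every name in it is a hypothetical placeholder.

```python
# Illustrative sketch only: one generic way to combine a supervised and an
# unsupervised MT system. The paper's actual combination method may differ.
from typing import Callable, List

def combine_systems(
    source: str,
    supervised_mt: Callable[[str], List[str]],    # returns n-best translations
    unsupervised_mt: Callable[[str], List[str]],  # returns n-best translations
    fluency_score: Callable[[str], float],        # e.g., an LM log-probability
) -> str:
    """Pick the most fluent candidate from the union of both systems' outputs."""
    candidates = supervised_mt(source) + unsupervised_mt(source)
    return max(candidates, key=fluency_score)

# Toy stand-ins so the sketch runs end to end.
sup = lambda s: ["supervised translation of: " + s]
unsup = lambda s: ["unsupervised translation of: " + s]
score = lambda t: -len(t)  # placeholder scorer: prefers shorter strings
print(combine_systems("Guten Morgen", sup, unsup, score))
```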
Related papers
- An Empirical study of Unsupervised Neural Machine Translation: analyzing NMT output, model's behavior and sentences' contribution [5.691028372215281]
Unsupervised Neural Machine Translation (UNMT) focuses on improving NMT results under the assumption that no human-translated parallel data is available.
We focus on three very diverse languages, French, Gujarati, and Kazakh, and train bilingual NMT models, to and from English, with various levels of supervision.
arXiv Detail & Related papers (2023-12-19T20:35:08Z)
- Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation [12.376309678270275]
Perturbation-based QE works simply by analyzing MT system output on perturbed input source sentences.
Our approach is better at detecting gender bias and word-sense-disambiguation errors in translation than supervised QE.
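A minimal sketch of the idea as stated in the abstract: perturb the source, re-translate, and treat target words that are unstable across perturbations as suspect. The concrete perturbations and scoring below are assumptions; `mt_system` and `perturb` are hypothetical placeholders.

```python
# Hedged sketch of perturbation-based word-level QE; the paper's exact
# perturbations and scoring may differ.
from collections import Counter
from typing import Callable, Dict, List

def word_level_qe(
    source: str,
    mt_system: Callable[[str], str],
    perturb: Callable[[str], List[str]],
) -> Dict[str, float]:
    """Score each word of the base translation by how often it also appears in
    translations of perturbed sources (near 1.0 = stable, likely correct)."""
    base_words = mt_system(source).split()
    variant_sets = [set(mt_system(p).split()) for p in perturb(source)]
    counts = Counter(w for v in variant_sets for w in v)
    n = max(len(variant_sets), 1)
    return {w: counts[w] / n for w in base_words}

# Toy stand-ins so the sketch runs.
mt = lambda s: "the cat sat" if "cat" in s else "a dog sat"
perturbs = lambda s: [s + " please", s.replace("cat", "feline"), "so " + s]
print(word_level_qe("the cat runs", mt, perturbs))
```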
arXiv Detail & Related papers (2023-05-12T13:10:57Z)
- Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation [70.33052952571884]
We propose to build a cascaded speech translation system without leveraging any kind of paired data.
We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS.
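The sketch below illustrates denoising back-translation in general terms, under the assumption that it follows the standard recipe: a backward model turns monolingual target text into synthetic sources, noise is injected, and the forward model trains on (noisy source, clean target) pairs. The backward model here is a placeholder.

```python
# Generic denoising back-translation sketch; not the paper's exact pipeline.
import random
from typing import Callable, List, Tuple

def add_noise(tokens: List[str], drop_p: float = 0.1) -> List[str]:
    """Denoising-style corruption: word dropout plus one adjacent swap."""
    kept = [t for t in tokens if random.random() > drop_p] or tokens[:1]
    if len(kept) > 1:
        i = random.randrange(len(kept) - 1)
        kept[i], kept[i + 1] = kept[i + 1], kept[i]  # swap adjacent tokens
    return kept

def back_translation_pairs(
    target_corpus: List[str],
    backward_translate: Callable[[str], str],  # target -> synthetic source
) -> List[Tuple[str, str]]:
    """Build synthetic (noisy source, clean target) pairs for forward training."""
    pairs = []
    for tgt in target_corpus:
        synth_src = backward_translate(tgt)
        noisy_src = " ".join(add_noise(synth_src.split()))
        pairs.append((noisy_src, tgt))
    return pairs

bwd = lambda t: "src_" + t.replace(" ", " src_")  # placeholder backward model
print(back_translation_pairs(["hello world", "good morning"], bwd))
```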
arXiv Detail & Related papers (2023-05-12T13:07:51Z)
- Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation [91.57514888410205]
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting.
LLMs can struggle to translate inputs with rare words, which are common in low-resource or domain-transfer scenarios.
We show that LLM prompting can provide an effective solution for rare words as well, by using prior knowledge from bilingual dictionaries to provide control hints in the prompts.
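As an illustration of this idea, the sketch below builds a prompt that injects dictionary translations of rare source words as control hints. The template, dictionary, and rare-word set are invented for the example; the paper's actual prompt format may differ.

```python
# Hedged sketch of dictionary-hinted prompting; the template is an assumption.
from typing import Dict, Set

def build_prompt(
    source: str,
    bilingual_dict: Dict[str, str],
    rare_words: Set[str],
    src_lang: str = "German",
    tgt_lang: str = "English",
) -> str:
    """Prepend dictionary translations of rare source words as control hints."""
    hints = " ".join(
        f'Note: "{w}" means "{bilingual_dict[w]}".'
        for w in source.split()
        if w in rare_words and w in bilingual_dict
    )
    return (
        f"Translate the following {src_lang} sentence into {tgt_lang}. {hints}\n"
        f"{src_lang}: {source}\n{tgt_lang}:"
    )

print(build_prompt("Der Wiedehopf singt", {"Wiedehopf": "hoopoe"}, {"Wiedehopf"}))
```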
arXiv Detail & Related papers (2023-02-15T18:46:42Z)
- Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting [105.5303416210736]
Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only.
It is still challenging to associate source-target sentences in the latent space.
Since people who speak different languages share biologically similar visual systems, achieving better alignment through visual content is promising.
arXiv Detail & Related papers (2020-05-06T20:11:46Z)
- When and Why is Unsupervised Neural Machine Translation Useless? [43.68079166777282]
In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations.
Our analyses pinpoint the limits of the current unsupervised NMT and also suggest immediate research directions.
arXiv Detail & Related papers (2020-04-22T14:00:55Z)
- When Does Unsupervised Machine Translation Work? [23.690875724726908]
We conduct an empirical evaluation of unsupervised machine translation (MT) using dissimilar language pairs, dissimilar domains, diverse datasets, and authentic low-resource languages.
We find that performance rapidly deteriorates when source and target corpora are from different domains.
We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs.
arXiv Detail & Related papers (2020-04-12T00:57:47Z)
- Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios [61.88012735215636]
Unsupervised neural machine translation (UNMT) that relies solely on massive monolingual corpora has achieved remarkable results in several translation tasks.
In real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages such as Estonian.
We propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance.
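A minimal sketch of one round of the general self-training recipe named above: the current model pseudo-labels monolingual text, and only confident pairs are kept for retraining. The confidence-reporting model here is a hypothetical placeholder, and the paper's actual mechanisms may differ.

```python
# Generic UNMT self-training sketch; not the paper's exact mechanism.
from typing import Callable, List, Tuple

def self_training_pairs(
    mono_corpus: List[str],
    translate: Callable[[str], Tuple[str, float]],  # -> (hypothesis, confidence)
    min_conf: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep pseudo-parallel pairs the current model is confident about;
    retraining on them is one round of self-training."""
    pairs = []
    for src in mono_corpus:
        hyp, conf = translate(src)
        if conf >= min_conf:
            pairs.append((src, hyp))
    return pairs

# Placeholder model that fakes a confidence score.
model = lambda s: ("hyp(" + s + ")", 0.9 if len(s) > 3 else 0.2)
print(self_training_pairs(["tere hommikust", "hi"], model))
```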
arXiv Detail & Related papers (2020-04-09T12:07:17Z)
- Cross-lingual Supervision Improves Unsupervised Neural Machine Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
The method significantly improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.