When Does Unsupervised Machine Translation Work?
- URL: http://arxiv.org/abs/2004.05516v3
- Date: Thu, 19 Nov 2020 02:48:08 GMT
- Title: When Does Unsupervised Machine Translation Work?
- Authors: Kelly Marchisio, Kevin Duh, and Philipp Koehn
- Abstract summary: We conduct an empirical evaluation of unsupervised machine translation (MT) using dissimilar language pairs, dissimilar domains, diverse datasets, and authentic low-resource languages.
We find that performance rapidly deteriorates when source and target corpora are from different domains.
We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs.
- Score: 23.690875724726908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the reported success of unsupervised machine translation (MT), the field has yet to examine the conditions under which these methods succeed, and where they fail. We conduct an extensive empirical evaluation of unsupervised MT using dissimilar language pairs, dissimilar domains, diverse datasets, and authentic low-resource languages. We find that performance rapidly deteriorates when source and target corpora are from different domains, and that random word embedding initialization can dramatically affect downstream translation performance. We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs. We advocate for extensive empirical evaluation of unsupervised MT systems to highlight failure points and encourage continued research on the most promising paradigms.
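The seed-sensitivity finding lends itself to a quick sanity check. Below is a minimal sketch (ours, not the authors' code) that scores a held-out set against several runs of the same unsupervised pipeline differing only in the random seed used for embedding initialization; the hypothesis and reference file names are hypothetical placeholders.

```python
# Minimal sketch: compare corpus BLEU across unsupervised MT runs that
# differ only in the random seed used for word embedding initialization.
# File names (ref.txt, hyp.seed0.txt, ...) are hypothetical placeholders.
import statistics

import sacrebleu

with open("ref.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

scores = []
for seed in range(5):
    with open(f"hyp.seed{seed}.txt", encoding="utf-8") as f:
        hyps = [line.strip() for line in f]
    bleu = sacrebleu.corpus_bleu(hyps, [refs])  # one system, one reference set
    scores.append(bleu.score)
    print(f"seed={seed}  BLEU={bleu.score:.1f}")

# A large spread here reproduces the paper's observation that random
# initialization alone can swing downstream translation quality.
print(f"mean={statistics.mean(scores):.1f}  stdev={statistics.stdev(scores):.1f}")
```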
Related papers
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models (LLMs) to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation [12.376309678270275]
Perturbation-based QE works simply by analyzing MT system output on perturbed input source sentences.
Our approach is better at detecting gender bias and word-sense-disambiguation errors in translation than supervised QE.
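As a rough illustration of the idea (our construction, not the paper's implementation): perturb the source one token at a time, re-translate, and treat target words that change across perturbations as unstable. The `translate` callable and the `<mask>` placeholder token are assumptions of this sketch.

```python
# Minimal sketch of perturbation-based word-level QE (illustrative only).
from typing import Callable


def perturb(sentence: str, mask: str = "<mask>") -> list[str]:
    """Generate variants with one source token at a time replaced."""
    tokens = sentence.split()
    return [
        " ".join(tokens[:i] + [mask] + tokens[i + 1:])
        for i in range(len(tokens))
    ]


def word_stability(source: str, translate: Callable[[str], str]) -> list[tuple[str, float]]:
    """For each word of the base translation, the fraction of perturbed
    translations in which it survives at the same position. Low stability
    is taken as a proxy for a likely translation error."""
    base = translate(source).split()
    variants = [translate(p).split() for p in perturb(source)]
    results = []
    for i, word in enumerate(base):
        kept = sum(1 for v in variants if i < len(v) and v[i] == word)
        results.append((word, kept / max(len(variants), 1)))
    return results
```

Positional word matching is deliberately naive here; the point is that the signal requires neither reference translations nor QE training data, only black-box access to the MT system.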
arXiv Detail & Related papers (2023-05-12T13:10:57Z)
- Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation [91.57514888410205]
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting.
LLMs can struggle to translate inputs with rare words, which are common in low-resource or domain-transfer scenarios.
We show that LLM prompting can also handle rare words effectively, by using prior knowledge from bilingual dictionaries to provide control hints in the prompts.
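A minimal sketch of what such dictionary-hinted prompting can look like; the prompt wording, the toy dictionary, and the substring-based lookup are ours, not the paper's:

```python
# Minimal sketch: inject bilingual-dictionary hints for rare source words
# into a translation prompt. Dictionary entries and wording are illustrative.
RARE_WORDS = {"Fernweh": "wanderlust", "Abendrot": "evening glow"}  # toy entries


def build_prompt(source: str, src_lang: str = "German", tgt_lang: str = "English") -> str:
    # Crude substring lookup; a real system would tokenize and lemmatize.
    hints = [f'"{w}" means "{g}".' for w, g in RARE_WORDS.items() if w in source]
    hint_block = ("Hints: " + " ".join(hints) + "\n") if hints else ""
    return (
        f"{hint_block}"
        f"Translate the following {src_lang} sentence into {tgt_lang}:\n"
        f"{source}\nTranslation:"
    )


print(build_prompt("Das Abendrot über den Bergen war wunderschön."))
```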
arXiv Detail & Related papers (2023-02-15T18:46:42Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
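Concretely, "segment-level performance" here can be framed as the correlation between per-segment metric scores and per-segment task outcomes. A minimal sketch using chrF via sacrebleu on toy data, with hypothetical binary success labels standing in for a real downstream task:

```python
# Minimal sketch: correlate per-segment chrF with a downstream outcome
# (e.g., whether a cross-lingual task succeeded on that segment).
# Hypotheses, references, and outcome labels below are toy data.
import sacrebleu
from scipy.stats import pearsonr

hyps = [
    "the cat sat on the mat",
    "he go to school yesterday",
    "rain is expected tomorrow",
    "she red the book quickly",
]
refs = [
    "the cat sat on the mat",
    "he went to school yesterday",
    "rain is expected tomorrow",
    "she read the book quickly",
]
outcomes = [1, 0, 1, 1]  # hypothetical per-segment task success

chrf = sacrebleu.CHRF()
seg_scores = [chrf.sentence_score(h, [r]).score for h, r in zip(hyps, refs)]
r, p = pearsonr(seg_scores, outcomes)
print(f"segment-level Pearson r = {r:.2f}")
```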
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
- Prompting PaLM for Translation: Assessing Strategies and Performance [16.73524055296411]
The Pathways Language Model (PaLM) has demonstrated the strongest machine translation (MT) performance among similarly trained LLMs to date.
We revisit previous assessments of PaLM's MT capabilities with more recent test sets, modern MT metrics, and human evaluation, and find that its performance, while impressive, still lags that of state-of-the-art supervised systems.
arXiv Detail & Related papers (2022-11-16T18:42:37Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- What Can Unsupervised Machine Translation Contribute to High-Resource Language Pairs? [18.924296648372795]
We compare the style of correct translations generated by either supervised or unsupervised MT.
We demonstrate a way to combine the benefits of unsupervised and supervised MT into a single system.
arXiv Detail & Related papers (2021-06-30T05:44:05Z)
- Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting [105.5303416210736]
Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only.
It remains challenging, however, to associate source and target sentences in the latent space.
Since people who speak different languages biologically share similar visual systems, visual content offers a promising route to better alignment.
arXiv Detail & Related papers (2020-05-06T20:11:46Z)
- When and Why is Unsupervised Neural Machine Translation Useless? [43.68079166777282]
In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations.
Our analyses pinpoint the limits of current unsupervised NMT and suggest immediate research directions.
arXiv Detail & Related papers (2020-04-22T14:00:55Z)
- Cross-lingual Supervision Improves Unsupervised Neural Machine Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework that leverages weakly supervised signals from high-resource language pairs for zero-resource translation directions.
The method significantly improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.