Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural
Machine Translation
- URL: http://arxiv.org/abs/2005.10283v2
- Date: Wed, 28 Oct 2020 11:29:52 GMT
- Title: Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural
Machine Translation
- Authors: Bryan Eikema and Wilker Aziz
- Abstract summary: We show that some of the known pathologies and biases of NMT are due to MAP decoding and not to NMT's statistical assumptions nor MLE.
We show that an approximation to minimum Bayes risk decoding gives competitive results confirming that NMT models do capture important aspects of translation well in expectation.
- Score: 15.615065041164623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have revealed a number of pathologies of neural machine
translation (NMT) systems. Hypotheses explaining these mostly suggest there is
something fundamentally wrong with NMT as a model or its training algorithm,
maximum likelihood estimation (MLE). Most of this evidence was gathered using
maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the
highest-scoring translation, i.e. the mode. We argue that the evidence
corroborates the inadequacy of MAP decoding more than casts doubt on the model
and its training algorithm. In this work, we show that translation
distributions do reproduce various statistics of the data well, but that beam
search strays from such statistics. We show that some of the known pathologies
and biases of NMT are due to MAP decoding and not to NMT's statistical
assumptions nor MLE. In particular, we show that the most likely translations
under the model accumulate so little probability mass that the mode can be
considered essentially arbitrary. We therefore advocate for the use of decision
rules that take into account the translation distribution holistically. We show
that an approximation to minimum Bayes risk decoding gives competitive results
confirming that NMT models do capture important aspects of translation well in
expectation.
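Below is a minimal sketch of the kind of sampling-based approximation to minimum Bayes risk (MBR) decoding the abstract refers to. The `model.sample(source)` interface and the unigram-F1 utility are illustrative assumptions rather than the paper's exact setup; in practice the utility would be a metric such as BEER, METEOR, or sentence BLEU computed between sampled translations.

```python
from collections import Counter

def unigram_overlap(hyp, ref):
    """Toy utility: unigram F1 between two token lists (stand-in for BEER/METEOR/BLEU)."""
    h, r = Counter(hyp), Counter(ref)
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_decode(model, source, num_samples=32, utility=unigram_overlap):
    """Sampling-based MBR: pick the sampled translation with the highest average
    utility against all other samples (a consensus translation), instead of the
    single most probable translation (the mode), as MAP decoding would."""
    samples = [model.sample(source) for _ in range(num_samples)]
    best, best_score = None, float("-inf")
    for candidate in samples:
        expected_utility = sum(utility(candidate, other) for other in samples) / num_samples
        if expected_utility > best_score:
            best, best_score = candidate, expected_utility
    return best
```

The key contrast with MAP decoding is that the decision depends on many samples from the translation distribution rather than on the single highest-scoring translation.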
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, consists in adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z) - Automatic Evaluation and Analysis of Idioms in Neural Machine
Translation [12.227312923011986]
We present a novel metric for measuring the frequency of literal translation errors without human involvement.
We explore the role of monolingual pretraining and find that it yields substantial targeted improvements.
We find that the randomly initialized models are more local or "myopic", as they are relatively unaffected by variations in the context.
arXiv Detail & Related papers (2022-10-10T10:30:09Z) - Quality-Aware Decoding for Neural Machine Translation [64.24934199944875]
We propose quality-aware decoding for neural machine translation (NMT).
We leverage recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods.
We find that quality-aware decoding consistently outperforms MAP-based decoding according to both state-of-the-art automatic metrics and human assessments; a minimal reranking sketch appears after this list.
arXiv Detail & Related papers (2022-05-02T15:26:28Z) - Sampling-Based Minimum Bayes Risk Decoding for Neural Machine
Translation [20.76001576262768]
We show that a sampling-based approximation to minimum Bayes risk (MBR) decoding has no equivalent to the beam search curse.
We also show that it can be beneficial to use strategies like beam search and nucleus sampling to construct hypothesis spaces efficiently; see the nucleus-sampling sketch after this list.
arXiv Detail & Related papers (2021-08-10T14:35:24Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z) - Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term that pushes the output distributions of the TM to be probable under the LM prior; a minimal sketch of such a term appears after this list.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z) - Assessing the Bilingual Knowledge Learned by Neural Machine Translation
Models [72.56058378313963]
We bridge the gap by assessing the bilingual knowledge learned by NMT models using phrase tables.
We find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples.
arXiv Detail & Related papers (2020-04-28T03:44:34Z) - Cross-lingual Supervision Improves Unsupervised Neural Machine
Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework that leverages weakly supervised signals from high-resource language pairs for zero-resource translation directions.
Our method significantly improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
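The quality-aware decoding entry above mentions leveraging reference-free MT evaluation at inference time. One such inference method is N-best reranking with a quality estimator; the sketch below uses hypothetical `generate_nbest` and `quality_score` callables as stand-ins for a beam-search decoder and a QE model, and shows only this one method among those the cited paper studies.

```python
def qe_rerank(source, generate_nbest, quality_score, n=50):
    """Return the candidate translation that a reference-free quality estimator prefers."""
    candidates = generate_nbest(source, n)                      # e.g. an n-best list from beam search
    scored = [(quality_score(source, hyp), hyp) for hyp in candidates]
    best_score, best_hyp = max(scored, key=lambda pair: pair[0])
    return best_hyp
```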
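The sampling-based MBR entry above mentions nucleus sampling as one way to construct hypothesis spaces. Below is a minimal single-step sketch of nucleus (top-p) sampling over a token distribution; a real decoder would apply it at every generation step, and the plain-list interface is an assumption for illustration.

```python
import random

def nucleus_sample(token_probs, p=0.9, rng=random):
    """Sample a token id from the smallest set of tokens whose cumulative
    probability mass reaches p, after renormalising within that set."""
    ranked = sorted(enumerate(token_probs), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token_id, prob in ranked:
        nucleus.append((token_id, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalise within the nucleus and draw a token.
    total = sum(prob for _, prob in nucleus)
    r = rng.random() * total
    for token_id, prob in nucleus:
        r -= prob
        if r <= 0:
            return token_id
    return nucleus[-1][0]
```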
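The language-model-prior entry above describes a regularization term that pushes the translation model's output distributions towards an LM prior. A common way to realise this is a token-level KL term added to the cross-entropy loss; the sketch below assumes that form with plain per-token probability lists, and may differ in detail from the cited paper's exact formulation.

```python
import math

def lm_prior_loss(tm_probs, lm_probs, target_ids, lam=0.5, eps=1e-9):
    """Cross-entropy of the reference tokens under the TM plus lam * KL(p_TM || p_LM),
    summed over target positions. The KL direction and the weight lam are
    illustrative assumptions, not the cited paper's exact choices."""
    nll, kl = 0.0, 0.0
    for step, target_id in enumerate(target_ids):
        p_tm, p_lm = tm_probs[step], lm_probs[step]
        nll += -math.log(p_tm[target_id] + eps)
        kl += sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(p_tm, p_lm))
    return nll + lam * kl
```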
This list is automatically generated from the titles and abstracts of the papers in this site.