Understanding the Properties of Minimum Bayes Risk Decoding in Neural
Machine Translation
- URL: http://arxiv.org/abs/2105.08504v1
- Date: Tue, 18 May 2021 13:31:05 GMT
- Title: Understanding the Properties of Minimum Bayes Risk Decoding in Neural
Machine Translation
- Authors: Mathias Müller and Rico Sennrich
- Abstract summary: Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words.
Recent work has tied these shortcomings to beam search.
Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead.
- Score: 26.33252528975464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Machine Translation (NMT) currently exhibits biases such as producing
translations that are too short and overgenerating frequent words, and shows
poor robustness to copy noise in training data or domain shift. Recent work has
tied these shortcomings to beam search -- the de facto standard inference
algorithm in NMT -- and Eikema & Aziz (2020) propose to use Minimum Bayes Risk
(MBR) decoding on unbiased samples instead.
In this paper, we empirically investigate the properties of MBR decoding on a
number of previously reported biases and failure cases of beam search. We find
that MBR still exhibits a length and token frequency bias, owing to the MT
metrics used as utility functions, but that MBR also increases robustness
against copy noise in the training data and domain shift.
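To make the decoding procedure under investigation concrete, here is a minimal sketch of sampling-based MBR decoding in the spirit of Eikema & Aziz (2020): every model sample acts both as a candidate and as a pseudo-reference, and the candidate with the highest average utility is returned. The sample_translations helper is hypothetical, and sentence-level chrF from sacrebleu stands in for whichever utility metric is used; this is an illustration, not the authors' implementation.

```python
# Minimal sketch of sampling-based MBR decoding (illustration, not the paper's code).
# Every sample serves both as a candidate and as a pseudo-reference; the candidate
# with the highest average utility against all samples is returned.
from sacrebleu.metrics import CHRF

chrf = CHRF()

def mbr_decode(samples):
    best_hyp, best_score = None, float("-inf")
    for hyp in samples:
        # Monte Carlo estimate of the expected utility of `hyp`.
        score = sum(chrf.sentence_score(hyp, [ref]).score for ref in samples) / len(samples)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp

# samples = sample_translations(src_sentence, n=100)  # hypothetical sampling helper
# translation = mbr_decode(samples)
```

The quadratic number of utility calls in this loop is exactly the cost that several of the efficiency-oriented papers listed below try to reduce.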
Related papers
- Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation [30.323103270892734]
Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability.
Minimum Bayes Risk (MBR) decoding offers an alternative by seeking hypotheses with the highest expected utility.
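In symbols (notation ours, not the paper's), with x the source, y a candidate translation, and u a utility metric, the two decision rules contrast as follows:

```latex
% MAP decoding: return the highest-probability translation.
\hat{y}_{\mathrm{MAP}} = \arg\max_{y} \; p(y \mid x)

% MBR decoding: return the translation with the highest expected utility,
% with the expectation over the model distribution approximated by samples.
\hat{y}_{\mathrm{MBR}} = \arg\max_{y} \; \mathbb{E}_{y' \sim p(\cdot \mid x)} \big[ u(y, y') \big]
```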
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z)
- Centroid-Based Efficient Minimum Bayes Risk Decoding [38.04403087991526]
Minimum Bayes risk (MBR) decoding achieves state-of-the-art translation performance when COMET is used as the utility metric.
MBR decoding requires quadratic time since it computes the expected score between a translation hypothesis and all reference translations.
Our method clusters the reference translations in the feature space, and then calculates the score using the centroids of each cluster.
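A rough sketch of this centroid idea, assuming precomputed embeddings for the pseudo-references and a hypothetical utility(hyp, centroid) function that can score a hypothesis against an embedding; the k-means step and the cluster-size weighting are illustrative, not the paper's exact implementation:

```python
# Illustrative sketch of centroid-based MBR (not the paper's implementation).
# Pseudo-references are clustered in feature space; each hypothesis is scored
# against the k centroids instead of against every reference.
import numpy as np
from sklearn.cluster import KMeans

def centroid_mbr(hypotheses, ref_vectors, utility, k=8):
    # ref_vectors: (n_refs, dim) array of reference embeddings (assumed precomputed).
    # utility(hyp, centroid): assumed to score a hypothesis against an embedding.
    kmeans = KMeans(n_clusters=k, random_state=0).fit(ref_vectors)
    weights = np.bincount(kmeans.labels_, minlength=k) / len(ref_vectors)
    best_hyp, best_score = None, float("-inf")
    for hyp in hypotheses:
        score = sum(w * utility(hyp, c) for w, c in zip(weights, kmeans.cluster_centers_))
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp
```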
arXiv Detail & Related papers (2024-02-17T05:15:12Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
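A minimal sketch of the reference-aggregation idea, again assuming embedding-based utilities so that pseudo-references can be represented as vectors and averaged; the utility signature is a hypothetical stand-in:

```python
# Illustrative sketch of MBR with reference aggregation (not the paper's implementation).
# Instead of the quadratic pairwise computation, each hypothesis is scored once
# against a single aggregated reference representation.
import numpy as np

def aggregated_mbr(hypotheses, ref_vectors, utility):
    # ref_vectors: (n_refs, dim) embeddings of the pseudo-references (assumed precomputed).
    aggregate = ref_vectors.mean(axis=0)                      # one vector for all references
    scores = [utility(hyp, aggregate) for hyp in hypotheses]  # linear in the number of samples
    return hypotheses[int(np.argmax(scores))]
```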
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
- Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding [5.639904484784127]
Minimum Bayes-Risk (MBR) decoding is a powerful alternative to beam search decoding for a wide range of text generation tasks.
MBR, however, requires a large amount of inference time to compute its objective.
Confidence-based pruning (CBP) has recently been proposed to reduce the inference time in machine translation tasks.
arXiv Detail & Related papers (2024-01-05T11:02:08Z)
- Faster Minimum Bayes Risk Decoding with Confidence-based Pruning [8.709382540743391]
We describe an algorithm for Minimum Bayes risk (MBR) decoding which gradually grows the number of samples used to estimate the utility.
Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR.
We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
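A simplified sketch of this grow-and-prune scheme: utilities are estimated on a growing reference subset, and hypotheses that rarely win under bootstrap resampling are dropped before the subset grows again. The doubling schedule, bootstrap win-rate estimate, and threshold alpha below are illustrative choices, not the paper's exact algorithm:

```python
# Simplified sketch of MBR with confidence-based pruning (not the paper's exact algorithm).
# Utilities are estimated on a growing subset of pseudo-references; hypotheses that
# rarely win under bootstrap resampling are discarded before the subset grows again.
import numpy as np

def pruned_mbr(hypotheses, references, utility, alpha=0.1, n_boot=100, seed=0):
    rng = np.random.default_rng(seed)
    alive = list(range(len(hypotheses)))
    n_used = 8
    while True:
        refs = references[: min(n_used, len(references))]
        # Utility of each surviving hypothesis against the current reference subset.
        U = np.array([[utility(hypotheses[i], r) for r in refs] for i in alive])
        # Bootstrap over references to estimate how often each hypothesis wins.
        wins = np.zeros(len(alive))
        for _ in range(n_boot):
            idx = rng.integers(0, len(refs), size=len(refs))
            wins[int(np.argmax(U[:, idx].mean(axis=1)))] += 1
        if len(refs) == len(references) or len(alive) == 1:
            return hypotheses[alive[int(np.argmax(wins))]]
        # Keep hypotheses whose estimated win rate is at least alpha (never prune the best).
        alive = [h for h, w in zip(alive, wins) if w / n_boot >= alpha] or [alive[int(np.argmax(wins))]]
        n_used *= 2
```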
arXiv Detail & Related papers (2023-11-25T03:38:14Z)
- It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk [57.641436861482696]
Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine learning system based not on the output with the highest probability, but the output with the lowest risk (expected error) among multiple candidates.
arXiv Detail & Related papers (2023-10-02T17:47:10Z)
- Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation [20.749494856466526]
We show how different sampling approaches for generating candidate lists for Minimum Bayes Risk decoding affect performance.
Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon.
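A small sketch of epsilon-sampling applied to a single next-token distribution; the threshold value is illustrative:

```python
# Minimal sketch of epsilon-sampling over a next-token distribution.
# Tokens whose probability falls below epsilon are removed before sampling,
# and the remaining mass is renormalised.
import numpy as np

def epsilon_sample(probs, epsilon=0.02, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    mask = probs >= epsilon
    if not mask.any():                 # fall back to greedy if everything is pruned
        return int(np.argmax(probs))
    pruned = np.where(mask, probs, 0.0)
    pruned /= pruned.sum()             # renormalise the surviving tokens
    return int(rng.choice(len(probs), p=pruned))
```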
arXiv Detail & Related papers (2023-05-17T00:11:38Z)
- Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection [74.9262846410559]
Hard Nominal Example-aware Template Mutual Matching (HETMM) aims to construct a robust prototype-based decision boundary that can precisely distinguish between hard-nominal examples and anomalies.
arXiv Detail & Related papers (2023-03-28T17:54:56Z)
- DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding [53.33313271531839]
Minimum Bayesian Risk Decoding (MBR) emerges as a promising decoding algorithm in Neural Machine Translation.
MBR performs poorly with label smoothing, which is surprising as label smoothing provides decent improvement with beam search and improves generality in various tasks.
We show that the issue arises from the inconsistency of label smoothing between the token-level and sequence-level distributions.
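One way to see the token-level versus sequence-level mismatch is through the standard label smoothing definition (notation ours, not taken from the paper): smoothing flattens every token-level target, and because the sequence-level distribution is a product over positions, the injected entropy accumulates over the whole sequence.

```latex
% Token-level target after label smoothing with parameter \epsilon over a vocabulary of size V:
q(y_t \mid x, y_{<t}) = (1 - \epsilon) \, \mathbb{1}[y_t = y_t^{*}] + \frac{\epsilon}{V}

% The sequence-level distribution is a product over positions, so the extra entropy
% injected at every token accumulates over the whole sequence:
p(y \mid x) = \prod_{t=1}^{|y|} p(y_t \mid x, y_{<t})
```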
arXiv Detail & Related papers (2022-12-08T11:40:31Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.