Sampling-Based Minimum Bayes Risk Decoding for Neural Machine
Translation
- URL: http://arxiv.org/abs/2108.04718v1
- Date: Tue, 10 Aug 2021 14:35:24 GMT
- Title: Sampling-Based Minimum Bayes Risk Decoding for Neural Machine
Translation
- Authors: Bryan Eikema and Wilker Aziz
- Abstract summary: We show that a sampling-based approximation to minimum Bayes risk (MBR) decoding has no equivalent to the beam search curse.
We also show that it can be beneficial to make use of strategies like beam search and nucleus sampling to construct hypothesis spaces efficiently.
- Score: 20.76001576262768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In neural machine translation (NMT), we search for the mode of the model
distribution to form predictions. The mode as well as other high probability
translations found by beam search have been shown to often be inadequate in a
number of ways. This prevents practitioners from improving translation quality
through better search, as these idiosyncratic translations end up being
selected by the decoding algorithm, a problem known as the beam search curse.
Recently, a sampling-based approximation to minimum Bayes risk (MBR) decoding
has been proposed as an alternative decision rule for NMT that would likely not
suffer from the same problems. We analyse this approximation and establish that
it has no equivalent to the beam search curse, i.e. better search always leads
to better translations. We also design different approximations aimed at
decoupling the cost of exploration from the cost of robust estimation of
expected utility. This allows for exploration of much larger hypothesis spaces,
which we show to be beneficial. We also show that it can be beneficial to make
use of strategies like beam search and nucleus sampling to construct hypothesis
spaces efficiently. We show on three language pairs (English into and from
German, Romanian, and Nepali) that MBR can improve upon beam search with
moderate computation.
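As a concrete illustration of the decision rule described above, here is a minimal sketch of sampling-based MBR decoding; the helper names (`sample_from_model`, `chrf`) are hypothetical stand-ins for ancestral sampling from the NMT model and a sentence-level utility.

```python
from typing import Callable, List

def mbr_decode(
    hypotheses: List[str],
    pseudo_references: List[str],
    utility: Callable[[str, str], float],
) -> str:
    """Return the hypothesis with the highest Monte Carlo estimate of
    expected utility, using model samples as pseudo-references."""
    best_hyp, best_score = None, float("-inf")
    for hyp in hypotheses:
        # Expected utility of `hyp` under the model, estimated from samples.
        score = sum(utility(hyp, ref) for ref in pseudo_references) / len(pseudo_references)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp

# Hypothetical usage:
# samples = sample_from_model(source, n=100)      # unbiased ancestral samples
# hypotheses = samples                            # or beam / nucleus candidates
# prediction = mbr_decode(hypotheses, samples, utility=chrf)
```

Keeping the hypothesis set and the pseudo-reference set separate is what allows the cost of exploration (how many candidates are ranked) to be decoupled from the cost of estimating expected utility (how many samples each candidate is scored against), and it is also where beam search or nucleus sampling can be plugged in to build the hypothesis space.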
Related papers
- Towards Faster k-Nearest-Neighbor Machine Translation [56.66038663128903]
k-nearest-neighbor machine translation approaches suffer from heavy retrieval overhead on the entire datastore when decoding each token.
We propose a simple yet effective multi-layer perceptron (MLP) network that predicts whether a token should be translated jointly by the neural machine translation model and the probabilities produced by kNN retrieval, or by the NMT model alone.
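A minimal sketch of such a gating network, assuming a PyTorch decoder whose hidden state is available at each step; the module and the decoding snippet are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RetrievalGate(nn.Module):
    """Hypothetical MLP gate: given the decoder hidden state, predict whether
    kNN retrieval is worth its cost for the current token."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1),
        )

    def forward(self, decoder_state: torch.Tensor) -> torch.Tensor:
        # Probability that this token should use the joint NMT + kNN distribution.
        return torch.sigmoid(self.mlp(decoder_state)).squeeze(-1)

# Illustrative decoding step: skip the datastore lookup when the gate says no.
# gate = RetrievalGate(hidden_dim=512)
# if gate(decoder_state) > 0.5:
#     p_token = (1 - lam) * p_nmt + lam * knn_distribution(decoder_state)
# else:
#     p_token = p_nmt
```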
arXiv Detail & Related papers (2023-12-12T16:41:29Z)
- Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality.
Truncation sampling algorithms set some words' probabilities to zero at each step.
We introduce $\eta$-sampling, which truncates words below an entropy-dependent probability threshold.
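A sketch of entropy-dependent truncation in this spirit; the threshold form below, min(epsilon, sqrt(epsilon) * exp(-entropy)), is an assumption for illustration and may not match the paper exactly.

```python
import numpy as np

def eta_truncate(probs: np.ndarray, epsilon: float = 2e-4) -> np.ndarray:
    """Zero out tokens whose probability falls below an entropy-dependent
    threshold, then renormalise the remaining mass."""
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    threshold = min(epsilon, np.sqrt(epsilon) * np.exp(-entropy))
    truncated = np.where(probs >= threshold, probs, 0.0)
    if truncated.sum() == 0.0:          # guard: always keep the most likely token
        truncated[np.argmax(probs)] = probs.max()
    return truncated / truncated.sum()

# Sample the next token from the truncated distribution:
# next_id = np.random.choice(len(probs), p=eta_truncate(probs))
```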
arXiv Detail & Related papers (2022-10-27T05:52:35Z)
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
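A sketch of the underlying idea as described in the summary: each sample is identified by a code point in [0, 1), decoded step by step through the model's conditional CDFs, so evenly spaced code points give a diverse batch that can be decoded fully in parallel. The function is illustrative, and `next_token_probs` is a hypothetical wrapper around the language model.

```python
import numpy as np
from typing import Callable, List

def arithmetic_sample(code: float,
                      next_token_probs: Callable[[List[int]], np.ndarray],
                      eos_id: int,
                      max_len: int = 50) -> List[int]:
    """Decode one sequence from a code point in [0, 1): at each step, pick the
    token whose cumulative-probability interval contains `code`, then rescale
    `code` into that interval (inverse-CDF decoding)."""
    tokens: List[int] = []
    for _ in range(max_len):
        probs = next_token_probs(tokens)
        cdf = np.cumsum(probs)
        tok = min(int(np.searchsorted(cdf, code, side="right")), len(probs) - 1)
        lo = cdf[tok - 1] if tok > 0 else 0.0
        code = (code - lo) / max(float(probs[tok]), 1e-12)   # rescale to [0, 1)
        tokens.append(tok)
        if tok == eos_id:
            break
    return tokens

# Evenly spaced code points yield a diverse, trivially parallel batch:
# batch = [arithmetic_sample((i + 0.5) / k, model_probs, eos_id=2) for i in range(k)]
```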
arXiv Detail & Related papers (2022-10-18T22:19:41Z)
- Rethinking the Evaluation of Neural Machine Translation [25.036685025571927]
We propose a novel evaluation protocol that avoids the effect of search errors and provides a system-level evaluation from the perspective of model ranking.
Our method is based on our newly proposed exact top-$k$ decoding instead of beam search.
arXiv Detail & Related papers (2021-06-29T09:59:50Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- Machine Translation Decoding beyond Beam Search [43.27883368285612]
Beam search is the go-to method for decoding auto-regressive machine translation models.
Our aim is to establish whether beam search can be replaced by a more powerful metric-driven search technique.
We introduce a Monte-Carlo Tree Search (MCTS) based method and showcase its competitiveness.
arXiv Detail & Related papers (2021-04-12T10:28:17Z)
- If beam search is the answer, what was the question? [78.71330480725668]
We find that beam search enforces uniform information density in text, a property motivated by cognitive science.
We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models.
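One plausible instantiation of such an objective, assumed here purely for illustration, penalises the variance of per-token surprisals so that information is spread evenly across the sequence; the exact regularisers proposed in the paper may differ.

```python
import numpy as np

def uid_regularized_score(token_logprobs: np.ndarray, lam: float = 1.0) -> float:
    """Log-probability of a candidate minus a uniform-information-density
    penalty (the variance of its per-token surprisals)."""
    surprisals = -token_logprobs
    return float(token_logprobs.sum() - lam * np.var(surprisals))

# Rank candidates (from sampling or exact search) by the regularised objective:
# best = max(candidates, key=lambda c: uid_regularized_score(np.array(c["logprobs"])))
```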
arXiv Detail & Related papers (2020-10-06T11:57:03Z)
- Best-First Beam Search [78.71330480725668]
We show that the standard implementation of beam search can be made up to 10x faster in practice.
We propose a memory-reduced variant of Best-First Beam Search, which has a similar beneficial search bias in terms of downstream performance.
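For intuition, here is a generic best-first decoding loop built on a priority queue; it illustrates the search order only and is not the paper's memory-reduced variant. `expand` is a hypothetical function returning the top next-token continuations and their log-probabilities.

```python
import heapq
from typing import Callable, List, Tuple

def best_first_search(expand: Callable[[List[int]], List[Tuple[int, float]]],
                      eos_id: int, k: int = 5,
                      max_len: int = 50) -> List[Tuple[float, List[int]]]:
    """Always expand the highest-scoring partial hypothesis first and stop as
    soon as k complete hypotheses have been popped from the queue."""
    frontier: List[Tuple[float, List[int]]] = [(0.0, [])]   # (negated log-prob, prefix)
    finished: List[Tuple[float, List[int]]] = []
    while frontier and len(finished) < k:
        neg_score, prefix = heapq.heappop(frontier)
        if prefix and (prefix[-1] == eos_id or len(prefix) >= max_len):
            finished.append((-neg_score, prefix))
            continue
        for tok, logp in expand(prefix):                     # e.g. top-k continuations
            heapq.heappush(frontier, (neg_score - logp, prefix + [tok]))
    return finished
```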
arXiv Detail & Related papers (2020-07-08T05:56:01Z)
- Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation [15.615065041164623]
We show that some of the known pathologies and biases of NMT are due to MAP decoding rather than to NMT's statistical assumptions or MLE training.
We show that an approximation to minimum Bayes risk decoding gives competitive results confirming that NMT models do capture important aspects of translation well in expectation.
arXiv Detail & Related papers (2020-05-20T18:05:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.