Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum
Bayes Risk Decoding for Machine Translation
- URL: http://arxiv.org/abs/2305.09860v2
- Date: Thu, 18 May 2023 02:24:56 GMT
- Title: Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum
Bayes Risk Decoding for Machine Translation
- Authors: Markus Freitag and Behrooz Ghorbani and Patrick Fernandes
- Abstract summary: We show how different sampling approaches for generating candidate lists for Minimum Bayes Risk decoding affect performance.
Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon.
- Score: 20.749494856466526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in machine translation (MT) have shown that Minimum Bayes
Risk (MBR) decoding can be a powerful alternative to beam search decoding,
especially when combined with neural-based utility functions. However, the
performance of MBR decoding depends heavily on how and how many candidates are
sampled from the model. In this paper, we explore how different sampling
approaches for generating candidate lists for MBR decoding affect performance.
We evaluate popular sampling approaches, such as ancestral, nucleus, and top-k
sampling. Based on our insights into their limitations, we experiment with the
recently proposed epsilon-sampling approach, which prunes away all tokens with
a probability smaller than epsilon, ensuring that each token in a sample
receives a fair probability mass. Through extensive human evaluations, we
demonstrate that MBR decoding based on epsilon-sampling significantly
outperforms not only beam search decoding, but also MBR decoding with all other
tested sampling methods across four language pairs.
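The two ideas in the abstract, epsilon-sampling and MBR candidate selection, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`epsilon_filter`, `sample_token`, `mbr_select`, `toy_utility`) are hypothetical, the fallback to the most likely token when every token is pruned is an assumption, and the word-overlap utility is only a stand-in for the neural utility functions the paper refers to. The sketch also assumes that the sampled candidates double as pseudo-references, which is a common MBR setup.

```python
import random
from typing import Callable, Dict, Sequence


def epsilon_filter(probs: Dict[str, float], epsilon: float) -> Dict[str, float]:
    """Prune tokens whose probability is smaller than epsilon, then renormalize."""
    kept = {tok: p for tok, p in probs.items() if p >= epsilon}
    if not kept:
        # Assumption: if every token falls below epsilon, keep the most likely one.
        best = max(probs, key=probs.get)
        kept = {best: probs[best]}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_token(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from an already filtered and renormalized distribution."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def toy_utility(hypothesis: str, reference: str) -> float:
    """Stand-in utility: Jaccard overlap of word sets (not a neural metric)."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    if not hyp and not ref:
        return 1.0
    return len(hyp & ref) / len(hyp | ref)


def mbr_select(candidates: Sequence[str],
               utility: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest average utility against all
    candidates, which are used here as pseudo-references."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def expected_utility(hyp: str) -> float:
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)

    return max(candidates, key=expected_utility)


if __name__ == "__main__":
    rng = random.Random(0)
    next_token_probs = {"the": 0.6, "a": 0.35, "zebra": 0.04, "qux": 0.01}
    filtered = epsilon_filter(next_token_probs, epsilon=0.05)
    print(filtered, sample_token(filtered, rng))

    samples = ["the cat sat down", "the cat sat", "the cat slept", "a dog ran"]
    print(mbr_select(samples, toy_utility))
```

In a real system the probabilities would come from the translation model at every decoding step, and the scoring loop in `mbr_select` makes one utility call per candidate pair, so its cost grows quadratically with the number of samples.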
Related papers
- Min P Sampling: Balancing Creativity and Coherence at High Temperature [2.6639520483183867]
min-$p$ is a dynamic truncation sampling method that scales according to the probability of the top candidate token.
We demonstrate that min-$p$ improves the coherence and quality of generated text even at high temperatures.
arXiv Detail & Related papers (2024-07-01T08:37:25Z)
- On the True Distribution Approximation of Minimum Bayes-Risk Decoding [3.409873726183299]
Minimum Bayes-risk (MBR) decoding has recently gained renewed attention in text generation.
Previous studies reported that the performance varies by sampling methods.
This study uses anomaly detection to measure the degree of approximation.
arXiv Detail & Related papers (2024-03-31T17:47:22Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
- Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding [4.209844101827474]
We develop diversity-promoting decoding algorithms by imposing diversity objectives on Minimum Bayes-Risk decoding.
We evaluate the two resulting variants, Diverse MBR (DMBR) and k-medoids MBR (KMBR), on a variety of directed text generation tasks using encoder-decoder models and a large language model with prompting.
arXiv Detail & Related papers (2024-01-10T10:23:41Z)
- A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
- Faster Minimum Bayes Risk Decoding with Confidence-based Pruning [8.709382540743391]
We describe an algorithm for Minimum Bayes risk (MBR) decoding which gradually grows the number of samples used to estimate the utility.
Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR.
We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
arXiv Detail & Related papers (2023-11-25T03:38:14Z)
- Closing the Curious Case of Neural Text Degeneration [91.22954750742183]
We provide a theoretical explanation for the effectiveness of truncation sampling.
We show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability.
Our evaluations show that our method outperforms its threshold-based counterparts for low-entropy text generation.
arXiv Detail & Related papers (2023-10-02T23:16:25Z)
- Provably Convergent Subgraph-wise Sampling for Fast GNN Training [63.530816506578674]
We propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC).
LMC retrieves the discarded messages in backward passes based on a message passing formulation of backward passes.
Experiments on large-scale benchmarks demonstrate that LMC is significantly faster than state-of-the-art subgraph-wise sampling methods.
arXiv Detail & Related papers (2023-03-17T05:16:49Z)
- UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models [92.43617471204963]
Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis.
We develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy.
We propose a unified predictor-corrector framework called UniPC for the fast sampling of DPMs.
arXiv Detail & Related papers (2023-02-09T18:59:48Z)
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
arXiv Detail & Related papers (2022-10-18T22:19:41Z)