Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
- URL: http://arxiv.org/abs/2305.09860v2
- Date: Thu, 18 May 2023 02:24:56 GMT
- Title: Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
- Authors: Markus Freitag and Behrooz Ghorbani and Patrick Fernandes
- Abstract summary: We show how different sampling approaches for generating candidate lists for Minimum Bayes Risk decoding affect performance.
Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon.
- Score: 20.749494856466526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in machine translation (MT) have shown that Minimum Bayes
Risk (MBR) decoding can be a powerful alternative to beam search decoding,
especially when combined with neural-based utility functions. However, the
performance of MBR decoding depends heavily on how and how many candidates are
sampled from the model. In this paper, we explore how different sampling
approaches for generating candidate lists for MBR decoding affect performance.
We evaluate popular sampling approaches, such as ancestral, nucleus, and top-k
sampling. Based on our insights into their limitations, we experiment with the
recently proposed epsilon-sampling approach, which prunes away all tokens with
a probability smaller than epsilon, ensuring that each token in a sample
receives a fair probability mass. Through extensive human evaluations, we
demonstrate that MBR decoding based on epsilon-sampling significantly
outperforms not only beam search decoding, but also MBR decoding with all other
tested sampling methods across four language pairs.
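As a rough illustration of the two components the abstract combines, a minimal Python sketch of epsilon sampling and MBR decoding follows. This is not the authors' code: `utility` stands in for a neural metric such as BLEURT or COMET, and the epsilon value is an arbitrary placeholder.

```python
import numpy as np

def epsilon_sample(logits, eps=0.02, rng=np.random.default_rng()):
    """Epsilon sampling sketch: prune every token whose probability is
    smaller than eps, renormalize, and sample from what remains."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    mask = probs >= eps
    if not mask.any():                 # degenerate case: keep the most likely token
        mask = probs == probs.max()
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def mbr_decode(candidates, utility):
    """MBR sketch: the sampled candidates double as pseudo-references;
    return the candidate with the highest average utility against them."""
    scores = [sum(utility(h, r) for r in candidates) / len(candidates)
              for h in candidates]
    return candidates[int(np.argmax(scores))]
```

In the paper, the candidate list is produced by repeatedly sampling full translations from the MT model with one of the strategies above; both the model call and the neural utility are abstracted away here.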
Related papers
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs [4.122612309805664]
Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step.
We propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability.
We conduct extensive experiments on benchmarks including GPQA, GSM8K, and AlpacaEval Creative Writing, demonstrating that min-p sampling improves both the quality and diversity of generated text, particularly at high temperatures.
arXiv Detail & Related papers (2024-07-01T08:37:25Z)
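A minimal sketch of the dynamic truncation rule summarized in the entry above, assuming the scaling works by multiplying a base threshold (here called `p_base`, an assumed name) by the top token's probability:

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=np.random.default_rng()):
    """Min-p sketch: the cutoff is p_base scaled by the top token's
    probability, so it tightens when the model is confident and loosens
    when the distribution is flat."""
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    threshold = p_base * probs.max()       # confidence-scaled cutoff
    probs = np.where(probs >= threshold, probs, 0.0)
    probs /= probs.sum()                   # the top token always survives
    return int(rng.choice(len(probs), p=probs))
```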
- On the True Distribution Approximation of Minimum Bayes-Risk Decoding [3.409873726183299]
Minimum Bayes-risk (MBR) decoding has recently gained renewed attention in text generation.
Previous studies have reported that performance varies with the sampling method.
This study uses anomaly detection to measure how closely the sampled candidates approximate the model's true distribution.
arXiv Detail & Related papers (2024-03-31T17:47:22Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
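A rough sketch of the reference-aggregation idea from the entry above, under the simplifying assumption that the utility scores a candidate via an embedding (`embed` is a hypothetical stand-in for the metric's encoder, and cosine similarity for its scoring function):

```python
import numpy as np

def mbr_reference_aggregation(candidates, embed):
    """Linear-time MBR sketch: rather than n^2 pairwise utility calls,
    aggregate the reference representations once and score each
    candidate against the single aggregate (here, cosine similarity
    to the mean embedding)."""
    embs = np.stack([embed(c) for c in candidates])
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)
    agg = embs.mean(axis=0)                # aggregated reference representation
    agg /= np.linalg.norm(agg)
    scores = embs @ agg                    # one score per candidate: O(n)
    return candidates[int(np.argmax(scores))]
```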
- A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
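For readers unfamiliar with the accept/reject mechanics behind the entry above, a generic Metropolis-Hastings step is sketched below. The paper's contribution is the block proposal (an LLM rewrite of the whole sequence), abstracted here as `propose`; `log_q(a, b)` denotes the assumed proposal log-density of rewriting b into a, and all names are placeholders:

```python
import math
import random

def mh_step(x, log_p, propose, log_q):
    """One Metropolis-Hastings step: propose a full-sequence rewrite x'
    and accept it with probability min(1, p(x')q(x|x') / (p(x)q(x'|x)))."""
    x_new = propose(x)
    log_accept = (log_p(x_new) + log_q(x, x_new)) - (log_p(x) + log_q(x_new, x))
    if math.log(random.random()) < min(0.0, log_accept):
        return x_new        # accepted rewrite
    return x                # rejected: keep the current sequence
```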
- Faster Minimum Bayes Risk Decoding with Confidence-based Pruning [8.709382540743391]
We describe an algorithm for Minimum Bayes risk (MBR) decoding which gradually grows the number of samples used to estimate the utility.
Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR.
We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
arXiv Detail & Related papers (2023-11-25T03:38:14Z)
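A hypothetical sketch of the pruning idea from the entry above: grow the pseudo-reference pool on a schedule and drop candidates that a bootstrap test says are unlikely to end up best. The schedule, the bootstrap test, and every parameter name are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def mbr_with_pruning(candidates, sample_refs, utility,
                     schedule=(8, 16, 32), n_boot=200, alpha=0.99,
                     rng=np.random.default_rng()):
    """Estimate expected utility from a growing reference pool; prune
    candidates whose bootstrap win probability drops below 1 - alpha."""
    alive = list(candidates)
    for n in schedule:
        refs = sample_refs(n)                                  # n pseudo-references
        U = np.array([[utility(c, r) for r in refs] for c in alive])
        means = U.mean(axis=1)
        idx = rng.integers(0, n, size=(n_boot, n))             # bootstrap resamples
        boot = U[:, idx].mean(axis=2)                          # (len(alive), n_boot)
        win_prob = (boot == boot.max(axis=0)).mean(axis=1)
        survivors = [i for i, w in enumerate(win_prob) if w > 1 - alpha]
        if not survivors:                                      # always keep a leader
            survivors = [int(np.argmax(win_prob))]
        alive = [alive[i] for i in survivors]
        means = means[survivors]
        if len(alive) == 1:
            return alive[0]
    return alive[int(np.argmax(means))]
```

The saving in utility calls comes from pruned candidates never being scored against the larger reference pools later in the schedule.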
- Provably Convergent Subgraph-wise Sampling for Fast GNN Training [122.68566970275683]
We propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC).
LMC retrieves the discarded messages in backward passes based on a message passing formulation of backward passes.
Experiments on large-scale benchmarks demonstrate that LMC is significantly faster than state-of-the-art subgraph-wise sampling methods.
arXiv Detail & Related papers (2023-03-17T05:16:49Z)
- UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models [92.43617471204963]
Diffusion probabilistic models (DPMs) have demonstrated a very promising ability in high-resolution image synthesis.
We develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy.
We propose a unified predictor-corrector framework called UniPC for the fast sampling of DPMs.
arXiv Detail & Related papers (2023-02-09T18:59:48Z)
- Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
arXiv Detail & Related papers (2022-10-18T22:19:41Z)
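A speculative sketch of the code-book view from the entry above: each sample is indexed by a point in [0, 1), and decoding repeatedly selects the token whose cumulative-probability interval contains that point, then rescales the point within the interval. `step_probs(prefix)` is a hypothetical stand-in for a model call, and the length cap is arbitrary:

```python
import numpy as np

def arithmetic_decode(code, step_probs, eos_id, max_len=128):
    """Decode one sequence from a code point in [0, 1)."""
    prefix, c = [], code
    while True:
        p = step_probs(prefix)                        # next-token distribution
        cum = np.cumsum(p)
        tok = min(int(np.searchsorted(cum, c, side="right")), len(p) - 1)
        lo = cum[tok - 1] if tok > 0 else 0.0
        c = (c - lo) / p[tok]                         # rescale within the chosen interval
        prefix.append(tok)
        if tok == eos_id or len(prefix) >= max_len:
            return prefix

# Evenly spaced code points yield a diverse "beam" that can be decoded in parallel:
# samples = [arithmetic_decode((i + 0.5) / 8, step_probs, eos_id) for i in range(8)]
```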
This list is automatically generated from the titles and abstracts of the papers on this site.