Model-Based Minimum Bayes Risk Decoding for Text Generation
- URL: http://arxiv.org/abs/2311.05263v2
- Date: Wed, 12 Jun 2024 01:00:09 GMT
- Title: Model-Based Minimum Bayes Risk Decoding for Text Generation
- Authors: Yuu Jinnai, Tetsuro Morimura, Ukyo Honda, Kaito Ariu, Kenshi Abe,
- Abstract summary: Minimum Bayes Risk (MBR) decoding has been shown to be a powerful alternative to beam search decoding.
We show analytically and empirically that the model-based estimate is more promising than the Monte Carlo estimate in text generation tasks.
- Score: 7.442545018959533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimum Bayes Risk (MBR) decoding has been shown to be a powerful alternative to beam search decoding in a variety of text generation tasks. MBR decoding selects a hypothesis from a pool of hypotheses that has the least expected risk under a probability model according to a given utility function. Since it is impractical to compute the expected risk exactly over all possible hypotheses, two approximations are commonly used in MBR. First, it integrates over a sampled set of hypotheses rather than over all possible hypotheses. Second, it estimates the probability of each hypothesis using a Monte Carlo estimator. While the first approximation is necessary to make it computationally feasible, the second is not essential since we typically have access to the model probability at inference time. We propose Model-Based MBR (MBMBR), a variant of MBR that uses the model probability itself as the estimate of the probability distribution instead of the Monte Carlo estimate. We show analytically and empirically that the model-based estimate is more promising than the Monte Carlo estimate in text generation tasks. Our experiments show that MBMBR outperforms MBR in several text generation tasks, both with encoder-decoder models and with large language models.
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework in Large Language Models (LLMs)
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z) - mbrs: A Library for Minimum Bayes Risk Decoding [27.207891251898904]
mbrs is a library of Minimum Bayes risk (MBR) decoding.
MBR is a decision rule of text generation tasks that outperforms conventional maximum a posterior (MAP) decoding.
We published our mbrs as an MIT-licensed open-source project, and the code is available on GitHub.
arXiv Detail & Related papers (2024-08-08T02:28:32Z) - Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation [30.323103270892734]
Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability.
Minimum Bayes Risk (MBR) decoding offers an alternative by seeking hypotheses with the highest expected utility.
arXiv Detail & Related papers (2024-06-17T15:13:52Z) - BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models [52.46248487458641]
Predictive models often need to work with incomplete information in real-world tasks.
Current large language models (LLM) are insufficient for such accurate estimations.
We propose BIRD, a novel probabilistic inference framework.
arXiv Detail & Related papers (2024-04-18T20:17:23Z) - On the True Distribution Approximation of Minimum Bayes-Risk Decoding [3.409873726183299]
Minimum Bayes-risk (MBR) decoding has recently gained renewed attention in text generation.
Previous studies reported that the performance varies by sampling methods.
This study uses anomaly detection to measure the degree of approximation.
arXiv Detail & Related papers (2024-03-31T17:47:22Z) - Faster Minimum Bayes Risk Decoding with Confidence-based Pruning [8.709382540743391]
We describe an algorithm for Minimum Bayes risk (MBR) decoding which gradually grows the number of samples used to estimate the utility.
Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR.
We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
arXiv Detail & Related papers (2023-11-25T03:38:14Z) - Conformal Nucleus Sampling [67.5232384936661]
We assess whether a top-$p$ set is indeed aligned with its probabilistic meaning in various linguistic contexts.
We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.
arXiv Detail & Related papers (2023-05-04T08:11:57Z) - BRIO: Bringing Order to Abstractive Summarization [107.97378285293507]
We propose a novel training paradigm which assumes a non-deterministic distribution.
Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
arXiv Detail & Related papers (2022-03-31T05:19:38Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic
Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z) - Approximate Bayesian inference from noisy likelihoods with Gaussian
process emulated MCMC [0.24275655667345403]
We model the log-likelihood function using a Gaussian process (GP)
The main methodological innovation is to apply this model to emulate the progression that an exact Metropolis-Hastings (MH) sampler would take.
The resulting approximate sampler is conceptually simple and sample-efficient.
arXiv Detail & Related papers (2021-04-08T17:38:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.