It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
- URL: http://arxiv.org/abs/2310.01387v1
- Date: Mon, 2 Oct 2023 17:47:10 GMT
- Title: It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
- Authors: Amanda Bertsch, Alex Xie, Graham Neubig, Matthew R. Gormley
- Abstract summary: Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine learning system based not on the output with the highest probability, but the output with the lowest risk (expected error) among multiple candidates.
- Score: 57.641436861482696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a
machine learning system based not on the output with the highest probability,
but the output with the lowest risk (expected error) among multiple candidates.
It is a simple but powerful method: for an additional cost at inference time,
MBR provides reliable several-point improvements across metrics for a wide
variety of tasks without any additional data or training. Despite this, MBR is
not frequently applied in NLP works, and knowledge of the method itself is
limited. We first provide an introduction to the method and the recent
literature. We show that several recent methods that do not reference MBR can
be written as special cases of MBR; this reformulation provides additional
theoretical justification for the performance of these methods, explaining some
results that were previously only empirical. We provide theoretical and
empirical results about the effectiveness of various MBR variants and make
concrete recommendations for the application of MBR in NLP models, including
future directions in this area.
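The selection rule described in the abstract can be illustrated with a minimal sketch. It assumes the common sampling-based setup in which the candidates themselves serve as pseudo-references, and it uses a toy unigram-F1 overlap as the utility; the names `unigram_f1` and `mbr_select` are hypothetical and do not come from the paper. Maximizing expected utility is equivalent to minimizing expected error (risk).

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (an assumption): F1 overlap of whitespace-separated tokens."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the candidate with the highest average utility against the others.

    Each candidate is scored against every other candidate, treated as a
    pseudo-reference with uniform weight (a Monte Carlo estimate of expected
    utility when candidates are unbiased samples from the model).
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], r) for r in others) / len(others)

    best_index = max(range(len(candidates)), key=expected_utility)
    return candidates[best_index]


# Hypothetical model samples for one input.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
    "dogs bark loudly",
]
print(mbr_select(samples))
```

With these samples the sketch picks "the cat sat on a mat", the output that agrees most with the rest of the candidate set, rather than relying on model probability. The loop makes one utility call per ordered pair of candidates, so the cost grows quadratically with the number of samples.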
Related papers
- Better Instruction-Following Through Minimum Bayes Risk [48.879360919760074]
General-purpose LLM judges capable of human-level evaluation provide a scalable and accurate way of evaluating instruction-following LLMs.
One promising way of leveraging LLM judges for supervision is through Minimum Bayes Risk (MBR) decoding.
MBR decoding uses a reference-based evaluator to select a high-quality output from amongst a set of candidate outputs.
arXiv Detail & Related papers (2024-10-03T18:48:38Z)
- Don't Throw Away Data: Better Sequence Knowledge Distillation [60.60698363739434]
In this paper we seek to integrate minimum Bayes risk (MBR) decoding more tightly in knowledge distillation training.
Our experiments on English to German and English to Japanese translation show consistent improvements over strong baseline methods.
arXiv Detail & Related papers (2024-07-15T06:11:18Z)
- Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation [30.323103270892734]
Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability.
Minimum Bayes Risk (MBR) decoding offers an alternative by seeking hypotheses with the highest expected utility.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms [19.543681023903456]
We formulate Minimum Bayes Risk (MBR) decoding as a matrix completion problem.
We exploit this by computing only a random subset of the scores and efficiently recovering the missing entries in the matrix.
Our experimental results on machine translation tasks demonstrate that the proposed method requires only 1/16 of the utility metric computations of standard MBR decoding.
arXiv Detail & Related papers (2024-06-05T00:54:03Z)
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
arXiv Detail & Related papers (2024-02-06T18:59:30Z)
- Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding [5.639904484784127]
Minimum Bayes-Risk (MBR) decoding is a powerful alternative to beam search decoding for a wide range of text generation tasks.
MBR requires a huge amount of time for inference to compute the objective.
Confidence-based pruning (CBP) has recently been proposed to reduce the inference time in machine translation tasks.
arXiv Detail & Related papers (2024-01-05T11:02:08Z)
- How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark [60.72725673114168]
We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets.
We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
arXiv Detail & Related papers (2023-12-21T03:11:30Z)
- Faster Minimum Bayes Risk Decoding with Confidence-based Pruning [8.709382540743391]
We describe an algorithm for Minimum Bayes risk (MBR) decoding which gradually grows the number of samples used to estimate the utility.
Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR.
We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
arXiv Detail & Related papers (2023-11-25T03:38:14Z)
- Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding [15.309135455863753]
We show how the recently developed Reinforcement Learning technique, Direct Preference Optimization (DPO), can fine-tune Multilingual Large Language Models (MLLMs) to obtain the gains of MBR decoding without the additional computation at inference time.
Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared to MLLMs without DPO.
arXiv Detail & Related papers (2023-11-14T18:43:51Z)
- Integrate Lattice-Free MMI into End-to-End Speech Recognition [87.01137882072322]
In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems.
With this motivation, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems.
Previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems.
In this work, novel algorithms are proposed to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E
arXiv Detail & Related papers (2022-03-29T14:32:46Z)