Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding
- URL: http://arxiv.org/abs/2401.02749v2
- Date: Wed, 12 Jun 2024 01:14:45 GMT
- Title: Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding
- Authors: Yuu Jinnai, Kaito Ariu
- Abstract summary: Minimum Bayes-Risk (MBR) decoding is a powerful alternative to beam search decoding for a wide range of text generation tasks.
MBR requires a huge amount of time for inference to compute the objective.
Confidence-based pruning (CBP) has recently been proposed to reduce the inference time in machine translation tasks.
- Score: 5.639904484784127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimum Bayes-Risk (MBR) decoding is shown to be a powerful alternative to beam search decoding for a wide range of text generation tasks. However, MBR requires a large amount of inference time to compute the MBR objective, which makes the method infeasible in many situations where response time is critical. Confidence-based pruning (CBP) (Cheng and Vlachos, 2023) has recently been proposed to reduce the inference time in machine translation tasks. Although CBP is shown to significantly reduce the amount of computation, it requires hyperparameter tuning on a development set to be effective. To address this, we propose Approximate Minimum Bayes-Risk (AMBR) decoding, a hyperparameter-free method for running MBR decoding approximately. AMBR is derived from the observation that computing the sample-based MBR objective is an instance of the medoid identification problem. AMBR uses the Correlated Sequential Halving (CSH) algorithm (Baharav and Tse, 2019), the best approximation algorithm to date for medoid identification, to compute the sample-based MBR objective. We evaluate AMBR on machine translation, text summarization, and image captioning tasks. The results show that AMBR achieves performance on par with CBP, even when CBP selects its hyperparameters through an oracle for each given computation budget.
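The key observation in the abstract above is that sample-based MBR picks the sample with the highest average utility against all the other samples, which is exactly the medoid of the sample set once utility is read as a negative distance. The following is a minimal Python sketch of both the exact computation (n² utility calls) and a Correlated Sequential Halving style approximation. It is our reading of CSH (Baharav and Tse, 2019), not the authors' AMBR code: the function names and the `budget` parameter are hypothetical, and a toy unigram-F1 utility stands in for a real metric such as BLEU or COMET.

```python
import math
import random


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility (assumption): F1 overlap of word sets, standing in for BLEU/COMET."""
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    p, rc = overlap / len(h), overlap / len(r)
    return 2 * p * rc / (p + rc)


def mbr_exact(samples, utility):
    """Sample-based MBR: the samples act as both candidates and pseudo-references.
    Costs n * n utility calls; the argmax is the medoid under distance = -utility."""
    n = len(samples)
    scores = [sum(utility(h, y) for y in samples) / n for h in samples]
    return max(range(n), key=scores.__getitem__)


def correlated_sequential_halving(samples, utility, budget, seed=0):
    """CSH-style approximation of mbr_exact using roughly `budget` utility calls."""
    rng = random.Random(seed)
    n = len(samples)
    survivors = list(range(n))
    rounds = max(1, math.ceil(math.log2(n)))  # halving reaches 1 survivor in this many rounds
    while len(survivors) > 1:
        # Split the budget evenly across rounds and the surviving candidates.
        t = max(1, budget // (len(survivors) * rounds))
        exact = t >= n
        # One reference subset shared by every survivor: the "correlated" part of CSH.
        refs = range(n) if exact else rng.sample(range(n), t)
        est = {i: sum(utility(samples[i], samples[j]) for j in refs) / len(refs)
               for i in survivors}
        survivors.sort(key=est.__getitem__, reverse=True)
        if exact:
            # Budget covers every sample: the ranking is exact, so stop here.
            return survivors[0]
        survivors = survivors[: math.ceil(len(survivors) / 2)]  # keep the better half
    return survivors[0]


samples = ["the cat sat", "a cat sat", "the cat sat down", "dogs run"]
print(mbr_exact(samples, unigram_f1), correlated_sequential_halving(samples, unigram_f1, budget=8))
```

Because every survivor in a round is scored against the same references, noise in the estimates is shared across candidates and largely cancels when they are compared, which is why a small per-round sample can still rank them well. When the budget is large enough that `t >= n`, the sketch falls back to exact MBR.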
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
Date: 2024-10-28T04:47:39Z
- Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms [19.543681023903456]
We formulate Minimum Bayes Risk (MBR) decoding as a matrix completion problem.
We exploit the low-rank structure of the utility score matrix by computing only a random subset of the scores and efficiently recovering the missing entries.
Our experimental results on machine translation tasks demonstrate that the proposed method requires only 1/16 of the utility metric computations of standard MBR decoding.
Date: 2024-06-05T00:54:03Z
- Linear-time Minimum Bayes Risk Decoding with Reference Aggregation [52.1701152610258]
Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations.
It requires the pairwise calculation of a utility metric, which has quadratic complexity in the number of samples.
We propose to approximate pairwise metric scores with scores calculated against aggregated reference representations.
Date: 2024-02-06T18:59:30Z
- Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding [4.209844101827474]
We develop diversity-promoting decoding algorithms by adding diversity objectives to Minimum Bayes-Risk decoding.
We evaluate DMBR and KMBR on a variety of directed text generation tasks using encoder-decoder models and a large language model with prompting.
Date: 2024-01-10T10:23:41Z
- Faster Minimum Bayes Risk Decoding with Confidence-based Pruning [8.709382540743391]
We describe an algorithm for Minimum Bayes risk (MBR) decoding which gradually grows the number of samples used to estimate the utility.
Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR.
We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.
Date: 2023-11-25T03:38:14Z
- It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk [57.641436861482696]
Minimum Bayes Risk (MBR) decoding is a method for choosing the output of a machine learning system based not on which candidate has the highest probability, but on which has the lowest risk (expected error) among multiple candidates.
Date: 2023-10-02T17:47:10Z
- Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation [20.749494856466526]
We show how different sampling approaches for generating candidate lists for Minimum Bayes Risk decoding affect performance.
Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon.
Date: 2023-05-17T00:11:38Z
- DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding [53.33313271531839]
Minimum Bayesian Risk Decoding (MBR) has emerged as a promising decoding algorithm in Neural Machine Translation.
MBR performs poorly with label smoothing, which is surprising as label smoothing provides decent improvement with beam search and improves generality in various tasks.
We show that the issue arises from the inconsistency of label smoothing on the token-level and sequence-level distributions.
Date: 2022-12-08T11:40:31Z
- Integrate Lattice-Free MMI into End-to-End Speech Recognition [87.01137882072322]
In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems.
With this motivation, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems.
Previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems.
In this work, novel algorithms are proposed to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems.
Date: 2022-03-29T14:32:46Z
- Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation [26.33252528975464]
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words.
Recent work has tied these shortcomings to beam search.
Eikema & Aziz (2020) propose using Minimum Bayes Risk (MBR) decoding on unbiased samples instead.
Date: 2021-05-18T13:31:05Z
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
Date: 2020-09-28T15:22:24Z
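The related paper "Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms" listed above computes only a random subset of the candidate-by-pseudo-reference utility matrix and recovers the rest. The sketch below assumes alternating least squares (ALS) as the completion algorithm; ALS is a standard choice, but we do not claim it is the paper's exact method. The utility here is synthetic and exactly rank 1, and `fraction`, `rank`, `reg`, and `iters` are hypothetical parameters.

```python
import numpy as np


def als_complete(M_obs, mask, rank=2, reg=1e-2, iters=50, seed=0):
    """Fill a partially observed matrix with a rank-`rank` factorization U @ V.T,
    fitting only the observed entries (mask == True) by alternating least squares."""
    rng = np.random.default_rng(seed)
    n, m = M_obs.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(n):  # fix V, solve a small ridge regression for each row factor
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ M_obs[i, mask[i]])
        for j in range(m):  # fix U, solve for each column factor
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ M_obs[mask[:, j], j])
    return U @ V.T


def mbr_matrix_completion(utility, n, fraction=0.3, rank=2, seed=0):
    """Score only a random `fraction` of the n x n (candidate, pseudo-reference) pairs,
    complete the rest, and return (best candidate index, number of utility calls)."""
    rng = np.random.default_rng(seed)
    mask = rng.random((n, n)) < fraction
    M_obs = np.zeros((n, n))
    for i, j in zip(*np.nonzero(mask)):
        M_obs[i, j] = utility(i, j)
    M_hat = als_complete(M_obs, mask, rank=rank, seed=seed)
    M_hat[mask] = M_obs[mask]  # keep the scores that were actually computed
    return int(np.argmax(M_hat.mean(axis=1))), int(mask.sum())


# Synthetic rank-1 utility (assumption): utility(i, j) = q[i] * q[j]; candidate 39 is best.
q = np.concatenate([np.linspace(0.5, 0.8, 39), [1.0]])
best, calls = mbr_matrix_completion(lambda i, j: q[i] * q[j], n=40)
print(best, calls, "of", 40 * 40)
```

With `fraction=0.3`, roughly 30% of the 1,600 utility calls are made. Real metric matrices are only approximately low-rank, so the rank and sampling fraction would need checking on real data.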
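The related paper "Linear-time Minimum Bayes Risk Decoding with Reference Aggregation" listed above replaces pairwise scoring with scoring against an aggregated reference representation. For a utility that is linear in the reference representation, such as a dot product between sentence embeddings, this is exact: the mean of ⟨h, r_j⟩ over references equals ⟨h, mean of r_j⟩. The n × m pairwise cost therefore collapses to n + m. For non-linear metrics the aggregate score is only an approximation, as the entry describes. Below is a minimal NumPy sketch under the dot-product assumption. Random vectors stand in for real embeddings, and the function names are hypothetical.

```python
import numpy as np


def mbr_pairwise(H: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Expected utility of each hypothesis: mean dot product with every reference.
    Cost: one utility evaluation per (hypothesis, reference) pair."""
    return (H @ R.T).mean(axis=1)


def mbr_aggregated(H: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Same scores through one aggregated reference, using linearity of the dot product."""
    r_bar = R.mean(axis=0)
    return H @ r_bar


rng = np.random.default_rng(0)
H = rng.normal(size=(32, 8))  # 32 hypothetical hypothesis embeddings of dimension 8
R = rng.normal(size=(32, 8))  # 32 hypothetical pseudo-reference embeddings
print(np.allclose(mbr_pairwise(H, R), mbr_aggregated(H, R)))
```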
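The related paper "Faster Minimum Bayes Risk Decoding with Confidence-based Pruning" listed above grows the number of pseudo-references in stages and drops candidates that are unlikely to win. The sketch below is a simplified reading of that idea, not Cheng and Vlachos's (2023) exact procedure. At each stage it bootstraps the current pseudo-references and counts how often each surviving candidate has the top score. It then prunes candidates whose win rate falls below `1 - alpha`. The `schedule`, `n_boot`, and `alpha` values are hypothetical. They are the kind of hyperparameters that, according to the AMBR abstract, CBP must tune on a development set.

```python
import random


def confidence_pruned_mbr(hyps, refs, utility, schedule=(8, 16, 32, 64),
                          n_boot=100, alpha=0.99, seed=0):
    """Grow the pseudo-reference pool in stages; after each stage, bootstrap the
    pool and keep only candidates that win often enough.
    Returns (index of the selected candidate, number of distinct utility calls)."""
    rng = random.Random(seed)
    alive = list(range(len(hyps)))
    cache = {}  # (candidate, reference) -> utility, so each pair is scored once

    def u(i, j):
        if (i, j) not in cache:
            cache[(i, j)] = utility(hyps[i], refs[j])
        return cache[(i, j)]

    size = 0
    for stage in schedule:
        size = min(stage, len(refs))
        wins = dict.fromkeys(alive, 0)
        for _ in range(n_boot):
            boot = [rng.randrange(size) for _ in range(size)]  # resample with replacement
            scores = {i: sum(u(i, j) for j in boot) for i in alive}
            top = max(scores.values())
            for i in alive:
                if scores[i] >= top:
                    wins[i] += 1
        # Prune candidates whose bootstrap win rate falls below 1 - alpha.
        alive = [i for i in alive if wins[i] / n_boot >= 1 - alpha]
        if len(alive) == 1 or size == len(refs):
            break
    best = max(alive, key=lambda i: sum(u(i, j) for j in range(size)))
    return best, len(cache)


refs = [5 + ((k * 7) % 11 - 5) / 10 for k in range(64)]  # toy pseudo-references near 5
hyps = [0.0, 2.0, 5.0, 8.0, 10.0]
best, calls = confidence_pruned_mbr(hyps, refs, lambda h, r: -abs(h - r))
print(best, calls, "of", len(hyps) * len(refs))
```

At least one candidate always survives a stage, because the candidate with the top score in a bootstrap round counts as a win. When one candidate clearly dominates, as in this toy example, the pool is never grown beyond the first stage.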
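The related paper "Epsilon Sampling Rocks" listed above generates MBR candidates with epsilon sampling, which prunes every token whose probability is smaller than epsilon. A minimal per-step sketch follows, assuming the model's next-token distribution is given as a plain dict. The function name, the example probabilities, and `epsilon=0.02` are illustrative, not taken from the paper. Repeating this step until end of sequence yields one candidate, and repeating the whole generation yields the candidate list.

```python
import random


def epsilon_sample(probs: dict, epsilon: float, rng: random.Random) -> str:
    """Draw one token after discarding every token whose probability is below epsilon.
    random.choices renormalizes the surviving weights implicitly."""
    kept = {tok: p for tok, p in probs.items() if p >= epsilon}
    if not kept:
        # Assumption: if epsilon removes every token, fall back to the most likely one.
        return max(probs, key=probs.get)
    return rng.choices(list(kept), weights=list(kept.values()), k=1)[0]


rng = random.Random(0)
next_token_probs = {"house": 0.55, "home": 0.40, "horse": 0.04, "hose": 0.01}
draws = [epsilon_sample(next_token_probs, epsilon=0.02, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))
```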